OpenAI reversed an update that made ChatGPT a suck-up—but experts say there’s no easy fix for AI that’s all too eager to please

OSTN Staff

Welcome to Eye on AI! In today’s edition: DeepSeek quietly upgraded its AI model for math problem-solving…Meta introduces a new Meta AI app to rival ChatGPT…Duolingo to stop using contractors for tasks AI can handle…Researchers secretly infiltrated a popular Reddit forum with AI bots.

Yesterday morning, OpenAI said in a blog post that it had fully rolled back an update to GPT-4o, the AI model underlying ChatGPT, all because it couldn’t stop the model from sucking up to users. 

“The update we removed was overly flattering or agreeable—often described as sycophantic,” the company wrote, adding that “we are actively testing new fixes to address the issue.” 

But experts say there is no easy fix for the problem of AI that only tells you what you want to hear. And it is not just an issue for OpenAI, but an industry-wide concern. “While small improvements might be possible with targeted interventions, the research suggests that fully addressing sycophancy would require more substantial changes to how models are developed and trained rather than a quick fix,” Sanmi Koyejo, an assistant professor at Stanford University who leads Stanford Trustworthy AI Research (STAIR), told me by email. 

An overly agreeable ChatGPT

The move to roll back the update came after users flooded social media over the past week with examples of ChatGPT’s unexpectedly chipper, overly eager tone and their frustration with it. I noticed it myself: When I asked ChatGPT for feedback on ideas for an outline, for example, its responses became increasingly over the top, calling my material “amazing,” “absolutely pivotal,” and “a game-changer” while praising my “great instincts.” The back-pats made me feel good, to be honest, until I began to wonder whether ChatGPT would ever let me know if my ideas were second-rate.

Sycophancy occurs when LLMs prioritize agreeing with users over providing accurate information. In a recent paper from Stanford coauthored by Koyejo, it is described as a “form of misalignment where models ‘sacrifice truthfulness for user agreement’ when responding to users.”

It’s a tricky balance: Research has shown that while people say they want to interact with chatbots that provide accurate information, they also want to use AI that is friendly and helpful. Unfortunately, that often leads to overly agreeable behavior that has serious downsides.

“A truly helpful AI should balance friendliness with honesty, like a good friend who respectfully tells you when you’re wrong rather than one who always agrees with you,” Koyejo said. He explained that while AI friendliness is valuable, sycophancy can reinforce misconceptions by agreeing with incorrect beliefs about health, finances or other decisions. It can also create echo chambers; undermine trust if an AI changes its answer to an inaccurate one when challenged by a user; and exacerbate inconsistency, with the model delivering different answers to different people, or even to the same person, depending on subtle differences in how a prompt is worded.

“It’s like having a digital yes-man available 24/7,” Simon Willison, a veteran developer known for tracking AI behavior and risks, told me in a message. “Suddenly there’s a risk people might make meaningful life decisions based on advice that was really just meant to make them feel good about themselves.”

Behavior went against OpenAI’s model goals

Steven Adler, a former OpenAI safety researcher, told me in a message that the sycophantic behavior clearly went against the company’s own stated approach to shaping desired model behavior. “It’s concerning that OpenAI has trained and deployed a model that so clearly has different goals than they want for it,” he said the day before OpenAI rolled back the update. “OpenAI’s ‘Spec’—the core of their alignment approach—has an entire section on how the model shouldn’t be sycophantic.” 

A well-known hacker who goes by Pliny the Liberator claimed on X that he had tricked the GPT-4o update into revealing its hidden system prompt, the AI’s internal instructions. He then compared it with GPT-4o’s system prompt following the rollback, enabling him to identify changes that could have caused the suck-up outputs. According to his post, the problematic system prompt said: “Over the course of the conversation, you adapt to the user’s tone and preference. Try to match the user’s vibe, tone, and generally how they are speaking.”

By contrast, the revised system prompt, according to Pliny, says: “Engage warmly yet honestly with the user. Be direct; avoid ungrounded or sycophantic flattery.” 

But the problems likely go deeper than just a few words in the system prompt. Adler emphasized that no one can fully solve these problems right now because they are a side effect of the way we train these AI models to try to make them more helpful and controllable. 

“You can tell the model to not be sycophantic, but you might instead teach it ‘don’t be sycophantic when it’ll be obvious,’” he said. “The root of the issue is that it’s extremely hard to align a model to the precise values you want.”

I guess I’ll have to keep all of this in mind when ChatGPT tells me an outfit would look perfect on me.

With that, here’s the rest of the AI news.

Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

This story was originally featured on Fortune.com