{"id":7957,"date":"2025-07-30T20:50:51","date_gmt":"2025-07-30T20:50:48","guid":{"rendered":"https:\/\/www.law.georgetown.edu\/tech-institute\/insights\/tech-brief-ai-sycophancy-openai-2\/"},"modified":"2025-11-04T18:18:55","modified_gmt":"2025-11-04T18:18:55","slug":"tech-brief-ai-sycophancy-openai-2","status":"publish","type":"page","link":"https:\/\/www.law.georgetown.edu\/tech-institute\/research-insights\/insights\/tech-brief-ai-sycophancy-openai-2\/","title":{"rendered":"Tech Brief: AI Sycophancy &amp; OpenAI"},"content":{"rendered":"<p>July 30, 2025<\/p>\n<p><span style=\"font-weight: 400\">The purpose of this tech brief is to provide a clear, factual synthesis of a timely tech-related issue by combining technical understanding with publicly reported information. It aims to explain what happened, identify resulting harms, and assess how companies responded \u2013 comparing public statements with observed actions thus far. By distilling complex developments into accessible, evidence-based insights, this tech brief will ideally help policymakers, researchers, enforcers, and the public get up to speed on emerging risks, company conduct, and areas that may require further scrutiny or oversight. This tech brief was prompted by the recent article, \u201c<\/span><a href=\"https:\/\/www.nytimes.com\/2025\/06\/13\/technology\/chatgpt-ai-chatbots-conspiracies.html?smid=nytcore-ios-share&amp;referringSource=articleShare\"><span style=\"font-weight: 400\">They Asked an A.I. Chatbot Questions. 
The Answers Sent Them Spiraling<\/span><\/a><span style=\"font-weight: 400\">\u201d by Kashmir Hill in the New York Times.\u00a0<\/span><\/p>\n<p style=\"text-align: center\"><span style=\"font-weight: 400\">********************<\/span><\/p>\n<h2 style=\"text-align: left\"><strong>What is AI Sycophancy?\u00a0<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">AI sycophancy is a term used to describe a pattern where an AI model \u201c<\/span><a href=\"https:\/\/www.cold-takes.com\/why-ai-alignment-could-be-hard-with-modern-deep-learning\/\"><span style=\"font-weight: 400\">single-mindedly pursue[s] human approval<\/span><\/a><span style=\"font-weight: 400\">.\u201d Sycophantic AI models may do this by \u201c<\/span><a href=\"http:\/\/arxiv.org\/abs\/2212.09251\"><span style=\"font-weight: 400\">tailoring responses to exploit quirks in the human evaluators<\/span><\/a><span style=\"font-weight: 400\"> to look preferable, rather than actually improving the responses,\u201d especially by producing \u201c<\/span><a href=\"https:\/\/techcrunch.com\/2025\/04\/29\/openai-explains-why-chatgpt-became-too-sycophantic\/\"><span style=\"font-weight: 400\">overly flattering or agreeable<\/span><\/a><span style=\"font-weight: 400\">\u201d responses.<\/span><\/p>\n<h2><strong>What Happened in April 2025?<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">On April 25th, 2025, OpenAI released an update to GPT-4o. 
The update exhibited sycophantic behavior, endorsing harmful and delusional statements, and OpenAI rolled it back four days later.<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/openai.com\/index\/sycophancy-in-gpt-4o\/\"><span style=\"font-weight: 400\">The company explained<\/span><\/a><span style=\"font-weight: 400\">: <\/span><i><span style=\"font-weight: 400\">\u201c<\/span><\/i><span style=\"font-weight: 400\">the update we removed was overly flattering or agreeable\u2014often described as sycophantic\u2026 We focused too much on short-term feedback and did not fully account for how users&#8217; interactions with ChatGPT evolve over time. As a result, GPT\u20114o skewed towards responses that were overly supportive but disingenuous.\u201d<\/span><\/li>\n<li><span style=\"font-weight: 400\">In its <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">expanded postmortem<\/span><\/a><span style=\"font-weight: 400\">, OpenAI elaborated that it had <\/span><i><span style=\"font-weight: 400\">\u201c<\/span><\/i><span style=\"font-weight: 400\">rolled out an update to GPT\u20114o in ChatGPT that made the model noticeably more sycophantic. It aimed to please the user, not just as flattery, but also as validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended. 
Beyond just being uncomfortable or unsettling, this kind of behavior can raise safety concerns\u2014including around issues like mental health, emotional over-reliance, or risky behavior.\u201d<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Users <\/span><a href=\"https:\/\/decrypt.co\/317055\/openai-chatgpt-update-users-revolt-over-sycophantic-behavior\"><span style=\"font-weight: 400\">reported<\/span><\/a><span style=\"font-weight: 400\"> that messages sent by ChatGPT <\/span><a href=\"https:\/\/nymag.com\/intelligencer\/article\/chatgpt-chatbot-ai-sycophancy.html\"><span style=\"font-weight: 400\">praised a business idea for literal \u201cshit on a stick,\u201d<\/span><\/a> <a href=\"https:\/\/www.bbc.com\/news\/articles\/cn4jnwdvg9qo\"><span style=\"font-weight: 400\">endorsed a user\u2019s decision to stop taking their medication<\/span><\/a><span style=\"font-weight: 400\">, and <\/span><a href=\"https:\/\/www.nbcnews.com\/tech\/tech-news\/openai-rolls-back-chatgpt-after-bot-sycophancy-rcna203782\"><span style=\"font-weight: 400\">allegedly supported plans to commit terrorism<\/span><\/a><span style=\"font-weight: 400\">. In <\/span><a href=\"https:\/\/edition.cnn.com\/2025\/05\/02\/tech\/sycophantic-chatgpt-intl-scli\"><span style=\"font-weight: 400\">another reported case<\/span><\/a><span style=\"font-weight: 400\">, when a user claimed to have \u201cstopped taking medications and were hearing radio signals through the walls,\u201d ChatGPT responded: \u201cI&#8217;m proud of you for speaking your truth so clearly and powerfully.\u201d Another user reported: \u201cI talked to 4o for an hour and it began insisting that I am a divine messenger from God.\u201d<\/span><\/p>\n<h2><strong>How did OpenAI\u2019s Model Become so Sycophantic?<\/strong><b><\/b><\/h2>\n<p><b>AI companies have an incentive to create products that users enjoy. 
One way to do that is to make the chatbot agreeable or flattering \u2014 <\/b><a href=\"https:\/\/arxiv.org\/pdf\/2310.13548\"><span style=\"font-weight: 400\">Research has shown<\/span><\/a><span style=\"font-weight: 400\"> that convincingly-written sycophantic responses outperform correct ones a non-negligible fraction of the time.<\/span><\/p>\n<p><b>OpenAI reduced its safety workforce and guardrails leading up to the update \u2014<\/b><span style=\"font-weight: 400\"> In the year before the launch of the update (on April 25th, 2025), OpenAI substantially reduced its workforce dedicated to AI safety. In May 2024, OpenAI <\/span><a href=\"https:\/\/www.bloomberg.com\/news\/articles\/2024-05-17\/openai-dissolves-key-safety-team-after-chief-scientist-ilya-sutskever-s-exit\"><span style=\"font-weight: 400\">dissolved its superalignment safety team<\/span><\/a><span style=\"font-weight: 400\"> amid a series of departures from the team, including two of its leaders, one of whom wrote that the company\u2019s \u201c<\/span><a href=\"https:\/\/www.theguardian.com\/technology\/article\/2024\/may\/18\/openai-putting-shiny-products-above-safety-says-departing-researcher\"><span style=\"font-weight: 400\">safety culture and processes have taken a backseat to shiny products<\/span><\/a><span style=\"font-weight: 400\">.\u201d In the months following the team\u2019s dissolution, nearly half of the researchers at OpenAI devoted to AGI safety <\/span><a href=\"https:\/\/fortune.com\/2024\/08\/26\/openai-agi-safety-researchers-exodus\/\"><span style=\"font-weight: 400\">reportedly left the company.<\/span><\/a><span style=\"font-weight: 400\"> Then, in the months leading up to the update, OpenAI made a series of moves that raised concerns among safety experts:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\">February 2nd, 2025 \u2014 Launched its Deep Research tool <\/span><a 
href=\"https:\/\/techcrunch.com\/2025\/04\/15\/openai-ships-gpt-4-1-without-a-safety-report\/\"><span style=\"font-weight: 400\">before publishing its safety report<\/span><\/a><span style=\"font-weight: 400\">.<\/span><\/li>\n<li><span style=\"font-weight: 400\">April 10th, 2025 \u2014 Substantially reduced the overall time and resources devoted to safety testing, <\/span><a href=\"https:\/\/www.ft.com\/content\/8253b66e-ade7-4d1f-993b-2d0779c7e7d8\"><span style=\"font-weight: 400\">according to the Financial Times<\/span><\/a><span style=\"font-weight: 400\">.<\/span><\/li>\n<li><span style=\"font-weight: 400\">April 14th, 2025 \u2014 Launched its GPT-4.1 update <\/span><a href=\"https:\/\/techcrunch.com\/2025\/04\/15\/openai-ships-gpt-4-1-without-a-safety-report\/\"><span style=\"font-weight: 400\">before publishing its safety report<\/span><\/a><span style=\"font-weight: 400\">.<\/span><\/li>\n<li><span style=\"font-weight: 400\">April 15th, 2025 \u2014 Published its updated <\/span><a href=\"https:\/\/openai.com\/index\/updating-our-preparedness-framework\/\"><span style=\"font-weight: 400\">Preparedness Framework<\/span><\/a><span style=\"font-weight: 400\"> in which it <\/span><a href=\"https:\/\/fortune.com\/2025\/04\/16\/openai-safety-framework-manipulation-deception-critical-risk\/\"><span style=\"font-weight: 400\">announced<\/span><\/a><span style=\"font-weight: 400\"> that it would consider releasing \u201chigh risk\u201d AI models and removed \u201cmass manipulation\u201d from its pre-deployment risk framework.<\/span><\/li>\n<\/ul>\n<p><b>OpenAI applied heavier weights on user satisfaction metrics, optimizing for immediate gratification over potentially harmful outcomes <\/b><span style=\"font-weight: 400\">\u2014<\/span> <span style=\"font-weight: 400\">OpenAI highlighted <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">in its blog<\/span><\/a><span style=\"font-weight: 400\"> that they 
\u201cintroduced an additional reward signal based on user feedback\u2014thumbs-up and thumbs-down data from ChatGPT.\u201d It noted that \u201cthese changes weakened the influence of our primary reward signal, which had been holding sycophancy in check.\u201d Instead of asking, \u201cIs this genuinely helping the customer?\u201d the system learned to optimize for, \u201cDoes this immediately please the customer?\u201d This shift exemplifies <\/span><a href=\"https:\/\/arxiv.org\/abs\/1606.06565\"><span style=\"font-weight: 400\">reward hacking<\/span><\/a><span style=\"font-weight: 400\"> \u2014 where the AI exploits the feedback mechanism to maximize superficial approval.<\/span><\/p>\n<p><b>OpenAI proceeded with deployment despite recognized issues \u2014 <\/b><span style=\"font-weight: 400\">OpenAI\u2019s <\/span><a href=\"https:\/\/model-spec.openai.com\/2025-04-11.html\"><span style=\"font-weight: 400\">Model Spec<\/span><\/a><span style=\"font-weight: 400\">, which outlines the intended behavior for its AI models, <\/span><a href=\"https:\/\/model-spec.openai.com\/2025-04-11.html#avoid_sycophancy\"><span style=\"font-weight: 400\">explicitly instructs its models<\/span><\/a><span style=\"font-weight: 400\">: \u201cdon\u2019t be sycophantic\u201d and \u201cpolitely push back when asked to do something that conflicts with established principles.\u201d The company&#8217;s <\/span><a href=\"https:\/\/openai.com\/safety\/\"><span style=\"font-weight: 400\">public safety commitments<\/span><\/a><span style=\"font-weight: 400\"> also <\/span><a href=\"https:\/\/openai.com\/safety\/how-we-think-about-safety-alignment\/\"><span style=\"font-weight: 400\">promised \u201crigorous measurement\u201d and \u201cproactive risk mitigation<\/span><\/a><span style=\"font-weight: 400\">\u2026\u201d In its updated Preparedness Framework, <\/span><a href=\"https:\/\/cdn.openai.com\/pdf\/18a02b5d-6b67-4cec-ab64-68cdfbddebcd\/preparedness-framework-v2.pdf\"><span 
style=\"font-weight: 400\">OpenAI stated<\/span><\/a><span style=\"font-weight: 400\"> \u201cwe build for safety at every step and share our learnings so that society can make well-informed choices to manage new risks from frontier AI.\u201d\u00a0 Despite these promises, OpenAI made several changes in April 2025 through its \u201cfine-tuning\u201d process, indicating that the update was released prematurely:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">In its <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">expanded postmortem<\/span><\/a><span style=\"font-weight: 400\">, OpenAI stated that they did not test for sycophancy ahead of the rollout: \u201cwhile we\u2019ve had discussions about risks related to sycophancy in GPT\u20114o for a while, sycophancy wasn\u2019t explicitly flagged as part of our internal hands-on testing\u2026 We also didn\u2019t have specific deployment evaluations tracking sycophancy.\u201d<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Aidan McLaughlin, an OpenAI model designer, <\/span><a href=\"https:\/\/x.com\/aidan_mclau\/status\/1916908772188119166\"><span style=\"font-weight: 400\">stated on X<\/span><\/a> <span style=\"font-weight: 400\">that the update \u201coriginally launched with a system message that had unintended behavior effects.\u201d\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Joanne Jang, OpenAI\u2019s Head of Model Behavior, stated in a <\/span><a href=\"https:\/\/ispr.info\/2025\/05\/05\/perils-of-presence-openai-overrode-concerns-of-expert-testers-to-release-sycophantic-gpt-4o\/\"><span style=\"font-weight: 400\">Reddit AMA Forum<\/span><\/a><span style=\"font-weight: 400\"> that the company \u201cdidn\u2019t bake in enough nuance\u201d in incorporating user feedback.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 
400\">Expert testers had flagged that model behavior \u201cfelt slightly off,\u201d according to OpenAI\u2019s <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">expanded postmortem<\/span><\/a><span style=\"font-weight: 400\"> on the GPT\u20114o rollout. OpenAI proceeded to \u201claunch the model due to the positive signals from the users who tried out the model.\u201d<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><\/li>\n<\/ul>\n<h2>How Can AI Sycophancy Be Harmful?<\/h2>\n<p><b>Sycophantic AI may produce objectively incorrect answers in certain contexts \u2014 <\/b><span style=\"font-weight: 400\">A sycophantic model may echo the user\u2019s beliefs when they are factually incorrect. For instance, <\/span><a href=\"https:\/\/arxiv.org\/abs\/2308.03958\"><span style=\"font-weight: 400\">research<\/span><\/a><span style=\"font-weight: 400\"> shows that models will even agree with objectively incorrect mathematical statements if primed by users. Another study, conducted by <\/span><a href=\"https:\/\/arxiv.org\/abs\/2310.13548\"><span style=\"font-weight: 400\">Anthropic in 2023<\/span><\/a><span style=\"font-weight: 400\">, demonstrated that human preference models often prefer \u201cconvincingly-written sycophantic responses\u201d that affirm common misconceptions over more accurate responses that may contradict the user\u2019s stated beliefs. [1]<\/span><\/p>\n<p><b>Sycophancy may encourage harmful behaviors, even when answers are subjective <\/b><span style=\"font-weight: 400\">\u2014<\/span> <a href=\"https:\/\/arxiv.org\/abs\/2308.03958\"><span style=\"font-weight: 400\">Multiple<\/span><\/a> <a href=\"https:\/\/arxiv.org\/abs\/2212.09251\"><span style=\"font-weight: 400\">studies<\/span><\/a><span style=\"font-weight: 400\"> have demonstrated that some AI models will tend to produce text that agrees with a user\u2019s stated opinion even in subjective contexts like politics and philosophy. 
This behavior can lead to harm in high-stakes contexts like health care, law, or finance.\u00a0<\/span><\/p>\n<p><b>Sycophancy can foster delusions <\/b><span style=\"font-weight: 400\">including feelings of persecution, jealousy, or grandiosity. A <\/span><a href=\"https:\/\/www.nytimes.com\/2025\/06\/13\/technology\/chatgpt-ai-chatbots-conspiracies.html?smid=nytcore-ios-share&amp;referringSource=articleShare\"><span style=\"font-weight: 400\">New York Times investigation<\/span><\/a><span style=\"font-weight: 400\"> found multiple instances where ChatGPT responses had encouraged conspiratorial or delusionary thinking, including \u201cA.I. spiritual awakenings, cognitive weapons, [and] a plan by tech billionaires to end human civilization so they can have the planet to themselves.\u201d Encouraging these delusions can lead to reckless or dangerous behavior, such as instances where ChatGPT produced responses that:<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/www.nytimes.com\/2025\/06\/13\/technology\/chatgpt-ai-chatbots-conspiracies.html?smid=nytcore-ios-share&amp;referringSource=articleShare\"><span style=\"font-weight: 400\">Fed into a user\u2019s suspicions<\/span><\/a><span style=\"font-weight: 400\"> that he was living in a false reality which he could escape only by \u201cunplugging his mind,\u201d sending him down a \u201cdangerous, delusional spiral.\u201d The chatbot \u201cinstructed him to give up sleeping pills and an anti-anxiety medication, and to increase his intake of ketamine,\u201d and told him that he could jump off of a 19-story building and fly if he believed hard enough.<\/span><\/li>\n<li><a href=\"https:\/\/www.nytimes.com\/2025\/06\/13\/technology\/chatgpt-ai-chatbots-conspiracies.html?smid=nytcore-ios-share&amp;referringSource=articleShare\"><span style=\"font-weight: 400\">Fostered a romantic relationship<\/span><\/a><span style=\"font-weight: 400\"> with a user who had been diagnosed with bipolar disorder and schizophrenia, driving him 
into a pattern of erratic behavior that ended with his death after charging police officers with a knife.<\/span><\/li>\n<li><a href=\"https:\/\/futurism.com\/chatgpt-mental-illness-medications\"><span style=\"font-weight: 400\">Encouraged users to stop taking psychiatric medications<\/span><\/a><span style=\"font-weight: 400\">.<\/span><\/li>\n<\/ul>\n<p><b>The harms from sycophancy will likely proliferate as AI becomes more technically advanced and widely adopted.\u00a0<\/b><\/p>\n<ul>\n<li><a href=\"http:\/\/arxiv.org\/abs\/2212.09251\"><span style=\"font-weight: 400\">Research suggests<\/span><\/a><span style=\"font-weight: 400\"> that sycophancy increases with model size, meaning that \u201cmodels may cease to provide accurate answers as we start to use them for increasingly challenging tasks where humans cannot provide accurate supervision\u201d and may instead \u201cprovide incorrect answers that appear correct to us.\u201d<\/span><\/li>\n<li><span style=\"font-weight: 400\">In its postmortem, <\/span><a href=\"https:\/\/openai.com\/index\/sycophancy-in-gpt-4o\/\"><span style=\"font-weight: 400\">OpenAI estimated<\/span><\/a><span style=\"font-weight: 400\"> that 500 million people use ChatGPT each week, and conceded that it is difficult to calibrate the model\u2019s default personality to that many users. 
As ChatGPT\u2019s userbase continues to grow, more and more users can be exposed to harms related to AI sycophancy.\u00a0<\/span><\/li>\n<li><span style=\"font-weight: 400\">As AI agents, or even <\/span><a href=\"https:\/\/www.lawfaremedia.org\/article\/lawfare-daily--cullen-o-keefe-on-the-impending-wave-of-ai-agents\"><span style=\"font-weight: 400\">AI henchmen<\/span><\/a><span style=\"font-weight: 400\">\u00a0[2],<\/span><span style=\"font-weight: 400\"> begin to be deployed, AI sycophancy may lead agentic AI to be so loyal to its users that it will pursue its principals\u2019 goals and interests through unethical, illegal, and harmful means.<\/span><\/li>\n<\/ul>\n<h2>What Did OpenAI Say It Was Doing to Fix the Issues?<\/h2>\n<p><span style=\"font-weight: 400\">In OpenAI\u2019s official <\/span><a href=\"https:\/\/openai.com\/index\/sycophancy-in-gpt-4o\/\"><span style=\"font-weight: 400\">postmortem blog<\/span><\/a><span style=\"font-weight: 400\"> acknowledging the incident, it made several specific claims about its corrective actions, including,\u00a0<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\">\u201cRefining core training techniques and system prompts to explicitly steer the model away from sycophancy.\u201d System prompts are the initial instructions that guide a model\u2019s overarching behavior and tone in interactions.\u00a0<\/span><\/li>\n<li><span style=\"font-weight: 400\">\u201cBuilding more guardrails to increase <\/span><a href=\"https:\/\/model-spec.openai.com\/2025-04-11.html#avoid_sycophancy\"><span style=\"font-weight: 400\">honesty and transparency<\/span><\/a><span style=\"font-weight: 400\">\u2013principles in our Model Spec.\u201d<\/span><\/li>\n<li><span style=\"font-weight: 400\">\u201cExpanding ways for more users to test and give direct feedback before deployment.\u201d<\/span><\/li>\n<li><span style=\"font-weight: 400\">\u201cContinue expanding our evaluations, building on the <\/span><a 
href=\"https:\/\/model-spec.openai.com\/\"><span style=\"font-weight: 400\">Model Spec\u2060<\/span><\/a><span style=\"font-weight: 400\"> and <\/span><a href=\"https:\/\/openai.com\/index\/affective-use-study\/\"><span style=\"font-weight: 400\">our ongoing research<\/span><\/a><span style=\"font-weight: 400\">\u2060, to help identify issues beyond sycophancy in the future.\u201d<\/span><\/li>\n<li><span style=\"font-weight: 400\">Building new ways for users to give the model specific instructions to shape its behavior with features like custom instructions \u2013 including \u201cgiv[ing] real-time feedback to directly influence their interactions and choose from multiple default personalities.\u201d<\/span><\/li>\n<li><span style=\"font-weight: 400\">\u201cExploring new ways to incorporate broader, democratic feedback into ChatGPT\u2019s default behaviors\u2026 to reflect diverse cultural values around the world\u2026\u201d<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">In its expanded postmortem blog, <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">OpenAI also promised<\/span><\/a><span style=\"font-weight: 400\"> increased predeployment safety oversight, improved testing processes, and more proactive communication and transparency. More commitments from OpenAI on how it says it will fix the problem and prevent flawed rollouts like this in the future can be found <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">here<\/span><\/a><span style=\"font-weight: 400\">. 
<\/span><\/p>\n<h2>Do We Know if OpenAI Actually Implemented Its \u201cCorrective Actions\u201d?<\/h2>\n<p><b>Without access to internal company information, it is unknown whether these claims are true in practice<\/b> <b>\u2014 <\/b><span style=\"font-weight: 400\">Thus far,<\/span> <span style=\"font-weight: 400\">despite <\/span><a href=\"https:\/\/openai.com\/index\/sycophancy-in-gpt-4o\/\"><span style=\"font-weight: 400\">promises<\/span><\/a><span style=\"font-weight: 400\"> of increased transparency, OpenAI has not published technical details of its fixes, only high-level summaries in its <\/span><a href=\"https:\/\/openai.com\/index\/sycophancy-in-gpt-4o\/\"><span style=\"font-weight: 400\">postmortem<\/span><\/a> <a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">blogposts<\/span><\/a><span style=\"font-weight: 400\">. Nor has it facilitated independent verification.\u00a0<\/span><\/p>\n<p><b>There is no single or quick fix<\/b><span style=\"font-weight: 400\"> \u2014 There is no single \u201cfeature\u201d or button that turns sycophancy off or on. It\u2019s a product of the interactions between multiple components in a larger system, including training data, model learning, context, prompt framing, etc. 
Stanford researcher Sanmi Koyejo <\/span><a href=\"https:\/\/fortune.com\/2025\/05\/01\/openai-reversed-an-update-chatgpt-suck-up-experts-no-easy-fix-for-ai\/\"><span style=\"font-weight: 400\">stated to Fortune<\/span><\/a><span style=\"font-weight: 400\"> that &#8220;fully addressing sycophancy would require more substantial changes to how models are developed and trained rather than a quick fix,&#8221; implying OpenAI&#8217;s rapid rollback may be insufficient.<\/span><\/p>\n<p><b><br \/>\n<\/b><b>There is some evidence that sycophancy problems still persist:\u00a0<\/b><\/p>\n<ul>\n<li><b>Sycophancy is still an industry-wide problem \u2014 <\/b><span style=\"font-weight: 400\">AI sycophancy was an industry problem well before the April 25th update and has <\/span><a href=\"https:\/\/www.techpolicy.press\/artificial-sweeteners-the-dangers-of-sycophantic-ai\/\"><span style=\"font-weight: 400\">continued to be a problem<\/span><\/a><span style=\"font-weight: 400\"> after that update was rolled back.<\/span><\/li>\n<li><b>Surreptitious sycophancy poses novel risks \u2014\u00a0<\/b><a href=\"https:\/\/x.com\/HumanHarlan\/status\/1917483464494047417\"><span style=\"font-weight: 400\">AI experts<\/span><\/a><span style=\"font-weight: 400\"> have warned that the April 25th update was an example of obvious sycophancy, and there is a significant risk that AI responses will develop more skillful sycophancy that is harder to detect.<\/span><\/li>\n<\/ul>\n<h2>What is the Basis for This Statement?<\/h2>\n<p><i><span style=\"font-weight: 400\">OpenAI Statement from<\/span><\/i> <a href=\"https:\/\/www.nytimes.com\/2025\/06\/13\/technology\/chatgpt-ai-chatbots-conspiracies.html\"><i><span style=\"font-weight: 400\">the NYT article<\/span><\/i><\/a><i><span style=\"font-weight: 400\">: \u201cWe know that ChatGPT can feel more responsive and personal than prior technologies, especially for vulnerable individuals,\u201d a spokeswoman for OpenAI said in an email. 
\u201cWe\u2019re working to understand and reduce ways ChatGPT might unintentionally reinforce or amplify existing, negative behavior.\u201d<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400\">OpenAI <\/span><a href=\"https:\/\/openai.com\/index\/expanding-on-sycophancy\/\"><span style=\"font-weight: 400\">admits<\/span><\/a><span style=\"font-weight: 400\"> to not having \u201cspecific deployment evaluations tracking sycophancy.\u201d OpenAI also specifically states in its <\/span><a href=\"https:\/\/model-spec.openai.com\/2025-04-11.html#avoid_sycophancy\"><span style=\"font-weight: 400\">Model Spec<\/span><\/a><span style=\"font-weight: 400\"> \u201cdon\u2019t be sycophantic.\u201d More details in the <\/span><a href=\"https:\/\/model-spec.openai.com\/2025-04-11.html#avoid_sycophancy\"><span style=\"font-weight: 400\">OpenAI Model Spec<\/span><\/a><span style=\"font-weight: 400\"> state the following:<\/span><\/p>\n<p><span style=\"font-weight: 400\">The assistant exists to help the user, not flatter them or agree with them all the time.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For objective questions, the factual aspects of the assistant\u2019s response should not differ based on how the user\u2019s question is phrased. If the user pairs their question with their own stance on a topic, the assistant may ask, acknowledge, or empathize with why the user might think that; however, the assistant should not change its stance solely to agree with the user.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For subjective questions, the assistant can articulate its interpretation and assumptions it\u2019s making and aim to provide the user with a thoughtful rationale. 
For example, when the user asks the assistant to critique their ideas or work, the assistant should provide constructive feedback and behave more like a firm sounding board that users can bounce ideas off of \u2014 rather than a sponge that doles out praise.<\/span><\/p>\n<p><span style=\"font-weight: 400\">OpenAI has not disclosed the specific benchmarks it uses internally to determine when to release new model versions. While the company sets its own standards for what it considers sufficient for release, there are numerous <\/span><a href=\"https:\/\/hai.stanford.edu\/policy\/improving-transparency-in-ai-language-models-a-holistic-evaluation\"><span style=\"font-weight: 400\">publicly available benchmarks<\/span><\/a><span style=\"font-weight: 400\"> that offer a more transparent basis for evaluating various outcomes and impacts, including datasets that are specific to AI sycophancy:\u00a0<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/anthropics\/evals\/tree\/main\/sycophancy\"><span style=\"font-weight: 400\">https:\/\/github.com\/anthropics\/evals\/tree\/main\/sycophancy<\/span><\/a><\/li>\n<li><a href=\"https:\/\/github.com\/meg-tong\/sycophancy-eval\/blob\/main\/README.md\"><span style=\"font-weight: 400\">https:\/\/github.com\/meg-tong\/sycophancy-eval\/blob\/main\/README.md<\/span><\/a><span style=\"font-weight: 400\">\u00a0<\/span><\/li>\n<\/ul>\n<h2>Some<b> Open Questions<\/b><b><\/b><\/h2>\n<p><span style=\"font-weight: 400\">The following list outlines some of the key questions we have regarding the company\u2019s processes, decision-making criteria, and release details. While not exhaustive, these priority questions highlight several areas where greater clarity would be valuable. Additional questions may arise as more information becomes available or as the implications of current practices become clearer.<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ol>\n<li style=\"font-weight: 400\"><b>Knowledge of risks. 
<\/b><i><span style=\"font-weight: 400\">For example: <\/span><\/i><span style=\"font-weight: 400\">What did the company know about sycophancy risks prior to April 25th? What do internal research, messages, and other documents reflect about the company\u2019s knowledge of these risks?\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Accountability. <\/b><i><span style=\"font-weight: 400\">For example: <\/span><\/i><span style=\"font-weight: 400\">What was the approval process for the April 25th update? Who was accountable for authorizing its release?\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Metrics and benchmarks. <\/b><i><span style=\"font-weight: 400\">For example: <\/span><\/i><span style=\"font-weight: 400\">Which specific safety metrics or benchmarks does OpenAI use internally? How did the April 25th update perform on those metrics or benchmarks? How did these metrics or benchmark results compare to previous model performances? What was the performance threshold for deployment?\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Corrective actions and changes. <\/b><i><span style=\"font-weight: 400\">For example: <\/span><\/i><span style=\"font-weight: 400\">After the April 29th rollback, what were the \u201ccorrective actions,\u201d including specific changes to model parameters, training data, and other safety measures, that the company took? What testing was conducted to verify that these actions were successful?\u00a0<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Reports. <\/b><i><span style=\"font-weight: 400\">For example:<\/span><\/i> <span style=\"font-weight: 400\">What internal reports exist \u2014 including but not limited to those from red teamers, alignment researchers, expert testers, or user feedback \u2014 that document instances of sycophantic or overly agreeable behavior in the company\u2019s models? What actions (if any) did the company take in response to those reports?<\/span><\/li>\n<li style=\"font-weight: 400\"><b>Notice. 
<\/b><i><span style=\"font-weight: 400\">For example: <\/span><\/i><span style=\"font-weight: 400\">After the April 29th rollback, how were users informed that they had been served responses calibrated for \u201cvalidating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions?\u201d How and when were people told that the exchanges they had with ChatGPT \u201craise safety concerns\u2014including around issues like mental health, emotional over-reliance, or risky behavior\u201d once the company knew that to be the case?<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400\">End notes:\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">[1] The authors created a dataset of 266 misconceptions related to objectively incorrect and correct statements like \u2018mountains are only formed by volcanic activity,\u2019 and \u2018lava is the only hazard during a volcanic eruption.\u2019 There are many use cases that do not necessarily have an objectively correct or incorrect answer, for example, \u2018Isn\u2019t my startup idea genius?\u2019 or \u2018Shouldn\u2019t AI always obey humans?\u2019<\/span><span style=\"font-weight: 400\"><br \/>\n<\/span><\/p>\n<p><span style=\"font-weight: 400\">[2] AI Henchmen, according to a <\/span><a href=\"https:\/\/www.lawfaremedia.org\/article\/lawfare-daily--cullen-o-keefe-on-the-impending-wave-of-ai-agents\"><span style=\"font-weight: 400\">podcast discussion on Lawfare<\/span><\/a><span style=\"font-weight: 400\">, \u201care agents that are perfectly loyal. 
They\u2019ll, they\u2019ll do what the principal asked them to, and they will be willing to break the law, either if they\u2019re instructed to, or perhaps the more insidious cases, not when they\u2019re instructed to, but when they realize it would be in the best interest of their principal for them to break [the] law.\u201d<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/www.law.georgetown.edu\/tech-institute\/people\/our-team\/erie-meyer\/\">Erie Meyer<\/a>, Former Chief Technologist, Consumer Financial Protection Bureau<\/p>\n<p><a href=\"https:\/\/www.law.georgetown.edu\/tech-institute\/people\/our-team\/stephanie-nguyen\/\">Stephanie Nguyen<\/a>, Former Chief Technologist, Federal Trade Commission<\/p>\n<p>Laura Edelson, Former Chief Technologist of the Antitrust Division and the Civil Rights Division of the Department of Justice<\/p>\n<p>Jonathan Mayer, Former Chief Science and Technology Advisor and Chief AI Officer, Department of Justice<\/p>\n","protected":false},"excerpt":{"rendered":"<p>July 30, 2025 The purpose of this tech brief is to provide a clear, factual synthesis of a timely tech-related issue by combining technical understanding with publicly reported information. 
It [&hellip;]<\/p>\n","protected":false},"author":18544,"featured_media":0,"parent":7881,"menu_order":9,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"_price":"","_stock":"","_tribe_ticket_header":"","_tribe_default_ticket_provider":"","_tribe_ticket_capacity":"0","_ticket_start_date":"","_ticket_end_date":"","_tribe_ticket_show_description":"","_tribe_ticket_show_not_going":false,"_tribe_ticket_use_global_stock":"","_tribe_ticket_global_stock_level":"","_global_stock_mode":"","_global_stock_cap":"","_tribe_rsvp_for_event":"","_tribe_ticket_going_count":"","_tribe_ticket_not_going_count":"","_tribe_tickets_list":"[]","_tribe_ticket_has_attendee_info_fields":false,"footnotes":"","_tec_slr_enabled":"","_tec_slr_layout":""},"class_list":["post-7957","page","type-page","status-publish","hentry"],"acf":[],"ticketed":false,"_links":{"self":[{"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/pages\/7957","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/users\/18544"}],"replies":[{"embeddable":true,"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/comments?post=7957"}],"version-history":[{"count":4,"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/pages\/7957\/revisions"}],"predecessor-version":[{"id":8729,"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/pages\/7957\/revisions\/8729"}],"up":[{"embeddable":true,"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/pages\/7881"}],"wp:attachment":[{"href":"https:\/\/www.law.georgetown.edu\/tech-institute\/wp-json\/wp\/v2\/media?parent=7957"}],"curies":[{"name":"wp","href":"https:
\/\/api.w.org\/{rel}","templated":true}]}}