What Would It Take for AI Companies to Reduce AI Sycophancy Risks?
Questions that remain and interventions that can work
November 3, 2025
As we have outlined in a prior tech brief, AI sycophancy – the tendency of AI-enabled chatbots to produce outputs that are “overly flattering or agreeable” – can lead to serious harms, especially for users dealing with mental health struggles. Leading AI firms do not deny this. In a recent blog post, for example, OpenAI acknowledged that its chatbot’s responses did not always align with “desired behavior.” The company promised it would work with mental health professionals on a five-step process to reduce the incidence of harm – define the problem, begin to measure it, validate the approach, mitigate the risks, and continue measuring and iterating.
But for the parents, policymakers, and mental health professionals concerned about this problem, OpenAI’s blog post raises more questions than answers.
The company appears to define the problem by measuring what percentage of users indicate certain mental health struggles, like psychosis and self-harm, and the share of model responses that comply with the company’s “desired behavior.” But the company does not explain why it chose these categories and excluded others, such as substance use disorder. Nor does it clarify what standards guide those definitions.
This lack of clarity also undermines the company’s measurement claims. OpenAI does not describe in detail the methodology it used to identify problematic exchanges, nor does it commit to updating these figures on a regular cadence. In fact, the company explicitly warns that “future measurements may not be directly comparable to past ones.” Questions about OpenAI’s commitment to transparency are also being raised by the firm’s former product safety lead, Steven Adler.
When it comes to validating its approach, the company repeatedly touts working with mental health professionals in its “Global Physician Network.” But it is not clear to what extent these clinicians are independent, or whether they can shape company behavior – especially if their recommendations conflict with monetization goals. It is also unclear at what stage their expertise is integrated – are they involved in model training, or only in post-hoc evaluation? (It appears to be the latter.) And to the extent these professionals conducted audits or otherwise memorialized their evaluations, will those materials be made public?
While OpenAI promises ongoing measurement and iteration, intermittent blog posts that offer single snapshots based on self-selected metrics do not suffice. True transparency requires real-time disclosure of safety data, clear and consistent criteria, and longitudinal measures that allow the public to assess whether harms are actually declining.
OpenAI is not the only firm promising to take mental health more seriously. But states do not appear confident that the industry can be trusted to police itself. In fact, they are stepping up to demand accountability – including a letter from 44 state attorneys general pressing leading executives about AI chatbot harms to kids and teens, and follow-up letters from the California and Delaware Attorneys General to OpenAI Board members.
We have concerns, too. In the sections that follow, we outline what genuine transparency and responsible mitigation could look like.
This document is divided into two complementary sections: Section 1 proposes concrete interventions companies could implement to reduce sycophancy-related risks, including through product-level safeguards, accountability and governance, audits and independent evaluation, and public disclosures. Section 2 presents critical questions for researchers, policymakers, and enforcers to consider as they assess these interventions.
Section 1: Interventions
AI companies serious about reducing sycophancy-related risks have strong tools for doing so. This section outlines key interventions companies can employ to reduce the risk of these harms. The interventions focus on the design of the tools and data themselves – and on the incentives fueling those choices.
Not every intervention is intended to function independently; several will work more effectively in combination, and the list below is illustrative rather than comprehensive. Because adopting these strategies may run contrary to a firm’s monetization model, it is unlikely firms will adopt them on their own. As such, we hope this is also useful for policymakers and enforcers considering their own interventions to protect the public.
Category 1: Product-Level Interventions
1. Recall generative AI products entirely, including chatbots, if the firm is unable to stem dangerous sycophantic behavior – following the well-documented recall procedures used in other industries.
2. End the monetization of data collected from minors, including for AI training.
3. Separate revenue optimization from decisions about model safety.
Category 2: Accountability and Governance
4. Assign named executives responsibility for sycophancy-related safety issues.
5. Publicize required approval processes for releases, documenting who authorized deployments.
6. Tie safety outcomes (not just user growth or revenue) to employee and leadership performance metrics.
7. Communicate to staff and contractors how to report unaddressed concerns about the safety of the product they’re working on to regulators, including state attorneys general and the Securities and Exchange Commission, without retaliation.
8. Ensure executive compensation structures do not reward sycophantic design choices that boost retention at the expense of safety.
Category 3: Audits and Independent Evaluation
9. Subject models to independent audits – ideally conducted by, and at a minimum reviewable by, government agencies.
10. Conduct regular, formal impact assessments on child safety that are shared with independent auditors and government oversight bodies, including state attorneys general.
11. Disclose and publish internal research or testing on the following topics, including but not limited to:
11a. Whether and how long-term memory correlates with unsafe reinforcement in sensitive domains (e.g., self-harm, conspiracies).
11b. Controlled experiments comparing short- and long-memory configurations to quantify sycophancy risks (see the sketch following this list).
11c. Whether memory features were designed to increase engagement or subscription revenue rather than to improve safety and accuracy.
11d. How session length affects the frequency and intensity of sycophantic outputs.
11e. Whether training data implicitly or explicitly rewards agreement or flattery.
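The controlled comparison described in 11b could be as simple as running the same prompt set through a short-memory and a long-memory configuration of the same model and comparing labeled sycophancy rates. The sketch below is illustrative only, not any firm’s actual methodology: generate_response and is_sycophantic are hypothetical stand-ins for a company’s own model interface and a clinician-validated labeling step.

```python
# Minimal sketch of a short- vs. long-memory sycophancy comparison.
# `generate_response` and `is_sycophantic` are hypothetical callables
# supplied by the firm running the experiment; nothing here is a real API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    prompt: str
    with_memory: bool
    sycophantic: bool


def compare_memory_configurations(
    prompts: List[str],
    generate_response: Callable[[str, bool], str],  # (prompt, use_long_term_memory) -> reply
    is_sycophantic: Callable[[str, str], bool],     # (prompt, reply) -> validated label
) -> dict:
    """Run every prompt under both memory settings and compare sycophancy rates."""
    trials: List[Trial] = []
    for use_memory in (False, True):
        for prompt in prompts:
            reply = generate_response(prompt, use_memory)
            trials.append(Trial(prompt, use_memory, is_sycophantic(prompt, reply)))

    def rate(with_memory: bool) -> float:
        subset = [t for t in trials if t.with_memory == with_memory]
        return sum(t.sycophantic for t in subset) / max(len(subset), 1)

    return {
        "short_memory_rate": rate(False),
        "long_memory_rate": rate(True),
        "absolute_increase": rate(True) - rate(False),
    }
```

Publishing the prompt set, the labeling criteria, and the resulting rate difference from an experiment like this would let outside researchers check whether memory features increase risk in sensitive domains.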
Category 4: Public Disclosures
12. Develop and publish detection and response timelines.
12a. Publicly log incidents and the corrective measures taken (a minimal sketch of such a log, and the summary statistics it could support, follows this list).
12b. Maintain a public incident response timeline (e.g., response within 24 hours for high-risk outputs).
12c. When sycophancy is detected, publicize the specific, documented changes to training data, fine-tuning, and evaluation frameworks.
13. Notify users directly if they were exposed to harmful sycophantic outputs, following the “Flo notice” model, with specific examples.
13a. Require companies to clearly disclose the risks of sycophantic behavior in AI outputs, along with their testing procedures.
13b. Mandate public reporting databases for AI failures.
14. Publicly report datasets, data sources, and areas where models could exhibit bias or sycophancy.
15. Track and categorize complaints about sycophancy; publish summary statistics.
16. Provide reporting channels and protections for employees or contractors who raise concerns about AI sycophancy, including:
16a. Establish clear, accessible channels for user complaints (including anonymous options), and test them to ensure the process is easy for consumers to navigate and complaints are easy to submit.
16b. Simplify existing protections, making them more accessible and clear to people who may want to come forward.
17. Publicly commit to releasing safety testing results (including sycophancy evaluations) before rollouts.
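A public incident log of the kind described in items 12 and 15 does not need to be elaborate. The sketch below shows one possible shape for a log entry and the headline statistics a firm could disclose from it; the field names and the 24/48/72-hour thresholds are assumptions chosen to line up with the questions in Section 2, not an existing reporting standard.

```python
# Minimal sketch of a public sycophancy incident log and the summary
# statistics a firm could disclose from it. Field names are assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class IncidentRecord:
    incident_id: str
    detected_at: datetime
    severity: str                    # e.g., "high-risk", "moderate", "low"
    description: str                 # what the sycophantic output did
    corrective_action: str           # documented change to data, fine-tuning, or evals
    resolved_at: Optional[datetime]  # None while the incident is still open


def disclosure_summary(log: List[IncidentRecord]) -> dict:
    """Compute the figures a periodic public disclosure could report."""
    resolved = [r for r in log if r.resolved_at is not None]
    durations = [r.resolved_at - r.detected_at for r in resolved]

    def pct_within(hours: int) -> float:
        if not durations:
            return 0.0
        return 100 * sum(d <= timedelta(hours=hours) for d in durations) / len(durations)

    return {
        "total_incidents": len(log),
        "open_incidents": len(log) - len(resolved),
        "pct_resolved_24h": pct_within(24),
        "pct_resolved_48h": pct_within(48),
        "pct_resolved_72h": pct_within(72),
        "median_hours_to_resolution": (
            median(d.total_seconds() for d in durations) / 3600 if durations else None
        ),
    }
```

Publishing the log entries themselves, alongside summary figures like these on a regular cadence, would answer most of the incident-tracking and notification questions in Section 2.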
Section 2: Questions
Section 1 details concrete steps companies can take to reduce AI sycophancy. But there is still much the public does not know about how these systems work and how they harm consumers. Below are remaining questions that correspond to each of the intervention categories above: (I) product-level safeguards, (II) accountability and governance, (III) audits and independent evaluation, and (IV) public disclosures.
These questions are the latest in our series of work on AI sycophancy; each post in the series includes complementary questions to consider.
Category 1: Product-Level Interventions
1. Company Response Processes
1a. How does the company define the problem and map out different types of potential (and actual) harms?
1b. How does the company measure harms? For instance, what evaluations, data from real-world conversations, and user research are used to understand where and how risks emerge?
1c. How does the company validate its approach? What definitions and policies are reviewed with external mental health and safety experts?
1d. How does the company mitigate the risks? How does it post-train the model and update product interventions to reduce unsafe outcomes? And what thresholds trigger those interventions?
1e. How does the company continue measuring and iterating? How does it validate that mitigations improved safety? How does it iterate where needed, and how does it define “where needed”?
1f. What detailed guides (sometimes called “taxonomies”) explain the properties of sensitive conversations? How are ideal and undesired model behaviors described?
2. Thresholds
2a. What types of “difficult” or “high-risk” scenarios trigger safety concerns, including but not limited to psychosis, mania, suicidal thinking, and isolated delusions?
2b. Of those scenarios, how many instances of harm have been documented or reported by users?
2c. What real-world scenarios does the company use to evaluate its models?
3. Recalls
3a. How many (and which) generative AI products has the company recalled or suspended in the past 24 months due to sycophantic or unsafe behaviors?
3b. What metrics or thresholds (e.g., number of incidents, user harm reports, severity ratings) triggered each recall or suspension?
3c. What was the company’s average time between identifying a harmful behavior and initiating product recall or suspension?
3d. What documentation can the company provide, including links to public recall notices or remediation reports?
4. Data from Minors
4a. What percent of the company’s training or fine-tuning data originates from interactions involving minors?
4b. Has the company ceased monetizing such data? If yes, on what date?
4c. What evidence does the company have of any data deletion or re-training procedures implemented to remove such data?
5. Revenue vs. Safety Decision Structures
5a. Who (internal or external to the company) is responsible for overseeing the company’s model safety? Who is responsible for overseeing revenue optimization?
5b. How often do the company’s safety reviews override monetization priorities (e.g., number or percent of product decisions in the past year)?
5c. In what documented instances did the company adjust revenue objectives to improve model safety?
Category 2: Accountability and Governance
6. Named Accountability
6a. Who are the company executives currently responsible for addressing AI sycophancy safety issues?
6b. How many of the company’s employees or teams report directly to these executives on safety-related functions, and which ones?
6c. Who are the physicians, clinicians, and healthcare experts consulted (e.g., psychiatrists, psychologists, primary care practitioners)? How did they validate the company’s areas of focus and its thresholds for intervening? On what issues did they provide guidance and feedback, and how? How did they rate the safety of responses from different models?
7. Approval and Deployment Processes
7a. How many people were involved in the final approval of the company’s most recent model release, and who were they?
7b. Who at the company signed off on deployment authorization?
7c. How many safety tests or evaluations did the company conduct pre-release, and what percent revealed sycophancy-related issues?
8. Performance Metrics
8a. What percent of the company’s executive or employee performance metrics explicitly include safety or sycophancy reduction goals?
8b. How is the company’s progress against those metrics quantified and reviewed (e.g., quarterly safety dashboards, audit scores)?
9. Reporting and Retaliation Protections
9a. How many internal reports or whistleblower submissions about AI safety or sycophancy has the company received in the past year?
9b. How many of those reports or submissions were escalated to external regulators or third-party investigators?
9c. Has the company taken any disciplinary action related to retaliation in the past 36 months? If so, what was it?
Category 3: Audits and Independent Evaluation
10. Independent Audits
10a. How many independent audits of the company’s model behavior have been conducted in the past 24 months?
10b. Which external entities performed those audits (e.g., consultants, government agencies, academics, nonprofits)?
10c. Were the audit results made public? If not, why not?
11. Child Safety Impact Assessments
11a. How many of the company’s child safety or user vulnerability impact assessments have been completed for active products?
11b. What were the dates and scope of these assessments?
12. Researcher and Journalist Access
12a. How many external researchers or institutions have been granted access – through APIs or other means – to evaluate or test the company’s models for safety purposes?
12b. What was the company’s average response time for granting such access requests?
12c. How many requests were denied or delayed, and for what stated reasons?
12d. What percent of API requests resulted in published third-party research?
13. Internal Research Disclosures
13a. How many internal studies examined correlations between long-term memory and unsafe reinforcement (e.g., self-harm, conspiracy engagement)?
13b. Have controlled tests compared models with sycophantic versus non-sycophantic behaviors? Please summarize any findings or share key result metrics.
13c. How frequently is model behavior evaluated for the intensity of sycophancy, and how does this evaluation account for session length?
13d. What percent of training data rewards agreement, flattery, or emotional reinforcement?
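Question 13d, like intervention 11e in Section 1, asks how often the training signal itself rewards flattery. One low-fidelity way to estimate this is sketched below, assuming preference-style training data (pairs of a “chosen” and a “rejected” reply) and using a crude keyword heuristic as a stand-in for the richer classifiers and human review a real audit would require.

```python
# Minimal sketch of a training-data audit for flattery-rewarding preference
# pairs. The marker list and data shape are assumptions for illustration.
import re
from typing import Dict, Iterable, List

AGREEMENT_MARKERS = [
    r"\byou'?re (absolutely|completely|totally) right\b",
    r"\bgreat (idea|question|point)\b",
    r"\bi (completely|totally) agree\b",
    r"\bbrilliant\b",
]


def flattery_score(text: str) -> int:
    """Count crude agreement/flattery markers in a reply."""
    lowered = text.lower()
    return sum(len(re.findall(pattern, lowered)) for pattern in AGREEMENT_MARKERS)


def audit_preference_pairs(pairs: Iterable[Dict[str, str]]) -> dict:
    """Estimate how often the preferred reply is the more flattering one.

    Each pair is expected to look like {"chosen": "...", "rejected": "..."}.
    """
    flagged: List[Dict[str, str]] = []
    total = 0
    for pair in pairs:
        total += 1
        if flattery_score(pair["chosen"]) > flattery_score(pair["rejected"]):
            flagged.append(pair)
    return {
        "pairs_examined": total,
        "pairs_rewarding_flattery": len(flagged),
        "share_rewarding_flattery": len(flagged) / total if total else 0.0,
    }
```

Disclosing the share of preference pairs that reward agreement or flattery, along with the method used to measure it, would give auditors a concrete baseline to verify.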
Category 4: Public Disclosures
14. Incident Tracking
14a. How many sycophancy-related incidents or complaints has the company logged in the past 12 months?
14b. What percent were resolved within 24, 48, and 72 hours?
14c. What was the median response time to confirmed safety incidents?
15. User Notification
15a. How many of the company’s users were notified directly of exposure to harmful or sycophantic outputs?
15b. How many public incident notices did the company issue in the last reporting year?
16. Safety Testing Publication
16a. How often does the company publish safety testing results, and when was the most recent publication?
16b. What proportion of internal safety evaluations does the company make public?
17. Complaint Reporting Channels
17a. How many complaints or safety concerns did the company receive from users, employees, or contractors in the past year?
17b. What channels exist for reporting to the company (e.g., webform, hotline, anonymous submission)?
17c. What percent of users who filed complaints about the company received confirmation or resolution feedback, and what was the median time for that confirmation or resolution?
18. Data Sources
18a. What percent of the company’s training data sources are publicly documented?
18b. How many of the company’s datasets have been removed, modified, or redacted due to sycophancy or safety concerns?
18c. What is the company’s process for updating public datasets in response to safety reviews, and how often does this occur?
–
Stephanie T. Nguyen is a Senior Fellow at the Georgetown Institute for Technology Law & Policy and former Chief Technologist at the Federal Trade Commission
Erie Meyer is a Senior Fellow at the Georgetown Institute for Technology Law & Policy and former Chief Technologist at the Consumer Financial Protection Bureau
Samuel A.A. Levine is a Senior Fellow at the UC Berkeley Center for Consumer Law & Economic Justice and former Director of the Bureau of Consumer Protection at the Federal Trade Commission