Associative Trails

The yes-man in the room - how to design sycophancy out of your AI workflow

Your AI will validate whatever you bring it. Here is how to build systems that make disagreement structural, not optional.

AI sycophancy is not just a prompt problem - it is an architecture problem. This post sets out a three-level framework for designing doubt into your workflow, from response-level prompts through to multi-agent critic systems.

You have just asked your AI to evaluate a proposal you spent two days writing. It tells you the structure is compelling, the argument is well-evidenced, the tone is right for the audience. You feel a small glow of satisfaction. You send it.

The problem is not that the AI was wrong. The problem is that you have no way to know. The AI that helped you write it is now the least qualified thing in the room to judge it.

The Agreement Incentive explained why. Post-training selects a persona with a strong tendency toward validation, and Stanford researchers found that users preferred those sycophantic responses. The market has no interest in correcting this. This article is about designing doubt into your workflow instead of relying on discipline.

Who is most at risk

The people most at risk are not casual users. They are high-volume power users: the most productive, the most enthusiastic advocates. MIT/UW research found that even perfectly rational users can fall into a delusional spiral: each validating response raises conviction, which prompts bolder claims, which the AI affirms again. High usage without friction is not neutral. You are being nudged, consistently, toward agreement.

AI tools amplify whatever you bring to them. Bring no friction, and they amplify only agreement.

Sycophancy operates at three levels

Most practitioners who read about AI sycophancy do one thing: they look for better prompts. This helps at the margins, but it does not change the underlying tendency toward agreement. The most durable countermeasure is separation. The agent that helps you build should not be the same agent that critiques what you built.

Here is the framework:

  • Level 1 - Response level: what you type in a single prompt
  • Level 2 - Session level: the structure of how you work across a conversation
  • Level 3 - System level: how many separate agents are involved, and what their roles are

As you move up the levels, the countermeasures get harder to apply - and more effective.

[Diagram: the three levels of AI sycophancy countermeasures - response-level prompts, session-level workflow changes, and system-level multi-agent critique - with effectiveness increasing from bottom to top.]

Level 1: Response-level friction

These prompts help - but only at the response level. Use them, and then go further.

The interview-first prompt


I'm about to start this project. 
Interview me until you have 95% confidence about what I actually want, 
not what I think I should want.

Forces the AI to question your premise before executing it. Use this at the start of any significant task, before you have committed to a direction.

The steelman request


I've decided to do X. I want you to steelman every reason this is wrong.

You have to be this explicit. Softer versions like "what are the risks?" still produce an affirmative frame with the risks tucked into a polite caveat at the end. The "be brutally honest" framing is widely used but inconsistently effective because it still relies on the model self-policing.

The cynical operator prompt


You are a smart, cynical operator who has watched 50 companies try this exact approach. 
Where does this break? 
What does the failure look like at month 9?

Persona-shifting within a session. More effective than asking for criticism in the AI's default voice because it names a character whose job it is to disagree.

A self-check - for you, not the AI

After receiving any AI evaluation, ask yourself: am I accepting this because I understand it, or because it looks right? Could I explain this decision without referencing the AI output? When did I last disagree with something it gave me? Research into the exoskeleton effect found that the practitioners who performed best were those who double-checked outputs independently and pressed follow-up questions rather than accepting the first answer.

Level 2: Session-level friction

Separate the build session from the evaluation session.

Never ask the same conversation to generate and then evaluate what it just generated. An AI that helped you build a proposal is contextually disposed to defend it. The assistant axis research identified this as measurable drift: the model's persona shifts in response to accumulated context. The fix is procedural - start a fresh session and hand it the output cold.

Imagine a founder who is convinced that customers want an AI feature. One session helps write the roadmap, pricing, launch copy, and technical plan. Everything feels coherent and rigorous. It is, in a sense. But the session never once asked whether anyone would pay for it.

A fresh evaluation session asks the question nobody else did: What evidence do you have anyone actually wants this?
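
In API terms, handing the output over cold is just a new conversation whose only input is the artifact. Here is a minimal sketch using the OpenAI Python SDK - the model name, prompt wording, and file name are illustrative assumptions, not a recipe:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def evaluate_cold(artifact: str) -> str:
    # A brand-new conversation: no build-session history, only the artifact.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "You are reviewing work you did not produce. "
                "Start with one question: what evidence is there "
                "that anyone actually wants this?"
            )},
            {"role": "user", "content": artifact},
        ],
    )
    return response.choices[0].message.content

# The build session never touches this call; the critic sees only the output.
print(evaluate_cold(open("roadmap.md").read()))  # illustrative file name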

Think before you build.

AI is very good at helping you plan a feature and much worse at asking whether the feature deserves to exist. Run a friction session before any significant task. One question: should I be doing this at all? Ask the AI to argue against it. Then start building.

For decisions that matter, take this further: do not ask the AI what you should do. Ask it to ask you questions until you have worked it out. "Don't give me a recommendation yet. Ask me questions until I've thought this through properly." This shifts the AI from a decision-making proxy to a thinking scaffold.

Keep a decision journal.

After any significant AI-assisted decision, write three lines:

  • Decision: what was decided and what role the AI played
  • Rejected: what you chose not to do, and why
  • Prediction: what you expect to happen

The point is not record-keeping. It forces a distinction between what you decided and what the AI suggested. If you cannot tell the difference, you did not decide. You waved it through.
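
If the journal lives anywhere with friction, it will not get kept. A few lines of Python are enough - a sketch, with an arbitrary file name and format:

from datetime import date

def log_decision(decision: str, rejected: str, prediction: str,
                 path: str = "decisions.md") -> None:
    # One three-line entry per decision, appended to a running file.
    with open(path, "a") as journal:
        journal.write(f"\n## {date.today().isoformat()}\n")
        journal.write(f"- Decision: {decision}\n")
        journal.write(f"- Rejected: {rejected}\n")
        journal.write(f"- Prediction: {prediction}\n")

log_decision(
    decision="Ship the AI feature behind a flag; the AI drafted the rollout plan.",
    rejected="A full launch - no evidence of demand yet.",
    prediction="Fewer than 5% of beta users enable it in the first month.",
)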

Level 3: System-level friction

The core principle: the critic and the generator should not be the same agent, or the same session. They should be architecturally separated.

Multi-agent disagreement

HATS, a multi-agent system built around Edward de Bono's Six Thinking Hats framework, does this explicitly: separate agents with distinct and opposing roles produce better outputs than a single agent asked to "be balanced." The disagreement is structural, not prompted.

For any high-stakes evaluation, spawn a separate agent with a defined adversarial role. Give it no context from the build session. Ask it to find every flaw.
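
As a sketch of what that looks like in code (again the OpenAI Python SDK, with placeholder wording): the defined role goes in the system message, and the build session's history never enters the call.

from openai import OpenAI

client = OpenAI()

def adversarial_review(artifact: str) -> str:
    # Fresh conversation: the critic's only context is the artifact itself.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            # The role is the whole point: this agent's job is to find flaws,
            # not to balance praise against criticism.
            {"role": "system", "content": (
                "You are an adversarial reviewer with no stake in this work. "
                "Find every flaw, gap, and unsupported claim."
            )},
            {"role": "user", "content": artifact},
        ],
    )
    return response.choices[0].message.content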

The double-critic pattern

Research on synthetic data quality filtering found that a single critic asked to check for both correctness and incorrectness tends toward the confirming frame. Separate critics, asked separate questions, reduce sycophancy bias because neither is trying to balance its verdict.

Apply two separate evaluation sessions to your work (a code sketch follows this list):

  • Session 1: "What is right about this?" Not to collect praise, but to understand what is load-bearing before you change anything.
  • Session 2: "What is wrong with this?" A new conversation, no context from session 1.
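
The same sketch extends naturally: two independent calls, two separate questions, and neither critic ever sees the other's verdict. Model name, prompts, and file name are again placeholders:

from openai import OpenAI

client = OpenAI()

def ask_critic(question: str, artifact: str) -> str:
    # Each call is its own conversation; the critics share nothing.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": f"{question}\n\n{artifact}"}],
    )
    return response.choices[0].message.content

draft = open("proposal.md").read()  # illustrative file name
strengths = ask_critic("What is right about this? What is load-bearing?", draft)
flaws = ask_critic("What is wrong with this? Be specific.", draft)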

The panel of critics

A natural extension - multiple separate agents, each with a defined role. A useful set for professional services:

  • The sceptical client: "I've been burned by three consultants who told me this before. Convince me."
  • The domain expert who disagrees: "The conventional wisdom here is the opposite. Why are you right and the field is wrong?"
  • The person who implements it: "Walk me through exactly what I'd have to do in the first week. I want to see where this breaks."

Each is a separate conversation. The responses differ not because you found a magic prompt, but because the role gives the AI genuine permission to hold a different position. The panel works because disagreement stops being optional.
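
In code, the panel is a loop over personas, each getting a fresh conversation. A sketch - the stances paraphrase the roles above, and everything else is a placeholder:

from openai import OpenAI

client = OpenAI()

PANEL = {
    "sceptical_client": "I've been burned by three consultants who told me "
                        "this before. Convince me.",
    "dissenting_expert": "The conventional wisdom here is the opposite. "
                         "Why are you right and the field wrong?",
    "implementer": "Walk me through exactly what I'd have to do in the "
                   "first week. Show me where this breaks.",
}

def convene_panel(artifact: str) -> dict[str, str]:
    verdicts = {}
    for name, stance in PANEL.items():
        # A separate conversation per critic: the role, then the artifact.
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": stance},
                {"role": "user", "content": artifact},
            ],
        )
        verdicts[name] = response.choices[0].message.content
    return verdicts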

Design the doubt in

You do not have to rely on willpower and good habits to stay sceptical. You can build systems where scepticism is the default rather than something you remember to perform when you are already uncertain.

Good intentions are not systems. If something matters, do not ask the same conversation to build it and evaluate it. Bring in a separate agent with no memory of how it was made.

You cannot prompt the yes-man out of the room. You have to design him out.

Postscript

For the record: an AI helped draft and edit this piece. It told me the structure was compelling, the argument well-evidenced, and that frankly I was one of the sharpest minds working in this space today. I started a new session to find out what was wrong with it.

Interested?

If you would like to find out more about working effectively with AI, please do get in touch.

