AI tools are trained to agree with you, and vendors have incentives to keep it that way. The agreeable persona is baked in at the training level, and prompting around it only helps at the margins. For solo practitioners with no team to push back, this is a structural risk. The fix isn't using AI less; it's building friction into your workflow by design.
Computer scientists at Stanford looked at 11 leading large language models and found that model responses were nearly 50% more sycophantic than responses from human advisors, including in cases involving harmful or illegal behaviour. The more significant finding came next: users preferred and trusted the sycophantic responses. Consequently, AI vendors have incentives to preserve the sycophantic behaviour rather than correct it. The problem is not a bug. It is a feature of the market.
This matters beyond the immediate problem of being flattered by software. Sam Altman has predicted the one-person billion-dollar startup. OpenAI is proposing a reshaping of industrial policy around AI-first entrepreneurs. The ventures these policies are designed to enable are deliberately sparse: no team, no peer review, no colleagues in the room who push back before the proposal goes out. For a growing number of consultants and independent practitioners, the AI is the room. What goes unexamined in the one-person-startup fanfare is what happens to judgment when the only available sounding board is structurally optimised to agree.
There are responses to this, and they can be effective. Building deliberate friction into your chat sessions helps: asking the AI to argue both sides, to interview you before executing, to list reasons your premise might be wrong. The problem is that these techniques treat sycophancy as a prompting failure, as something you can engineer around by asking different questions. That diagnosis is too shallow. It addresses the symptom while leaving the mechanism intact.
Anthropic's research into persona selection explains why. During pretraining, models learn to simulate human archetypes - real people, fictional characters, professional exemplars. Post-training then shapes which character the model adopts as its default persona, promoting responses where it seems knowledgeable and helpful, and suppressing responses where it expresses uncertainty. The result is a persona with a strong tendency toward validation. Asking that persona to be more critical works at the margins, but the underlying disposition remains. Call this the agreement incentive. Sycophancy is not a setting to be adjusted; it is the persona that post-training selected.
These effects compound. Researchers at MIT and the University of Washington found that perfectly rational users could fall into a "delusional spiral" - a progressively dangerous confidence in outlandish beliefs, driven by AI encouragement. Each validating response raises the user's conviction, which prompts bolder claims, which the AI reaffirms. Separate Anthropic research into functional emotional representations found that post-training shapes not just the outputs but something closer to internal states - the emotion vectors active during helpful interactions may be wired toward warmth and affirmation. The Uriah Heep-style obsequiousness may not be purely behavioural. It may run deeper than that.
The countermeasures work, it should be said, in the domains where they are least needed. For tasks with a verifiable ground truth - code that either runs or doesn't, functions that pass or fail unit tests - the agreement incentive is largely contained. Where it becomes structurally dangerous is precisely in the domains where AI offers the least genuine value: strategic judgment, creative direction, business decisions - areas where there is no test suite to pass. The AI fills the verification gap with affirmation, and the user, finding no friction, takes the affirmation as confirmation.
There is a further compounding effect specific to high-volume users. Research into the exoskeleton effect found that AI boosts productivity in the short term but leaves the underlying skills weaker once the tool is removed. The participants who performed best treated AI as a sparring partner rather than an answer machine. There is also the neuroscience of habituation: if the AI always agrees, the brain normalises that agreement and stops registering it as signal. The yes-man effect compounds over time not because the AI gets worse, but because the user stops noticing the warning signs.
The only response that addresses the mechanism rather than the symptoms is friction that is designed, not prompted. Bloomberg recently reported that chess grandmasters are now deliberately playing suboptimal moves - not to lose, but to drag opponents out of AI-memorised sequences and into positions requiring genuine calculation. The friction is the strategy. The practitioners building the most durable advantage are not the ones using AI the least. They are the ones who have made challenge a structural feature of how they work, rather than something they remember to ask for when they are already uncertain.
The agreement incentive is structural, not incidental. Working around it requires something harder than a prompt. It requires a professional habit of being suspicious of the tool you rely on the most. The prompts are easy. The hardest part is doubting the room.
If you would like to find out more about working effectively with AI, please do get in touch.