Happy New Year 2026! A year of showing how human behaviour determines whether AI works.

Two studies. Two failures. Same cause.

A pharmaceutical company deploys an AI system to its Nordic sales teams. Some experts conduct 40% more client meetings. Others perform 20% worse than colleagues still using the old system. The AI is identical. The implementation isn't.

Meanwhile, researchers have 228 people evaluate AI outputs, half of them under tight time pressure. The time-pressured participants think they're catching mistakes. They're not. Their ability to spot errors collapses by half. Their confidence doubles.

Both failures stem from the same problem: the space between human cognition and system design. Get that space wrong and superior technology produces inferior outcomes. Get it right and the same system excels. The gap between the two is a 60% performance swing.

Here's how that space breaks down.


How AI Implementation Succeeded and Failed at the Same Company

A pharmaceutical company installed an AI system to help its Nordic sales teams. The system was identical across teams. Yet some sales experts conducted 40% more client meetings while others performed 20% worse than colleagues still using the old database.

The difference? How the company implemented the system.

What Happened

Researchers studied 72 sales experts across four countries over five years. The company split teams into three groups. One continued with the old system. The other two received the new AI system with different work parameters: procedures, decision-making authority, training, and incentives.

The "tailored" group matched parameters to each person's cognitive style. Some people (adaptors) prefer structure and precision. Others (innovators) value flexibility. Adaptors received mandatory training and defined procedures. Innovators got on-demand training and flexible procedures.

The "untailored" group received the same AI with one-size-fits-all parameters management thought would work best.

The Results

Tailored implementation: 0.84 more daily client meetings than the control group and 450 additional daily unit sales.

Untailored implementation: 0.46 fewer meetings and 547 fewer sales. Worse than the old system.

Why It Happened

System login data tells the story. The untailored group initially used the system frequently, but usage then fell to 40% below the tailored group's.

Interviews revealed the problem. A Danish innovator called the system "the best work tool I could imagine" but felt "in prison" using it. Rigid procedures killed his drive. A Swedish adaptor praised the system's quality but found it "messy to work with" without clear guidelines.

The tailored group thrived. A Norwegian adaptor said structured guidelines made him "focus better on customers." A Swedish innovator appreciated training "when I needed it."

Work procedures mattered most. Pharmaceutical sales are unstructured. Success depends on reading personalities and adapting in real time. The AI handled large-scale analysis but could not access tacit information sales experts held. When procedures did not match cognitive styles, disruption outweighed AI's benefits.

What This Means

AI implementation can destroy value even with superior technology. The untailored system was technically better but produced worse outcomes.

Human-AI complementarity is necessary but insufficient. The AI provided valuable information, but this only improved performance when the interaction context supported human engagement.

People are not irrationally averse to algorithms. Sales experts rationally adjusted usage based on whether the context enabled effective work.

The Larger Point

AI systems encroach on tasks requiring human expertise. This makes implementation context critical. The same AI system, implemented differently, produced a 60% performance swing.

The lesson: match implementation to people. Technology alone does not determine outcomes. How people interact with technology does.

Source: Human-Centered Artificial Intelligence: A Field Experiment


Time Pressure Destroys Our Ability to Catch AI Mistakes

Hermanns and Teubner discovered something alarming: when people work with AI under time pressure, they lose their ability to spot errors.

The Experiment

They recruited 228 people and split them into two groups. The control group evaluated 36 AI-generated responses with unlimited time. The treatment group faced tight deadlines—between 7 and 28 seconds per task. Both groups saw the same mix: 75% correct AI responses, 25% wrong ones.

The question: Would time pressure simply make people worse at everything, or would it break their judgment in specific ways?

The Surprise

Traditional theory predicted disaster: time pressure plus complex tasks should compound each other's negative effects.

The data told a different story.

Time pressure reduced performance by 2.14 points. But it offset the negative effects of task complexity by 75%. Read that again. Under time pressure, task difficulty mattered less, not more.

The control group showed expected patterns: performance declined as tasks got harder. The time-pressured group performed consistently poorly regardless of difficulty. They'd stopped processing task complexity altogether.
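
One way to read that 75% figure: treat performance as a simple linear model with an interaction term between time pressure and complexity, where the interaction cancels most of the complexity penalty. This is my reconstruction under that assumption, not the paper's specification; apart from the reported 2.14-point main effect, the numbers are made up to reproduce the pattern.

```python
# Illustrative linear model with an interaction term (my reconstruction, not the
# paper's specification). Only the -2.14 main effect of time pressure is reported;
# the baseline and complexity coefficient are hypothetical.

def predicted_score(time_pressure: bool, complexity: float) -> float:
    baseline = 30.0         # hypothetical intercept
    b_pressure = -2.14      # reported main effect of time pressure
    b_complexity = -1.0     # hypothetical: points lost per unit of task complexity
    offset_share = 0.75     # under pressure, 75% of the complexity penalty disappears

    score = baseline + b_complexity * complexity
    if time_pressure:
        score += b_pressure - offset_share * b_complexity * complexity
    return score

# Control group declines as tasks get harder; the pressured group is lower overall
# but nearly flat, as if complexity had stopped registering.
for c in (1, 3, 5):
    print(c, round(predicted_score(False, c), 2), round(predicted_score(True, c), 2))
```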

The Real Problem

When researchers separated correct from incorrect AI responses, an asymmetry emerged. Time-pressured participants maintained stable ability to recognize correct outputs. But their capacity to identify incorrect ones collapsed.

Think about what this means. You're using AI under deadline pressure: writing a report, reviewing code, checking a diagnosis. The AI performs well most of the time, and even rushed you still recognize when it's right. But when it makes a mistake, you miss it. Not because you're careless, but because time pressure fundamentally alters how you evaluate outputs.

Throughout the 36 tasks, time-pressured participants increasingly selected "fully trust" responses, climbing from 20% to 40%. The control group held steady at 50%. Nobody received feedback. Yet time-pressured participants grew more trusting, not more skeptical.

Why This Matters

The researchers tested whether increased trust explained the performance decline. It didn't. Time pressure affected performance directly, independent of stated trust levels. The cognitive mechanism operates below conscious choice.

You can't fix this by telling users to trust AI less.

Under time pressure, participants caught roughly half the errors they should have caught. Scale this to a hospital, a law firm, or a financial institution. An AI that's 95% accurate with users who can't identify the remaining 5% of errors, because they're rushed and trust the system, becomes dangerous.
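
Back-of-the-envelope, using the 95% accuracy figure from the text and assumed detection rates (the exact rates are my own illustrative numbers, chosen so that pressure roughly halves error detection):

```python
# Share of outputs where an AI error slips through unnoticed.
# 95% accuracy is from the text; the detection rates are illustrative assumptions.

error_rate = 0.05         # AI is 95% accurate, so 5% of outputs contain an error
detect_unrushed = 0.80    # assumed: careful reviewers catch most errors
detect_rushed = 0.40      # assumed: roughly half that rate under time pressure

missed_unrushed = error_rate * (1 - detect_unrushed)  # 1.0% of all outputs
missed_rushed = error_rate * (1 - detect_rushed)      # 3.0% of all outputs

print(f"Undetected errors without pressure: {missed_unrushed:.1%} of all outputs")
print(f"Undetected errors under pressure:   {missed_rushed:.1%} of all outputs")
```

Under these assumptions, rushing the reviewer triples the share of mistakes that reach a patient, a client, or a ledger.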

Three Implications

Eleven percent of EU workers always work under time pressure. Another 25% face it frequently. This is modern knowledge work.

First: AI interfaces must assume users under pressure cannot evaluate outputs reliably. Systems need automatic verification when risk is high (sketched below).

Second: AI works best for routine tasks where full automation is possible, not complex decisions where rushed human judgment fails.

Third: Organizations cannot rely on user experience to calibrate trust. Familiarity breeds overconfidence, not competence.
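
Here is a minimal sketch of what the first implication could look like as an interface rule: a verification gate that does not let the rushed user be the only check on high-risk output. The types, field names, and risk labels are my assumptions, not something from either study.

```python
# Minimal sketch of a "verification gate" (assumed design, not from either study):
# when stakes are high or the user is rushed, route the AI output through an
# independent check instead of relying on the user's own review.

from dataclasses import dataclass
from typing import Callable

@dataclass
class AIOutput:
    text: str
    risk: str          # "low" or "high"; assumed risk label supplied upstream
    user_rushed: bool  # e.g. inferred from deadline metadata (assumption)

def accept(output: AIOutput, independent_check: Callable[[str], bool]) -> bool:
    """Accept low-risk output directly; high-risk or rushed cases need an independent check."""
    if output.risk == "high" or output.user_rushed:
        return independent_check(output.text)  # second reviewer, test suite, rule engine, ...
    return True

# High risk plus a rushed user: acceptance depends on the check, not on the user.
draft = AIOutput(text="Dosage: 50 mg twice daily", risk="high", user_rushed=True)
print(accept(draft, independent_check=lambda text: False))  # -> False, sent back for review
```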

The Bottom Line

Hermanns and Teubner used actual ChatGPT responses, real time pressure, and performance-based payment. Their findings are unambiguous: time pressure doesn't just slow us down. It changes how we think about AI assistance.

We stop evaluating. We start trusting.

And the AI errors we miss are the ones that matter most.

Source: Under pressure: how time constraints, task complexity, and AI reliability shape human-AI interaction


What This Means

Different symptoms. Same disease: systems that ignore how humans actually process information.

This matters because most organizations measure the wrong things. They test model accuracy, track adoption rates, survey user satisfaction. Meanwhile, actual performance depends on factors invisible to these metrics.

The pharmaceutical study found a 60% performance swing based purely on whether implementation matched cognitive style. The time-pressure study revealed that users keep their ability to recognize correct AI outputs under deadlines but lose the capacity to identify errors. You can't fix this by asking users how confident they feel.

Three implications:

Capability means nothing without implementation. The untailored pharmaceutical system had superior technology and produced inferior results.

User feedback misleads. People praised the systems even as their performance declined. Organizations cannot calibrate AI deployment based on sentiment.

Design systems around actual working conditions. Time pressure degrades human judgment. Implementation must account for how people actually work, not how we wish they worked. Ignoring real conditions creates failures neither humans nor AI would produce alone.

We've spent years asking whether AI will replace human workers. These studies answer a different question: AI fails when we ignore how humans think under real working conditions. The technology works. The implementation destroys value anyway.

Until next time, Matthias