Welcome to edition three.

Last week we looked at measurement gaps, shadow AI adoption, and why small businesses move faster than enterprises. This week: what happens when the budget flows but the system doesn't follow.

Leaders fund AI like infrastructure but manage it like a pilot. Budgets are up 33%. Productivity reports look good. But teams save time, then spend it fixing mistakes or drifting into low-value work. One person plus AI can match a two-person team until trust breaks and no one knows who checks what. Employees adopt faster than policy, so usage goes underground.

Budgets Rising, Value Stalling: What Kyndryl's Data Reveals

AI budgets surge while scale stalls. Kyndryl's 2025 Readiness Report, which surveyed 3,700 executives, shows investments up 33% year over year. Yet 62% remain stuck in pilots and only 29% say employees are ready, despite 90% confidence in tools.

Kyndryl surveyed executives across 21 countries. Fifty-four percent report positive ROI, up 12 points from 2024. But that average hides steep variance. "Pacesetters"—the top performers—are 32 points less likely to see their tech stack as a barrier and 20 points less likely to suffer cyber outages. Adoption intent runs hot: 68% invest heavily in at least one AI area. Workforce readiness lags at 29% while 87% expect jobs to transform within 12 months.

Why scale crawls: Budgets chase tools before capabilities. This triggers experiments without the scaffolding to scale. Teams ship proofs of concept, but without skill-building and workflow redesign, usage stays shallow. Leaders then declare progress because dashboards look busy.

Structural incentives lock the pattern in place. Organizations measure time saved and tools deployed. Those are easy to capture. They don't measure time redeployed or employee readiness. Those require named outcomes, calendars, and managerial accountability. Without defect rework monitoring, speed wins while rework quietly taxes margins.

The study doesn't track redeployment directly. But the 90% tooling confidence versus 29% readiness gap suggests reclaimed capacity drifts into low-value work while skills plateau.

What works: Set a two-gate ROI policy before scaling. First gate: measure time saved. Second gate: name where those hours go, schedule the work, and assign accountability. Use Kyndryl's benchmarks for self-calibration: if workforce readiness stays below 29%, or more than 62% of your initiatives remain stuck in experimentation, for two consecutive quarters, reassess.

What's next: Track four directional ratios requiring validation in your context: redeployment rate (strong: 70%+; weak: below 40%), pilot-to-production conversion (strong: 30-40%; weak: below 20%), defect rework, and time saved per person. Baseline this quarter. Compare disciplined teams against time-saved-only teams over two quarters.
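To make the thresholds above concrete, here is a minimal sketch of a ratio health check. The cutoffs mirror the directional benchmarks in this issue (redeployment strong at 70%+, weak below 40%; pilot-to-production strong at 30%+, weak below 20%); the function name and labels are my own, and the cutoffs should be validated against your baseline before you act on them.

```python
# Sketch: label two of the four ratios against this issue's benchmarks.
# Cutoffs are directional, not validated for any specific organization.

def ratio_health(redeployment_rate: float, pilot_to_production: float) -> dict:
    """Classify ratios as strong / middling / weak per the stated benchmarks."""
    def label(value, strong_at, weak_below):
        if value >= strong_at:
            return "strong"
        if value < weak_below:
            return "weak"
        return "middling"

    return {
        # Share of AI-saved hours assigned to named, scheduled outcomes
        "redeployment": label(redeployment_rate, 0.70, 0.40),
        # Share of pilots that actually reach production
        "pilot_to_production": label(pilot_to_production, 0.30, 0.20),
    }

print(ratio_health(0.75, 0.15))
# → {'redeployment': 'strong', 'pilot_to_production': 'weak'}
```

Baseline these numbers this quarter, then compare the labels quarter over quarter rather than reading any single snapshot as a verdict.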

Source: TechRepublic

AI Speeds Drafting, Then Tool Sprawl Erases the Gain

AI boosts creativity until overload kills it. New research reveals why gains stall and what might prevent it.

MDPI Behavioral Sciences research (Hohai University, 309 employees) found AI introduction positively impacts creativity. The study reports high technology overload weakens both paths: AI's impact on autonomy falls and the feedback path drops under high-overload conditions.

Workplace observations show a usage pattern in AI-assisted writing: teams use AI for idea generation and proofreading more than end-to-end drafting. Throughput rises while deep engagement thins. That split explains why quantity improves but originality varies.

Why gains stall: Leaders deploy tools and monitoring without redesigning workload. Interruptions stack, interfaces multiply, and technology overload climbs. This muffles autonomy and feedback: the two pathways that drive creativity. Usage grows while originality plateaus.

The measurement system locks the pattern. Leaders track throughput because it's easy to capture and praise. They rarely track originality under constraint: that needs rubrics, deep-work minutes, and assistant-versus-substitute usage data. Compliance monitoring adds oversight noise. Technostress rises and psychological safety erodes.

Neither study tracked deep-work minutes directly. But the positive creativity coefficient alongside reports of thinner ideas suggests teams trade struggle for speed when tools substitute for drafting.

What works: Protect autonomy and feedback during training. Schedule no-notification deep-work blocks. Define acceptable assistant moves like summarizing. Restrict substitute moves like end-to-end drafting during ramp-up. This approach requires validation across different contexts.

What's next: Track four ratios—time-to-draft versus originality, overload index change, assistant-to-substitute usage, and autonomy and feedback means. Baseline at week zero. Run A/B across cohorts. If overload management and assistant-mode discipline matter, dual-gate teams should show higher novelty within 8-12 weeks than throughput-only teams at equal or lower overload.

Sources: MDPI – Behavioral Sciences; Harvard Data Science Review (MIT Press)

AI Oversight Fails at Identity: Design for Challenge

Leaders want human oversight. Interface design determines whether they get it. Carnegie Mellon research by Zhaohui Jiang and Linda Argote shows explanation style shapes collaboration.

Transparent AI models helped less-skilled users improve by revealing decision rules they could learn from. High-ability users showed AI aversion with transparent models—penalizing AI mistakes more harshly than identical human errors.

Early workplace observations suggest interface framing matters. Assistant cues may encourage input while evaluator cues could dampen participation, though systematic measurement remains limited. Digital oversight may shift perceived risk versus in-person review.

Here's the problem: Teams deploy tools before designing decision processes. Without clear disclosure, rationale requirements, or override channels, people rubber-stamp or disengage. Error-catching stays low among experts while novices over-rely.

Leaders measure time saved, not override quality. Override quality needs rationale capture, risk tiering, and rework tracking—harder metrics that stay invisible. Disclosure becomes a compliance checkbox, not an accountability control.

Research doesn't link overrides to defects directly. But collaboration patterns suggest two failure modes: challenges get suppressed on risky calls, and useful recommendations get rejected. Both raise rework.

What might work: Match rationale depth to decision risk. For high-stakes calls, require step-by-step reasoning, a named override owner, and a brief "why proceed" note. Set 10-20% override rate as a health signal—below 5% suggests rubber-stamping. Default to assistant framing, reserving evaluator framing for audit contexts. These are directional approaches requiring validation in your context.
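The override-rate bands above can be sketched as a simple health signal: 10-20% reads as healthy, below 5% suggests rubber-stamping, and 5-10% sits in a watch zone. The "high friction" label for rates above 20% is my own extrapolation, not a finding from the research; treat all cutoffs as starting points for your context.

```python
# Sketch: classify an override rate against the health bands in this issue.
# The >20% "high friction" band is an assumption, not a research finding.

def override_signal(overrides: int, ai_decisions: int) -> str:
    """Map raw override counts to a directional health label."""
    rate = overrides / ai_decisions
    if rate < 0.05:
        return "rubber-stamping risk"   # people aren't challenging the AI
    if rate < 0.10:
        return "watch"                  # below the healthy band
    if rate <= 0.20:
        return "healthy"                # 10-20% per this issue's heuristic
    return "high friction"              # assumed: recommendations rejected often

print(override_signal(12, 100))  # → healthy
```

Pair the label with rationale completeness on high-risk calls; a "healthy" rate with empty rationales is still rubber-stamping in disguise.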

What's next: Track four ratios—override rate, re-opened decisions, rationale completeness on high-risk calls, and defect rate. Baseline in month one. Run an 8-12 week A/B test. If the framework works, teams should raise override quality and cut decision time without higher rework. Lock the playbook only if results replicate across contexts.

Sources: Carnegie Mellon University

Shadow AI Overtakes Governance: The Visibility Gap

Employees adopt AI faster than governance moves. Budgets and intent rise while visibility lags. Momentum turns into risk.

HKPC's 2025 workplace survey of roughly 800 firms found 88% of employees now use AI tools. Only 45% have officially recognized platforms, while 54% admit incomplete governance. Intent runs high—92% plan to introduce AI—but just 24% target full implementation within a year.

That gap slows scale. Real conversion is happening. Netskope tracked manufacturing's unmanaged genAI use falling from 83% to 51% as approved apps rose from 15% to 42%. Cycode finds 100% of companies now ship AI-generated code, yet 81% lack visibility and 52% have no formal frameworks. The mismatch explains why pilots hit policy friction instead of production.

Here's the problem: Tools arrive before guardrails. Employees solve immediate problems, so shadow use stays high and underreported. Without registries and safe alternatives, the shadow-to-approved share stays skewed. Leaders measure what's countable—policy blocks and signed acknowledgments.

Measuring conversion, visibility, and violation mix requires sanctioned alternatives, procurement speed, unified logs, and tagging. These add accountability, so they get deferred. Studies don't measure psychological safety directly. Yet the gap between near-universal use (88% of employees; 100% of companies shipping AI code) and low visibility (81% lacking) suggests disclosure risk. People hide use when approval pathways lag.

What works: Governance-by-alternative—publish an approved AI catalog, fast-track requests, default to enterprise tools that replace popular shadow apps. Use conversion benchmarks: manufacturing's 42% approved share suggests a 40%+ target within 90 days; below 25% signals weak alternatives.
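A minimal sketch of the conversion benchmark above: compute the approved share of observed AI usage and compare it to the 40% target and 25% warning line drawn from the manufacturing data. The app names and usage-log shape are hypothetical; feed in whatever your own logs actually capture.

```python
# Sketch: approved share of genAI usage, judged against this issue's
# 40%+ target and below-25% warning line. App names are hypothetical.

def approved_share(usage: dict[str, int], approved: set[str]) -> float:
    """Fraction of observed usage events that hit sanctioned tools."""
    total = sum(usage.values())
    sanctioned = sum(n for app, n in usage.items() if app in approved)
    return sanctioned / total if total else 0.0

usage = {"enterprise-copilot": 420, "shadow-chatbot": 510, "shadow-notes": 70}
share = approved_share(usage, {"enterprise-copilot"})
if share >= 0.40:
    status = "on track"            # meets the 90-day target
elif share < 0.25:
    status = "weak alternatives"   # approved tools aren't winning
else:
    status = "converting"
print(round(share, 2), status)  # → 0.42 on track
```

The interesting number is the trend: governance-by-alternative is working when this share climbs as total usage stays flat or grows.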

What's next: Track four ratios and baseline now—shadow-to-approved conversion, visibility rate, violation mix by category, and disclosure rate. If these disciplines drive safer adoption, teams meeting thresholds should reduce regulated/IP violation share and increase disclosure within one quarter. Validate quarterly. If conversion and visibility rise by 15 points in 90 days with flat or falling violations, scale the model.

Sources: Thailand Business News (Media OutReach); CXOtoday; CIO.com; Business Wire (Cycode)

One Human + AI Performs Like Two. Until Trust and Escalation Break

One human plus AI can match a two-person team. Then trust, oversight, and escalation become the bottleneck. Johns Hopkins and MIT Sloan research, reported by PYMNTS, measured real work.

AI-assisted workers produced 60% more output than non-AI peers while maintaining quality. They sent 23% fewer messages. Coordination costs fell. Harvard Business School researchers studied 776 P&G professionals in a field experiment. Individuals using AI matched two-person teams without AI.

AI-augmented teams showed greater creativity than non-AI teams. Carnegie Mellon work quantified the trust gap: people felt more vulnerable when AI evaluated them. Transparency helped lower-skill users; opacity favored experts.

Here's the problem: Leaders swap a peer for a co-pilot without redesigning decision rights. Ambiguity about who checks what suppresses use for cautious staff while encouraging risky over-reliance by speed-seekers. Explanation modes remain untuned to user ability.

The structural incentive reinforces it. Leaders track throughput because tasks per hour and fewer messages are easy to count. They ignore oversight and escalation health because those metrics create accountability and are harder to collect. Teams default to peer-to-peer stance instead of triaging by risk.

Studies don't track error spillover or redeployment of saved coordination time. Yet the gap between 60% output gains and reported vulnerability suggests oversight debt. People withhold edge cases or delay escalation. Speed today becomes rework later.

What might work: Define control modes before scaling workflows. Tool mode: AI proposes, humans execute. Co-pilot: AI assists a human driver. Partner: divide tasks by strength. Supervised autonomy: AI runs within guardrails, humans review exceptions. This is a directional framework requiring validation in your context.
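The four control modes above can be made explicit in a team playbook. Here is a sketch encoding them as an enum with a hypothetical risk-tier triage rule; the tier-to-mode mapping is my assumption to make the idea concrete, not a recommendation from the research.

```python
# Sketch: the four control modes as an explicit enum, plus a hypothetical
# triage rule mapping decision risk to a default mode. The mapping is an
# assumption for illustration, not a validated policy.

from enum import Enum

class ControlMode(Enum):
    TOOL = "AI proposes, humans execute"
    COPILOT = "AI assists a human driver"
    PARTNER = "tasks divided by strength"
    SUPERVISED_AUTONOMY = "AI runs within guardrails, humans review exceptions"

def triage(risk_tier: str) -> ControlMode:
    """Pick a default control mode for a decision's risk tier."""
    return {
        "high": ControlMode.TOOL,                 # human executes every step
        "medium": ControlMode.COPILOT,            # human stays the driver
        "low": ControlMode.SUPERVISED_AUTONOMY,   # humans review exceptions only
    }.get(risk_tier, ControlMode.COPILOT)         # unknown tiers default safe-ish

print(triage("high").name)  # → TOOL
```

Writing the modes down this way forces the conversation the research says teams skip: who checks what, at which risk tier, before the workflow scales.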

What's next: Baseline week one, review week eight. With modes and triage, teams should lift throughput per person and hold defects flat by week eight versus "AI-everywhere" peers. Run A/B tests on explanation depth by user ability. Validate the approach before standardizing across teams.

Sources: PYMNTS; TechXplore / Carnegie Mellon University

The operating system problem

The pattern sharpens: Budgets rise, pilots multiply, yet value stalls at scale. Leaders fund tools before redesigning workflows. Employees adopt faster than policy, so usage hides. Oversight tilts toward speed, missing the judgment calls that prevent rework. Creativity depends on autonomy and feedback—but tool sprawl and overload crush both.

These failures connect. Saved time drifts because no one assigns it to outcomes. Trust breaks when roles blur and escalation paths stay vague. Shadow adoption compounds when approved tools are slower than alternatives. Measurement tracks throughput, not whether originality survives or overrides carry rationales.

The gap isn't the model. It's the missing operating system: redeployed hours, protected deep work, clear decision rights, and approved paths that beat shadow convenience.

The trade-off is real. Fund pilots without redesign, and capacity evaporates into rework. Lock down hard, and discovery stalls. The middle path: Name where saved time goes. Start with assistant modes while skills form. Make sanctioned tools the easiest option. Track four things—redeployment, novelty under constraint, override quality, and shadow-to-approved conversion.

Until next time, Matthias


Artificial intelligence is reshaping how we work. But the real challenge isn't technical. It's human. AI isn't just a tool — it's a new team member. The Collaboration Brief curates weekly insights on how people work with AI and with each other in this new reality.

P.S. This newsletter practices what it preaches. AI agents handle research, fact-checking, and drafting. I curate sources, validate claims, and make final calls on quality. Human judgment at every stage.

P.P.S. Behind the scenes: an AI agent team that evolves every week. I tune prompts, split tasks, and refine quality checks to make output more reliable and repeatable. Research synthesis, fact validation, structural editing—each has a specialized agent.

Spot something that works or misses? Share it. You're helping build the system.

https://www.jtbd-to-ai.com/