Key takeaways
- Adoption is near-universal — 72% of organisations use generative AI per McKinsey’s State of AI 2025 — but only about 39% report any EBIT impact and roughly 5% see significant value.
- The best-evidenced wins are narrow, countable tasks: customer support (about 15% more issues resolved per hour, per a Quarterly Journal of Economics study), software and IT work, and document-heavy back-office tasks.
- Generative AI helps novices most and can slow experts: a METR trial found experienced developers were 19% slower, while believing they were faster — so measure outcomes, do not survey opinions.
- Value concentrates where workflows are redesigned (McKinsey’s biggest driver) and where tools are bought from proven vendors (about 67% success) rather than built in-house (only about a third as often).
- Agentic AI carries real risk: Gartner expects over 40% of agentic projects to be cancelled by end of 2027 — scope agents tightly and match governance to their autonomy.
By 2026 the question is no longer whether to use generative AI. According to McKinsey’s State of AI 2025 survey, 88% of organisations now regularly use AI in at least one business function and 72% use generative AI specifically, up from 33% the year before. Adoption is close to universal. The harder, more honest question is where it is actually paying off, because the same body of research shows that most organisations cannot point to a financial result.
That gap between use and value is the real story of this year, and it is worth being precise about. The same McKinsey survey found only 39% of organisations report any enterprise-level EBIT impact from AI, and around 5.5% (roughly 109 of 1,993 respondents) attribute more than 5% of EBIT and significant value to it. An MIT NANDA report, The GenAI Divide: State of AI in Business 2025, put it more bluntly: drawing on 150 leader interviews, 350 employee survey responses and an analysis of 300 public deployments, it found about 95% of enterprise generative-AI pilots fail to deliver measurable profit-and-loss impact, against an estimated 30 to 40 billion US dollars of enterprise spend.
Read together, those numbers are not an argument against generative AI. They are an argument for being deliberate about where you apply it. This piece sets out where the evidence for payoff is strongest, where it is weak or carries real risk, and how to choose and measure use-cases so you end up in the minority that sees a return.
Where the evidence for payoff is strongest
The use-cases with the best independent evidence share a shape: a narrow, high-frequency task with a measurable outcome, where a person stays accountable for the result. Three areas stand out.
Customer support has the strongest field evidence of any. A peer-reviewed study, Generative AI at Work, published in the Quarterly Journal of Economics in 2025, followed a conversational AI assistant rolled out to 5,172 customer-support agents and found it raised productivity — issues resolved per hour — by 15% on average. Notably, the gains were uneven: less-experienced and lower-skilled agents improved most in both speed and quality, while the most experienced agents saw only small speed gains and a slight dip in quality. The tool compressed the skill gap rather than lifting everyone equally, which is a useful thing to know when you decide who benefits most from it.
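One way to check whether a tool is compressing the skill gap in your own team is to break the uplift down by experience band instead of reporting a single average. The sketch below is a minimal illustration in Python, not the study's method: the record layout, the band labels and every number in it are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def uplift_by_group(records):
    """Relative change in issues resolved per hour, per experience band.

    records: iterable of (group, used_assistant, issues_per_hour).
    Each band needs at least one record with and one without the assistant,
    otherwise statistics.mean raises StatisticsError.
    """
    rates = defaultdict(lambda: {True: [], False: []})
    for group, used_assistant, issues_per_hour in records:
        rates[group][used_assistant].append(issues_per_hour)
    return {
        group: mean(by_arm[True]) / mean(by_arm[False]) - 1.0
        for group, by_arm in rates.items()
    }


# Hypothetical records, invented for illustration; not data from the study.
records = [
    ("under 6 months", False, 2.0), ("under 6 months", False, 2.2),
    ("under 6 months", True, 2.7), ("under 6 months", True, 2.76),
    ("over 2 years", False, 3.0), ("over 2 years", False, 3.2),
    ("over 2 years", True, 3.1), ("over 2 years", True, 3.224),
]

uplift = uplift_by_group(records)
for group, change in uplift.items():
    print(f"{group}: {change:+.0%} issues resolved per hour")
```

A single blended average would hide the split between the two bands, which is the detail that tells you where to roll the tool out first.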
Software engineering and IT is the second. GitHub’s own controlled experiment found developers wrote an HTTP server in JavaScript 55.8% faster using its Copilot tool than those without it. That is a vendor-run study, so treat it as indicative rather than settled — but it points the same direction as McKinsey’s survey, which reports cost reductions from individual AI use-cases most often in software engineering, IT and manufacturing, frequently in the 10 to 20% range.
Document and knowledge work — summarising long records, drafting first versions of routine text, and pulling specific facts out of material nobody has the hours to read in full — is the third. It sits in the same family as the support case: high-volume, repetitive, and easy to check. Tellingly, MIT NANDA found the biggest return on investment sat in back-office automation, even though more than half of budgets went to sales and marketing tools. The money and the payoff were pointing in different directions.
Where it underdelivers or carries risk
The same research that supports those wins also marks out the territory where caution is warranted — and it is more specific than the usual warnings about accuracy.
The first surprise is that generative AI can slow down your most capable people. A 2025 randomised controlled trial by METR took 16 experienced open-source developers through 246 tasks on large, mature codebases they knew well. Allowing early-2025 AI tools made them 19% slower, not faster — even though they had predicted a 24% speed-up beforehand and still believed afterwards that AI had sped them up by about 20%. METR is careful to say this does not show AI fails to help most developers, and may not extend beyond experts working on code they know intimately. But the perception gap is the lesson worth keeping: people are not reliable judges of their own AI-assisted productivity, which is exactly why you measure rather than survey.
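The perception gap can be made concrete by putting the measured change and the believed change on the same scale. The sketch below is a minimal illustration, assuming both figures are expressed as a change in task completion time. The function names and the timings are hypothetical, chosen so the measured result matches the 19% slowdown quoted above.

```python
from statistics import mean


def measured_time_change(minutes_with_ai, minutes_without_ai):
    """Relative change in mean task time; positive means slower with AI."""
    return mean(minutes_with_ai) / mean(minutes_without_ai) - 1.0


def perception_gap(measured_change, believed_time_saving):
    """Gap between belief and measurement, as a fraction of task time.

    believed_time_saving is the share of time people think they saved
    (0.20 means "I think AI made me about 20% faster").
    """
    believed_change = -believed_time_saving
    return measured_change - believed_change


# Hypothetical timings in minutes, invented for illustration only.
minutes_without_ai = [60, 90, 45, 120, 85]
minutes_with_ai = [70, 110, 50, 145, 101]

change = measured_time_change(minutes_with_ai, minutes_without_ai)
gap = perception_gap(change, believed_time_saving=0.20)
print(f"measured change in task time: {change:+.0%}")
print(f"gap between belief and measurement: {gap:.0%} of task time")
```

With a measured 19% slowdown and a believed 20% saving, the two views differ by 39 percentage points of task time, which is far too large a gap to leave to a satisfaction survey.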
The second is the move towards autonomous ‘agents’, where the risk is most concentrated. Gartner predicts that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value and inadequate risk controls — and warns of ‘agent washing’, the rebranding of older chatbots and automation as agents, estimating only around 130 of the thousands of agentic-AI vendors are genuine. In a separate note, Gartner predicts that by 2027, 40% of enterprises will demote or decommission autonomous agents because governance gaps only surfaced after production incidents, and cautions that applying one uniform governance standard across every agent, regardless of its autonomy and scope, will itself cause failures.
None of this means agents are a dead end. Gartner also forecasts real, measured growth: at least 15% of day-to-day work decisions made autonomously through agentic AI by 2028, up from zero in 2024, and a third of enterprise software applications including agentic AI by the same year. The point is timing and scope — treat agents as a frontier to scope tightly, not as the default answer to an operational problem you could solve more simply.
What separates the firms that see value
Across McKinsey and MIT, the same three factors come up — and none of them is the model itself.
Redesign the workflow, do not bolt AI onto it. McKinsey identifies workflow redesign as the single biggest driver of whether an organisation sees EBIT impact from generative AI. High performers treat the tool as a reason to rethink how the work flows, not a feature to staple to an unchanged process. MIT NANDA reached a complementary conclusion: it attributed pilot failures not to model quality but to a ‘learning gap’ — the inability to fold AI into workflows, structures and culture.
Buy proven tools before you build. MIT NANDA found that buying from specialised vendors succeeded about 67% of the time, while internally built systems succeeded only about a third as often. For most operational use-cases, a bought tool with a track record is the lower-risk starting point; building in-house is a choice to make deliberately, not by default.
Point it at operations, not just the storefront. The same report’s finding that the biggest ROI sat in back-office automation, while budgets flowed to sales and marketing, is a direct prompt: look first at the repetitive internal work — support, document handling, reconciliation, code — where outcomes are countable.
A short checklist for choosing and measuring
If you take one thing from the evidence, let it be this: decide how you will measure the result before you build, and measure profit-and-loss impact rather than adoption or activity. Use-counts and licences tell you nothing about value — which is precisely how so many firms ended up with high adoption and no return.
- Pick a narrow, high-frequency task with a measurable outcome — resolution rate, cycle time, cost-to-serve, lead time on a code change — not a broad ambition.
- Name the metric and the baseline first, then run the new way alongside the old one long enough to compare like with like through a full cycle of real, messy conditions.
- Measure outcomes, not opinions. The METR finding that experts misjudged their own speed-up is the reason to instrument the work rather than ask people how it felt.
- Favour a proven bought tool over an internal build for the first version, given that MIT NANDA reported about 67% success for bought tools and only about a third as often for internal builds.
- Redesign the surrounding workflow, since McKinsey identifies that as the biggest single driver of financial impact.
- Scope any autonomous agent to its actual autonomy and risk, with governance to match — and be ready to stop if the value is not there.
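The "name the metric and the baseline first" step in the checklist can be sketched as a side-by-side comparison with an uncertainty range around it. The example below is one possible approach, not something the cited reports prescribe: it uses a simple bootstrap, the cycle-time figures are invented, and the function names, the 95% level and the keep-or-stop rule are all assumptions for illustration.

```python
import random
from statistics import mean


def relative_change(baseline, pilot):
    """Relative change in the mean metric; negative means the pilot is lower."""
    return mean(pilot) / mean(baseline) - 1.0


def bootstrap_interval(baseline, pilot, resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the relative change (an assumed method)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(resamples):
        b = rng.choices(baseline, k=len(baseline))  # resample with replacement
        p = rng.choices(pilot, k=len(pilot))
        estimates.append(relative_change(b, p))
    estimates.sort()
    lower_index = int((1 - level) / 2 * resamples)
    upper_index = int((1 + level) / 2 * resamples) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical cycle times in hours for the old way and the new way,
# collected over the same period. Invented for illustration only.
baseline_hours = [10, 12, 11, 13, 9, 14, 12, 11, 10, 13]
pilot_hours = [9, 10, 8, 11, 9, 10, 9, 8, 10, 8]

point = relative_change(baseline_hours, pilot_hours)
low, high = bootstrap_interval(baseline_hours, pilot_hours)

# Assumed decision rule: keep the tool only if the whole interval shows
# shorter cycle times; otherwise keep measuring or stop.
keep_tool = high < 0
print(f"change in cycle time: {point:+.0%} (interval {low:+.0%} to {high:+.0%})")
print("keep" if keep_tool else "keep measuring or stop")
```

The decision rests on the interval, not on the single headline figure, because a handful of easy cases in the pilot can otherwise look like a return. In practice the sample would need to cover a full cycle of real conditions, as the checklist says.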
The bottom line
The 2026 picture is consistent across the most credible sources. Generative AI is genuinely paying off in a handful of well-shaped places — customer support, software and IT work, and the document-heavy back office — where the task is repetitive, the outcome is countable, and a person remains accountable. It underdelivers when it is bolted onto an unchanged process, judged by activity rather than profit, or pushed into autonomous territory faster than the governance can follow.
The firms in the minority that see a return are not the ones with the best model. They are the ones that chose a narrow problem, redesigned the work around it, bought before they built, and measured the result honestly. That is how we think about it on the projects we take on, and it is the same discipline whether the tool is generative AI or anything else: start from the work, and let the numbers, not the demo, decide whether it stays.
Sources
- The state of AI in 2025: Agents, innovation, and transformation | McKinsey (QuantumBlack)
- Generative AI at Work | The Quarterly Journal of Economics (Oxford Academic)
- The Impact of AI on Developer Productivity: Evidence from GitHub Copilot (arXiv 2302.06590)
- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity | METR
- MIT report: 95% of generative AI pilots at companies are failing | Fortune
- Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 | Gartner Newsroom
- Gartner Says Applying Uniform Governance Across AI Agents Will Lead to Enterprise AI Agent Failure | Gartner Newsroom