Industry

95% of Business AI Pilots Are Failing. Here's Why a Real Computer Use Agent Fixes That.

Sophia Martinez | 7 min read

MIT published a report in August 2025 that should have ended careers. 95% of generative AI pilots at companies are failing to turn a profit. Not underperforming. Not missing targets by a little. Failing. And yet, somehow, the same executives who greenlit those disasters are still out here signing new contracts with the same category of tools that burned them the first time.

Meanwhile, the average employee wastes a full quarter of their work week on manual, repetitive tasks, according to Smartsheet's research. A quarter. That's 10 hours a week per person. If you're paying someone $80,000 a year and they're spending 25% of their time copy-pasting data between systems, you're lighting $20,000 on fire annually. Per employee.

This is not a productivity problem. It's a tool problem. And most of the AI industry is actively making it worse.
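The back-of-the-envelope math above is worth making explicit. The snippet below just restates the article's illustrative figures (the salary, the 25% figure, and the 200-person team size are examples, not measurements):

```python
# Cost of manual, repetitive work, using the article's illustrative figures.
salary = 80_000          # annual salary in dollars (example figure)
wasted_fraction = 0.25   # Smartsheet: ~10 of 40 weekly hours on repetitive tasks

wasted_per_employee = salary * wasted_fraction
print(wasted_per_employee)  # 20000.0 -- the "$20,000 on fire" per employee

# Scale it to a hypothetical 200-person company at the same rate.
team_size = 200
print(wasted_per_employee * team_size)  # 4000000.0
```

At even a fraction of that waste recovered, automation pays for itself quickly, which is why the failure rate of the tools sold to fix it matters so much.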

The $644 Billion Lie the AI Industry Sold You

Let's talk about what's actually happening here, because the numbers are genuinely insane. One analysis pegged AI's economic destruction in 2025 at $644 billion in wasted investment. A separate survey found that the share of companies abandoning most of their AI initiatives more than doubled last year, up from 17% in 2024. The pattern is consistent and damning: companies buy AI tools, the tools don't integrate with real workflows, nothing meaningful gets automated, and the budget gets written off as 'experimentation.'

Here's the dirty secret nobody in enterprise software wants to say out loud. Most AI tools sold to businesses are glorified chatbots with a dashboard slapped on top. They can summarize a document. They can draft an email. What they cannot do is sit down at a computer and do the work. They can't log into your legacy CRM, pull a report, cross-reference it against a spreadsheet, and file the result somewhere useful. They talk about work. They don't do work. That distinction is everything, and the industry has spent two years blurring it on purpose.

Why Anthropic Computer Use and OpenAI Operator Are Still Not Enough

To be fair, some players have at least tried to build real computer use capabilities. Anthropic's Computer Use feature and OpenAI's Operator both attempt to give AI models actual control over a desktop or browser. That's the right instinct. The execution, though, is still catching up to the hype.

Anthropic's own Claude Sonnet 4.5 scores 61.4% on OSWorld, the gold-standard benchmark for real-world computer tasks across file management, web browsing, and multi-app workflows. OpenAI's Operator has been publicly described as 'a big improvement but still not very useful' by independent reviewers who tested it on basic tasks like ordering groceries. One reviewer gave both Anthropic's computer use agent and OpenAI's Operator the same grocery task, and neither completed it reliably. These are research previews, not production tools. They're impressive demos. They're not what you deploy to automate your accounts payable process or your customer onboarding flow. The gap between 'cool demo' and 'runs my business' is exactly where most enterprise AI money disappears.

95% of generative AI pilots at companies are failing to turn a profit, according to MIT's NANDA initiative. Not underperforming. Failing. And the #1 reason is that these tools avoid the friction of real workflows instead of solving it.

RPA Was Supposed to Fix This. Remember How That Went?

  • UiPath, the RPA giant, faced a class action securities fraud lawsuit in 2024 while its community forums filled up with posts about bots breaking every time a browser extension updated.
  • Traditional RPA is brittle by design. It automates a fixed sequence of clicks. Change one pixel in the UI, update one field label, and the whole bot falls over.
  • McKinsey found that almost all companies invest in AI but just 1% believe they've reached actual maturity. One percent. After years of RPA promises.
  • The average RPA implementation requires dedicated developer maintenance just to keep existing automations from breaking, which quietly eats the ROI you thought you were getting.
  • Shadow AI is now in use at over 90% of companies surveyed, meaning employees are turning to unauthorized AI tools to get real work done because the officially sanctioned tools don't work.
  • The core problem with RPA was always the same as the core problem with most AI tools today: they automate around the computer instead of using it the way a human would.
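The brittleness described above is easy to reproduce in miniature. Below is a hypothetical fixed-selector bot against a toy UI; one renamed field label is enough to break it. The dict-based "UI" and the bot are illustrative stand-ins, not any RPA product's actual API:

```python
# Toy "UI": field labels mapped to values. A traditional RPA bot
# addresses fields by their exact label, like a recorded click path.

def fill_field(ui: dict, label: str, value: str) -> None:
    # Mimics the "element not found" failure mode of recorded automations.
    if label not in ui:
        raise KeyError(label)
    ui[label] = value

def brittle_bot(ui: dict) -> None:
    # Hard-coded labels, exactly as recorded at build time.
    fill_field(ui, "Customer Name", "Acme Corp")
    fill_field(ui, "Order ID", "12345")

ui_v1 = {"Customer Name": "", "Order ID": ""}
brittle_bot(ui_v1)
print(ui_v1["Order ID"])  # 12345 -- works against the UI it was recorded on

# A routine UI update renames one label...
ui_v2 = {"Customer Name": "", "Order #": ""}
try:
    brittle_bot(ui_v2)
except KeyError as e:
    print(f"bot broke on: {e}")  # bot broke on: 'Order ID'
```

A human operator would shrug at the renamed label and keep going; the recorded bot halts and waits for a developer. That maintenance tax is the ROI leak the list above describes.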

What 'Computer Use' Actually Means and Why It Changes Everything

A real computer use agent doesn't need an API. It doesn't need a pre-built integration. It doesn't need your vendor to have a partnership with Salesforce. It looks at a screen, understands what it sees, and does what needs to be done. It clicks buttons, fills forms, navigates menus, handles pop-ups, opens terminals, runs scripts, and moves between applications the same way a human operator would. That's the only kind of automation that actually survives contact with real business software, because real business software is messy, inconsistent, and full of legacy interfaces that will never get a modern API.

The OSWorld benchmark exists specifically to measure this capability, and the scores tell you exactly who has built something real versus who is still selling you a chatbot in a trench coat. Tasks on OSWorld include real desktop work across multiple apps, file systems, and browsers. It's the closest thing we have to 'can this AI actually do a job?' Most agents score in the 30-50% range. A handful push past 60%. The gap between the top and everyone else is not incremental. It's the difference between a tool you can trust and a tool you babysit.
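The 'look at the screen, decide, act' behavior described above can be sketched as a simple control loop. Everything here is hypothetical scaffolding to show the shape of the idea; the `observe`/`decide`/`perform` functions and the toy login form are stand-ins, not any vendor's API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    target: str = ""   # e.g. a button label or form field

def run_agent(observe: Callable[[], str],
              decide: Callable[[str], Action],
              perform: Callable[[Action], None],
              max_steps: int = 20) -> bool:
    """Perceive-decide-act loop: look at the screen, pick one action, do it.

    Returns True if the agent decided the task is finished, False if it
    ran out of steps (the 'babysitting' case the benchmarks measure).
    """
    for _ in range(max_steps):
        screen = observe()       # perceive: what is on screen right now?
        action = decide(screen)  # decide: model picks the next step
        if action.kind == "done":
            return True
        perform(action)          # act: click/type the way a human would
    return False

# Toy environment: a one-step login flow.
state = {"screen": "login_form"}

def observe() -> str:
    return state["screen"]

def decide(screen: str) -> Action:
    if screen == "login_form":
        return Action("click", "Sign in")
    return Action("done")

def perform(action: Action) -> None:
    if action.target == "Sign in":
        state["screen"] = "dashboard"

print(run_agent(observe, decide, perform))  # True
```

The point of the sketch is that nothing in the loop depends on an API or a pre-built integration: the agent only ever sees the screen state and emits one human-shaped action at a time, which is why this approach survives messy legacy software that recorded click paths can't.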

Why Coasty Exists

I've looked at a lot of computer use agents. Most of them are interesting. Coasty is the one that's actually ready to work. It scores 82% on OSWorld, which is not a rounding error above the competition. It's a different category of performance. Claude Sonnet 4.5, which is genuinely impressive, sits at 61.4%. The gap between 61% and 82% in real-world computer task completion is the difference between an agent that finishes most things and one that finishes nearly everything.

Coasty controls real desktops, real browsers, and real terminals. It's not making API calls and pretending. It's doing the actual computer work. You can run it as a desktop app, spin up cloud VMs, or deploy agent swarms to run tasks in parallel when you need scale. There's a free tier if you want to test it before committing, and BYOK support if you want to bring your own model keys. The reason I keep coming back to Coasty when people ask what actually works for business automation is simple: it passes the 'give it a real task and walk away' test. Most tools don't. This one does. Check it out at coasty.ai.

Here's my honest take after watching this space for years. The companies that are going to win the next decade are not the ones who spent the most on AI. They're the ones who stopped buying AI theater and started deploying AI that actually operates software. The MIT number should be a wake-up call. 95% failure is not a technology problem. It's a category problem. Chatbots and summarizers were never going to automate your business. A computer use agent that can see a screen and get things done, that's a different story. Stop paying people to do work that a machine can handle. Stop buying tools that talk about automation without delivering it. And stop pretending that a 61% benchmark score is good enough when 82% exists. The bar is set. The tools are real. The only question is whether you're going to keep burning budget on the 95% or join the 5% that actually figured it out. Start at coasty.ai.

Want to see this in action?

View Case Studies
Try Coasty Free