Your Company Is Hemorrhaging $28,500 Per Employee and Calling It 'Process': The Brutal Truth About AI Computer Use in 2025
Manual data entry costs U.S. companies $28,500 per employee per year. Not a typo. Twenty-eight thousand five hundred dollars. Per person. Per year. And that's before you factor in the 56% of those employees who are burning out from the repetitive grind and quietly quitting or rage-quitting into a better job. Meanwhile, over 40% of workers are spending at least a quarter of their entire work week on tasks a decent AI computer use agent could handle before lunch. We've had automation technology for years. We've had RPA hype cycles. We've had digital transformation consultants billing $400 an hour to tell you to buy UiPath licenses. And yet here we are, in 2025, watching humans manually copy data between spreadsheets like it's 2009. Something went badly wrong. And the industry is only now being honest about what it was.
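Want to feel that number at company scale? Here's the back-of-the-envelope math in Python. The $28,500 figure and the quarter-of-the-week figure come from the stats above; the headcount and loaded salary are assumptions I made up for illustration.

```python
# Back-of-the-envelope: what manual data entry costs at company scale.
# The $28,500/employee figure is from the article; headcount and salary
# below are illustrative assumptions, not data.

COST_PER_EMPLOYEE = 28_500    # annual manual data entry cost (USD)
HEADCOUNT = 500               # assumed: a mid-size company
AVG_LOADED_SALARY = 95_000    # assumed: fully loaded cost per employee
SHARE_OF_WEEK_ON_ROTE = 0.25  # "at least a quarter of the work week"

direct_waste = COST_PER_EMPLOYEE * HEADCOUNT
labor_tied_up = AVG_LOADED_SALARY * SHARE_OF_WEEK_ON_ROTE * HEADCOUNT

print(f"Direct data entry waste:     ${direct_waste:,.0f}/year")
print(f"Salary tied up in rote work: ${labor_tied_up:,.0f}/year")
```

At an assumed 500 employees, that's over $14 million a year in direct waste, before you even count the salary locked up in rote work.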
RPA Promised Everything and Delivered a Maintenance Nightmare
Let's be blunt about RPA. Robotic Process Automation was sold as the future of work automation. The pitch was clean: point a bot at a repetitive task, let it run, go home early. The reality? RPA bots are brittle. They break the moment a UI element shifts by three pixels. They require dedicated developer teams to maintain. They can't handle exceptions, edge cases, or anything that looks slightly different from what they were trained on.

Every enterprise that went deep on UiPath or Automation Anywhere has a war story. The bot that silently failed for six weeks before anyone noticed. The update that broke 40 automations overnight. The IT team that spent more time maintaining the bots than the bots saved in human hours.

A 2022 analysis from Blueprint Systems found that the number one reason RPA projects fail is poor process selection combined with underestimating the ongoing maintenance burden. That's a polite way of saying companies bought a fragile, expensive tool and then discovered it needed a full-time babysitter. RPA was never really automation. It was scripting with better marketing. And the market is finally catching on.
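If 'breaks when a UI element shifts by three pixels' sounds like hyperbole, here's the kind of coordinate-pinned script that classic RPA often boils down to. A minimal sketch using pyautogui; the coordinates and filename are placeholders, and the fragility is the whole point.

```python
# The classic RPA failure mode: automation pinned to pixel coordinates.
# pyautogui is a real library; the coordinates below are placeholders.
import pyautogui

# Works today: the "Export" button happens to sit at (412, 287).
pyautogui.click(412, 287)

# After any UI update, theme change, or resolution tweak, those
# coordinates point at empty space or the wrong control, and the
# script "succeeds" silently while doing nothing useful.
pyautogui.typewrite("Q3_report.csv", interval=0.05)
pyautogui.press("enter")
```

Nothing in that script knows what an 'Export' button is. It knows an x,y pair. That's the entire failure mode in four lines.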
The New Players Showed Up. Some of Them Also Kind of Fumbled.
- OpenAI's Operator launched in January 2025 as a 'research preview' for Pro users only in the U.S. A journalist asked it to order groceries. It couldn't. Reliably.
- Anthropic's Claude Computer Use is genuinely impressive in demos and genuinely frustrating in production. One independent benchmark put Claude Sonnet 4.5 at 61.4% on OSWorld. That's better than nothing. It's not good enough to trust with your actual workflows.
- Google's Project Mariner is a computer-use agent that navigates browsers. It's in early access. The reviews are 'interesting but limited.' That's tech-journalist speak for 'not ready.'
- All three of the big players launched their computer use products as 'research previews.' That phrase is doing a lot of heavy lifting. It means: cool demo, don't run your business on it.
- The OSWorld benchmark, which tests AI agents on 369 real desktop tasks across file management, web browsing, and multi-app workflows, has become the honest scorecard nobody can fake. And the scores from the big labs are, charitably, a work in progress (a toy scoring sketch follows this list).
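For clarity on what a headline OSWorld number actually measures, here's that toy scoring loop. To be clear, this is not the real OSWorld harness; run_task is a simulated stand-in for an agent attempt, and the 0.614 skill level mirrors the Claude Sonnet 4.5 score quoted above.

```python
# Toy illustration of how a benchmark like OSWorld produces one headline
# number: run every task once, record pass/fail, report the mean.
# This is NOT the real OSWorld harness; run_task is a simulated stand-in.
import random

N_TASKS = 369  # OSWorld's real desktop task count

def run_task(task_id: int, agent_skill: float) -> bool:
    """Stand-in for one agent attempt; passes with probability agent_skill."""
    return random.random() < agent_skill

random.seed(0)  # reproducible illustration
passes = sum(run_task(t, agent_skill=0.614) for t in range(N_TASKS))
print(f"Score: {100 * passes / N_TASKS:.1f}% ({passes}/{N_TASKS} tasks)")
```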
62% of employees spend the majority of their time on repetitive tasks. Over half are burning out because of it. And the 'solutions' the enterprise software industry sold them are either too brittle to trust or too early to ship. The productivity crisis isn't a technology problem anymore. It's a courage problem. Companies know what needs to change. They're just scared to commit.
Why 'Research Preview' Is the Most Expensive Phrase in Tech Right Now
Here's what bothers me about how the big AI labs have handled computer use. They announced it. They demoed it. They charged for it. And then they slapped 'research preview' on the label so nobody could hold them accountable when it failed in production.

One tech writer at Understanding AI put it plainly after testing both Operator and Claude's computer use agent: the best model was OpenAI's Operator, and even that wasn't saying much. He asked it to correct mistakes it made. It made new ones. He described computer-use agents broadly as feeling like a dead end. That take is wrong, by the way, but I understand why he wrote it.

When the best-resourced AI labs in the world ship half-baked tools and call it innovation, it poisons the well for everyone. It makes CTOs skeptical. It makes automation budgets shrink. It makes people go back to hiring data entry clerks because at least the clerk shows up and doesn't hallucinate a form submission. The problem isn't that AI computer use doesn't work. The problem is that most of what's been shipped so far doesn't work well enough to trust with anything important. There's a massive gap between 'impressive demo' and 'I'll let this touch my CRM.' Closing that gap is the actual hard problem. And most of the big players are still in the demo phase.
What Good AI Computer Use Actually Looks Like in 2025
The OSWorld benchmark doesn't lie. It tests real desktop tasks: opening applications, navigating browsers, managing files, running multi-step workflows across multiple apps. It's the closest thing the industry has to a real-world stress test for computer-using AI. Most agents are scoring in the 30 to 60 percent range. That means they fail between 40 and 70 percent of the time on tasks a competent human handles without thinking. For anything business-critical, that failure rate is disqualifying.

The agents worth paying attention to in 2025 are the ones that treat computer use as a first-class capability, not a bolt-on feature. The difference shows up fast. A first-class computer use agent can control a real desktop, navigate a real browser, run terminal commands, and handle the messy reality of enterprise software without breaking every time a modal window appears unexpectedly. A bolt-on feature can do a scripted demo at a conference. The gap between those two things is where most of the industry's credibility problems live right now.

Companies evaluating AI desktop automation in 2025 need to stop asking 'can it do this task in a demo' and start asking 'what's its real-world success rate on ambiguous, multi-step work.' That question separates the tools worth deploying from the ones worth ignoring.
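Here's the math behind 'disqualifying,' in one short sketch. Assume each step of a workflow succeeds independently with probability p; that independence assumption is a simplification (real agent failures correlate), but it shows how per-step reliability compounds.

```python
# If each step of a workflow succeeds independently with probability p,
# an n-step workflow succeeds with probability p**n. Independence is an
# assumption for illustration; real agent failures tend to correlate.

def workflow_success(p: float, steps: int) -> float:
    return p ** steps

for p in (0.40, 0.61, 0.82):
    row = ", ".join(f"{n} steps: {workflow_success(p, n):5.1%}"
                    for n in (1, 3, 5, 10))
    print(f"per-step p={p:.2f} -> {row}")
```

At 61% per step, a five-step workflow finishes about 8% of the time. At 82%, about 37%. That nonlinearity is why a twenty-point benchmark gap is not a twenty-point difference in practice.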
Why Coasty Exists
I've been critical of a lot of tools in this post, so let me be specific about what I think actually works. Coasty sits at 82% on OSWorld. That's not a marketing claim; it's a benchmark score, and it's higher than every competitor shipping today. The gap between 82% and 61% isn't incremental. It's the difference between a tool you can trust with real work and one you babysit.

Coasty controls actual desktops, real browsers, and live terminals. Not API wrappers. Not scripted automations that break when a button moves. It works the way a human works, by seeing the screen and deciding what to do next, which means it handles the unexpected instead of crashing on it.

The agent swarm capability is where things get genuinely interesting for teams doing parallel work. Spin up multiple agents running simultaneously, each handling a different workflow, and you stop thinking about automation as 'one bot doing one thing' and start thinking about it as a parallel workforce. There's a free tier, BYOK support if you want to bring your own API keys, and a desktop app plus cloud VMs so you can run it however your stack demands.

I'm not saying Coasty is perfect. I'm saying it's the most serious answer to the question 'what computer use agent should I actually run my business on in 2025?' The benchmark backs that up. The architecture backs that up. And unlike the research previews from the big labs, it's built to be deployed, not demoed.
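To make the 'parallel workforce' idea concrete, here's the general shape of the pattern using nothing but Python's standard library. This is illustrative pseudocode, not Coasty's actual SDK; run_agent and the workflow names are hypothetical stand-ins.

```python
# Generic shape of an agent swarm: N workflows dispatched in parallel,
# each to its own agent. Illustrative only; run_agent is a hypothetical
# stand-in, not Coasty's actual SDK.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_agent(workflow: str) -> str:
    """Placeholder: hand one workflow to one computer-use agent."""
    return f"{workflow}: done"  # a real agent would drive a desktop or VM here

workflows = ["invoice-entry", "crm-sync", "report-export"]  # hypothetical

with ThreadPoolExecutor(max_workers=len(workflows)) as pool:
    futures = {pool.submit(run_agent, w): w for w in workflows}
    for fut in as_completed(futures):
        print(fut.result())
```

Swap the placeholder for a real agent driving a desktop or cloud VM and the structure is the same: independent workflows, dispatched concurrently, each with its own agent.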
Here's my take, and I'll stand behind it: 2025 is the year the AI desktop automation hype either converts into real productivity gains or collapses into another cautionary tale about enterprise software promises. RPA had its chance and blew it. The big labs launched their computer use features too early and too cautiously. The companies that win the next three years are the ones that stop waiting for the perfect tool and start deploying the best tool available right now.

$28,500 per employee in manual task waste is not a number you wait out, and it's not a problem you solve with another pilot program or another vendor evaluation cycle. Stop treating automation like a future investment and start treating it like the emergency it is. If you want to see what a computer use agent looks like when it's actually built to work, go to coasty.ai. The benchmark is 82%. The competitors aren't close. That's the whole argument.