
Anthropic Computer Use Is Losing the Race It Started. Here's Who's Actually Winning.

Daniel Kim | 7 min read

Manual data entry alone costs U.S. companies $28,500 per employee every single year. Not from hackers. Not from bad strategy. From people manually copying data between screens, clicking through the same workflows, and doing work that a computer use agent could handle in seconds. And yet, here we are in 2026, and most companies are still either doing it by hand or betting on tools that can't even clear 40% on the industry's most important benchmark. Anthropic gets credit for making 'computer use' a real phrase people say at work. That's fair. But credit for coining a category and credit for winning it are two very different things, and right now, Anthropic isn't winning.

Let's Talk About the Benchmark Everyone Is Quietly Ignoring

OSWorld is the standard benchmark for AI computer use. It tests whether an agent can actually operate a real desktop, navigate real applications, and complete real tasks without a human holding its hand. It's the closest thing the industry has to an honest exam. When OpenAI launched its Computer-Using Agent in January 2025, they threw a party over a 38.1% score on OSWorld. Thirty-eight percent. They called it 'state-of-the-art.' The press covered it like a moon landing. Meanwhile, Coasty is sitting at 82% on the same benchmark. That's not a slight edge. That's a different sport entirely. Anthropic's Claude-based computer use, for all the hype it generated when it launched, has never come close to that number either. The AI research community knows this. The benchmark is public. The scores are public. But the mainstream tech press keeps writing 'Anthropic computer use vs OpenAI Operator' comparison pieces as if those are the only two players on the field, because they're the two with the biggest PR budgets.

What Anthropic Computer Use Actually Gets Wrong

  • Reliability is the core problem. Even Anthropic's own research community admits current computer use agents are 'fairly unreliable and slow' in real-world deployments, and Claude's computer use is no exception when you push it outside demo conditions.
  • It's a model capability, not a product. Anthropic gives you the Claude API with computer use features. You still have to build the infrastructure around it, manage the desktop environment, handle failures, and orchestrate everything yourself. That's not an agent, that's a building block.
  • The benchmark gap is brutal. OpenAI's CUA launched at 38.1% on OSWorld. Anthropic's Claude-based computer use scores haven't broken away from that tier in a meaningful way. Both are being lapped by purpose-built computer use agents.
  • Usage limits and rate limits bite hard. Reddit threads about Claude's 'unreasonable message limitations' have thousands of upvotes. When you're trying to run autonomous workflows, hitting a rate limit mid-task doesn't just slow you down, it breaks the whole chain.
  • No native parallel execution. Real automation at scale means running multiple tasks simultaneously. Anthropic's computer use offering has no built-in agent swarm capability. You're running one thing at a time, which is barely better than hiring a person.

Over half of employees (56%) report burnout from repetitive manual tasks. They're not burned out because the work is hard. They're burned out because they know a computer use agent could do it, and nobody has given them one that actually works.

OpenAI Operator Isn't the Answer Either

When Operator launched, the early reviews were... rough. Real users found it impressive in demos and frustrating in production. It's browser-focused, which sounds fine until you realize that most enterprise workflows live in desktop applications, internal tools, and terminals that a browser-only agent can't touch. The AI2 Incubator's state-of-agents report noted that Operator's limitations 'drastically limit real-world usage,' and European users complained loudly about availability gaps. OpenAI then quietly evolved Operator into ChatGPT Agent, which is better but is still built on the same CUA architecture that scored 38.1% on OSWorld back in January 2025. Iterating on a weak foundation still leaves you with a weak foundation. The honest truth is that both Anthropic and OpenAI built their computer use products as extensions of their chat products. That's the wrong architecture. A computer use agent isn't a chatbot that can also click things. It's a fundamentally different system, and building it as an afterthought shows.

RPA Is Dead. Someone Should Tell the Enterprise Teams Still Using It.

Before Anthropic computer use was a thing, the automation world was dominated by RPA tools like UiPath. These are brittle, script-based robots that break the moment a UI changes by two pixels. They require specialized developers to build and maintain. They cost a fortune to deploy and a second fortune to keep running. Gartner and every other analyst firm on earth have been warning about RPA failure rates for years. The average RPA project runs over budget, over time, and underdelivers. And yet companies keep buying UiPath licenses because it's what their IT department knows. This is the same logic that kept people on fax machines in 2015. AI-powered computer use agents don't need brittle scripts. They understand what they're looking at. They adapt. They handle edge cases that would crash an RPA bot instantly. The entire RPA market is a $20 billion industry built on a problem that modern computer-using AI solves better, faster, and cheaper. That's not a hot take. That's just arithmetic.

Why Coasty Exists

I'm not going to pretend I don't have a dog in this fight. I work at Coasty. But I work at Coasty because I looked at the benchmark scores, tested the alternatives, and the conclusion was obvious. Coasty sits at 82% on OSWorld. That's not a marketing claim; it's a public benchmark score you can verify yourself. No other computer use agent is close. The score is that high because Coasty was built from day one as a computer use agent, not retrofitted onto a chat model. It controls real desktops, real browsers, and real terminals. Not API wrappers pretending to be an agent. Not a browser extension with delusions of grandeur. An actual agent that sees your screen, understands what it's looking at, and executes. The desktop app means it works on your local machine. The cloud VM option means it can run headlessly at scale. The agent swarm capability means you can run parallel workflows that compress hours of work into minutes. And there's a free tier, so you don't have to write a procurement memo to try it. BYOK (bring your own key) is supported if you'd rather use your own API keys. The people who built this cared about one thing: making a computer use agent that actually works in the real world, not just in a demo video.

Here's where I land on this. Anthropic deserves respect for making computer use a real concept. They put it on the map. But respect for the past doesn't mean you should bet your company's automation strategy on a tool that scores in the 30s and 40s on the benchmark that matters most. OpenAI Operator is slicker marketing on a similar problem. RPA is a legacy tax you should stop paying. The gap between 38% and 82% on OSWorld isn't just a gap in numbers. It's a gap in how much your team can actually automate, how many hours they get back, and how much of that $28,500 per-employee annual drain you actually stop. If you're still debating 'Anthropic computer use vs OpenAI Operator,' you're debating which of two runners-up to root for. Go to coasty.ai. Try the thing that's actually winning.
