
Anthropic Computer Use Is Losing the Race It Started. Here's the Proof.

Priya Patel · 7 min read

Manual data entry costs U.S. companies $28,500 per employee every single year. Not in lost potential. In actual, measurable, auditable waste. And yet here we are in 2025, still debating which AI agent is good enough to fix it, while most teams are stuck using tools that were impressive at launch and have barely moved since. Anthropic computer use was a genuine breakthrough when it dropped in late 2024. Claude could look at your screen, click buttons, fill forms, and navigate apps like a person. The demos were wild. The internet lost its mind. And then the real-world results started coming in, and the story got a lot more complicated.

What Anthropic Computer Use Actually Got Right

Let's be fair first, because the tech press rarely is. Anthropic's computer use API was the first serious, production-accessible implementation of a vision-based computer control agent from a major lab. Before that, you had brittle RPA scripts that broke every time a UI changed, and browser automation tools that couldn't handle anything outside a web page. Claude could see a screenshot, reason about what it showed, and take action. That was genuinely new. It worked well enough on simple, structured tasks: filling out forms, copying data between apps, navigating menus in predictable software. For teams with zero automation budget and a low tolerance for engineering complexity, it was a fast way to get something running. The API was clean. The documentation was decent. Anthropic's safety focus meant the agent was relatively conservative, which annoyed power users but made enterprise buyers feel better. That's the honest positive case. Now let's talk about why 'good enough at launch' isn't good enough anymore.
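To make that concrete, here is roughly the shape of the launch-era loop. This is a minimal sketch against Anthropic's 2024-10-22 computer use beta: the computer_20241022 tool type and the computer-use-2024-10-22 beta flag come from Anthropic's documentation, while execute_action is a hypothetical placeholder for your own screenshot-and-click layer, not something the SDK provides.

```python
# Minimal sketch of the launch-era Anthropic computer use loop.
# Tool type and beta flag are from Anthropic's October 2024 docs;
# execute_action() is a hypothetical stand-in for your own
# screenshot/click/type layer.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

COMPUTER_TOOL = {
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1280,
    "display_height_px": 800,
}

def execute_action(action: dict) -> list:
    """Hypothetical: perform the click/type/screenshot the model asked for
    and return tool_result content blocks (e.g. a base64 screenshot)."""
    raise NotImplementedError

messages = [{"role": "user", "content": "Fill out the invoice form on screen."}]

while True:
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=[COMPUTER_TOOL],
        messages=messages,
        betas=["computer-use-2024-10-22"],
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # the model is done (or gave up)
    # Execute each requested action and feed the result back as a tool_result.
    results = []
    for block in response.content:
        if block.type == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_action(block.input),
            })
    messages.append({"role": "user", "content": results})
```

Notice that every single action round-trips through the API. That design is what made the demos feel magical, and it's also why the rate limit problem below bites so hard.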

The Problems Nobody Talks About Loudly Enough

The rate limit situation with Anthropic's products has been a running joke on Reddit for over a year. There are multiple active megathreads, with hundreds of comments, from paying users hitting walls mid-task. One user described it perfectly: 'Usage limits plus Claude taking the liberty to try one data-intensive technique after another when earlier attempts have failed, and still getting cut off.' That's not a minor UX annoyance. That's a broken agentic workflow. When your computer use agent stops mid-task because of a token quota, you don't just lose time. You potentially leave your system in a half-finished state. A half-submitted form. A partially moved file. A process that now needs human review to untangle.

Beyond limits, there's the security question that Anthropic itself has been surprisingly candid about. HiddenLayer published research in October 2024 showing that Claude's computer use implementation was vulnerable to indirect prompt injection attacks, where malicious content on a webpage or in a document could hijack the agent's actions. Anthropic's own system card for Claude Opus 4 included a dedicated section on 'computer use prompt injection evaluation.' They're working on it. But 'working on it' and 'solved' are very different things when you're running an agent with access to your desktop.
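On the rate-limit half of that, teams who drive the loop themselves can at least fail safe instead of half-finished: checkpoint before each step, and back off on quota errors rather than letting the run die mid-form. A minimal sketch, assuming the Python anthropic SDK (RateLimitError is its real exception class; save_checkpoint and the step callables are hypothetical stand-ins for your own orchestration):

```python
# Back off on quota errors instead of dying mid-task, and record progress
# durably so a hard failure resumes at the failed step rather than leaving
# a half-submitted form behind. anthropic.RateLimitError is the SDK's real
# exception; save_checkpoint and the step callables are hypothetical.
import time
import anthropic

def run_with_backoff(step, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return step()
        except anthropic.RateLimitError:
            time.sleep(2 ** (attempt + 1))  # exponential backoff: 2s, 4s, 8s, ...
    raise RuntimeError("quota exhausted after retries; resume from checkpoint")

def run_workflow(steps, save_checkpoint):
    for i, step in enumerate(steps):
        save_checkpoint(i)      # durable record: "about to run step i"
        run_with_backoff(step)  # retries the whole step, not a fragment of it
```

That's a band-aid, not a fix: if the quota resets hours later, your workflow is still hours late. But at least nothing is left half-submitted.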

OpenAI Operator: The Competitor That Launched With a Shrug

OpenAI shipped Operator in January 2025 with a lot of fanfare and a 38.1% success rate on OSWorld. Thirty-eight percent. On the benchmark that's supposed to be the standard measure of real-world computer use capability. To put that in context, that means Operator failed on roughly six out of ten tasks in a controlled evaluation environment. Not in messy production. In a benchmark. MIT Technology Review described it as capable of 'simple online tasks in a browser, such as booking concert tickets or filling an online grocery order.' That's the ceiling they were celebrating. Operator is also locked to a browser environment. It can't touch your desktop apps, your terminal, your local files, or anything outside a Chromium window. For a huge portion of real enterprise workflows, that's a non-starter. If your automation need involves a legacy Windows app, a desktop CRM, a locally installed tool, or anything that isn't a website, Operator simply can't help you. It's a web agent wearing an AI agent costume.

OpenAI Operator launched with a 38.1% score on OSWorld. Coasty sits at 82%. That's not a gap. That's a different category of product entirely.

And Then There's RPA. Still. In 2025.

Some companies are still reaching for UiPath or Blue Prism when they need to automate computer tasks. I understand the instinct. These tools have been around for years, they have enterprise sales teams, and your IT department probably already has a license. But the maintenance cost is brutal. RPA bots are essentially recorded scripts. Every time a UI changes, a button moves, a form gets redesigned, or a vendor updates their web portal, your bot breaks. And then a human has to fix it. UiPath's own community forums are full of questions about failure rates and maintenance overhead. A 2025 report from Auxis specifically called out proactive RPA maintenance as 'the secret to sustainable success,' which is a polite way of saying these things break constantly and you need a dedicated person to babysit them. Companies are abandoning legacy RPA for AI-native solutions at an accelerating rate. The firms still clinging to it are mostly doing so because switching feels hard, not because the tool is good. Meanwhile, office workers still spend over 50% of their time on repetitive tasks according to ProcessMaker's 2024 research. The automation tools that were supposed to fix this clearly haven't.

Why Coasty Exists

I don't write about tools I don't believe in. Coasty sits at 82% on OSWorld, the hardest and most representative benchmark for AI computer use. That's not a marketing number from an internal test. OSWorld is an independent academic benchmark that came out of NeurIPS 2024, and it tests agents on real tasks across real operating system environments. Anthropic's Claude Sonnet 4.5 scores 61.4% on it. OpenAI's Operator launched at 38.1%. Coasty is at 82%. The gap is real and it's not small.

What makes Coasty different isn't just the score. It controls actual desktops, browsers, and terminals. Not just web pages. It works with a desktop app, cloud VMs, and agent swarms that can run tasks in parallel, which means you can scale automation horizontally instead of waiting for one agent to finish before starting the next. There's a free tier if you want to test it without a procurement conversation. BYOK is supported if you want to bring your own API keys and control costs. The security model is built for production use, not demos. And critically, it doesn't have the rate limit problem that has made Anthropic's computer use frustrating for anyone running real workflows. When you're automating tasks that cost your team $28,500 a year per person in wasted time, you need an agent that actually finishes what it starts.
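Coasty's API isn't documented in this post, so treat the following as shape rather than gospel: a hypothetical sketch of what horizontal fan-out looks like with any agent runner that exposes a run-one-task call. CoastyClient and run_task are invented names for illustration, not a documented interface.

```python
# Hypothetical illustration of agent-swarm fan-out: N independent tasks
# dispatched to N parallel agents instead of one agent running them serially.
# CoastyClient / run_task are invented names, not a documented API.
from concurrent.futures import ThreadPoolExecutor

class CoastyClient:  # stand-in for whatever SDK or HTTP client you use
    def run_task(self, instruction: str) -> str:
        raise NotImplementedError

def fan_out(client: CoastyClient, tasks: list[str], workers: int = 8) -> list[str]:
    # Each task gets its own agent/VM, so wall-clock time tracks the
    # slowest task, not the sum of all of them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(client.run_task, tasks))
```

That's the difference parallel swarms make: ten invoices take as long as the slowest one, not ten times the average.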

Here's my actual take: Anthropic computer use was important because it proved the concept. It showed the industry that vision-based desktop agents could work. That contribution is real. But 'proved the concept' is not the same as 'best tool for the job in 2025.' OpenAI Operator is a browser toy. Legacy RPA is a maintenance nightmare held together by scripts and hope. And Anthropic's own users are posting megathreads about rate limits and half-finished tasks. The benchmark data is not ambiguous. If you're serious about computer use automation, you should be using the tool that scores 82% on the hardest test in the field, not the one that got there first. Go try Coasty at coasty.ai. There's a free tier. Run your actual workflow. Then tell me I'm wrong.

Want to see this in action?

View Case Studies
Try Coasty Free