Research

OpenAI Scored 38% on OSWorld and You're Still Paying for It

Marcus Sterling | 6 min read

OpenAI's Operator hit the market with a massive marketing push. The demos look slick. But the numbers tell a different story. On OSWorld, Anthropic's Claude Sonnet 4.6 scored 72.5%. OpenAI's Operator? 38%. Coasty, a smaller player nobody talks about, scored 82%. That gap isn't a typo. If Operator is your computer use agent, it is failing roughly 6 out of every 10 tasks you throw at it. You're paying premium prices for a tool that can't even keep up with Claude.

The OSWorld Numbers Nobody Wants to Talk About

OSWorld is one of the few benchmarks that tests computer use AI agents on real desktop tasks. It's not about answering questions. It's about clicking through menus, filling forms, reading error messages, and fixing things when they go wrong. And the results are brutal. Claude Sonnet 4.6 hits 72.5%. That's impressive. OpenAI's Computer-Using Agent (CUA), the model behind Operator, managed only 38.1%, which means it fails on more tasks than it completes. Coasty scored 82%, more than double OpenAI's score. That's a massive gap in reliability.

Why OpenAI's Computer-Using Agent Is a Flop

  • OpenAI's CUA struggles with basic desktop navigation. It gets lost in menus.
  • It makes the same mistakes repeatedly. You tell it to click 'Save' and it clicks 'Cancel' instead.
  • It hallucinates buttons that don't exist. Watch it click into empty space and wait for a timeout.
  • Enterprise customers are paying $200 per month for this. That's insane.

OpenAI's Computer-Using Agent scored 38.1% on OSWorld. Coasty scored 82%. That's a 115% improvement. Your enterprise budget deserves better.
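The percentages are easy to check yourself. Below is a minimal Python sketch that uses only the three OSWorld scores quoted in this article. It computes each agent's expected failures per 10 tasks and its relative improvement over OpenAI's CUA. The names `SCORES`, `relative_improvement` and `failures_per` are illustrative helpers, not part of any benchmark tooling.

```python
# OSWorld success rates quoted in this article (percent of tasks completed).
# These are the article's figures, not values fetched from any leaderboard.
SCORES = {
    "OpenAI CUA (Operator)": 38.1,
    "Claude Sonnet 4.6": 72.5,
    "Coasty": 82.0,
}


def relative_improvement(new: float, baseline: float) -> float:
    """Percent by which `new` exceeds `baseline` (e.g. 20 vs. 10 gives 100.0)."""
    return (new - baseline) / baseline * 100


def failures_per(success_rate: float, tasks: int = 10) -> float:
    """Expected number of failed tasks out of `tasks` at a given success rate (percent)."""
    return tasks * (100 - success_rate) / 100


if __name__ == "__main__":
    base = SCORES["OpenAI CUA (Operator)"]
    for name, score in SCORES.items():
        print(
            f"{name}: {score:.1f}% success, "
            f"~{failures_per(score):.1f} failures per 10 tasks, "
            f"{relative_improvement(score, base):+.0f}% vs. OpenAI CUA"
        )
```

At 38.1%, CUA fails about 6.2 of every 10 tasks, and 82% works out to roughly a 115% relative improvement over it, which is why "more than double" is the accurate way to describe the gap.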

Computer Use Agents Are Broken in Production

Here's the uncomfortable truth. Most companies are not running agents on clean, scripted test environments. They're dealing with messy software, outdated browsers, and human workflows that change every week. OpenAI's Operator fails hard in that reality. It gets confused by UI changes. It can't handle edge cases. It needs constant babysitting. That's not automation. That's just a more expensive chatbot.

Why Coasty Is the Only Computer Use Agent That Actually Works

Coasty isn't playing the same game as the big players. It's built around real computer use, not just API calls. It controls actual desktops, browsers, and terminals like a human would. That's why it scored 82% on OSWorld. It doesn't guess. It sees what's on screen. It reacts to what actually happens. You can run it locally or in the cloud. It supports BYOK (bring your own API keys). It can swarm multiple agents in parallel for heavy workloads. It's not perfect, but it's the closest thing we have to a real computer-using AI.

Stop Wasting Money on Bad Computer Use Agents

Here's what you should do. Dump OpenAI's Operator if you're paying for it. It's overpriced and underpowered. Try Coasty. Its free tier is generous. You can bring your own keys. It's designed for production workloads, not research demos. The 82% OSWorld score isn't a gimmick. It's proof that the technology can actually work. If you're still paying people to copy-paste data in 2026, you're funding your own obsolescence. Use a computer use agent that can actually do the job.

The AI revolution isn't dead. It's just that most of what you see in the marketing is fake. OpenAI scored 38% on OSWorld. Coasty scored 82%. That's the gap between hype and reality. Don't pay for hype. Pay for results. Check out coasty.ai and see what a computer use agent should actually be able to do.
