Why Synthetic Data Is the Real Bottleneck for Computer Use Agents
Computer use agents are finally producing impressive results on benchmarks. Yet many teams still struggle to get good data for training and evaluation. Real interaction data is costly, hard to label, and risky to share. Synthetic data is the practical solution.
The data gap is bigger than you think
A recent study of 30 open-source computer use agents found that only 23 percent achieved a passable score on real-world browser tasks. The main reason? Not enough high-quality interaction data. Most teams rely on a handful of public datasets and hand-crafted examples. That does not scale. The real gap is not a lack of compute. It is a lack of diverse, correctly labeled interaction trajectories.
Real data has a cost and risk profile
Collecting real desktop and browser interaction data requires access to live environments, user consent, and robust labeling pipelines. Labeling is especially expensive. A single hour of correctly labeled browser interaction can cost $30 to $50, depending on task complexity. Beyond cost, there are privacy and compliance constraints. You cannot always reuse session logs from production, and you often cannot share them with customers. Synthetic data lets you generate the scenarios you need without exposing real users or proprietary workflows.
Common synthetic data approaches often fail
The most common synthetic data method is rule-based generation. You write scripts that click buttons based on heuristics. This produces clean but brittle trajectories. The model sees the same actions over and over and never learns to handle edge cases. Another approach is simple image augmentation. You flip, crop, or change colors of screenshots. This does not help agents that must reason about layout, context, and sequence. For computer use agents, you need interaction-level signals, not just visual variation.
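To make the brittleness of rule-based generation concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than any real tool's API: the `Action` type, the `scripted_login` generator, the CSS selectors, and the `structural_diversity` metric are all hypothetical. It shows how a script can emit many trajectories that differ only in typed text while sharing a single action structure.

```python
import random
from typing import NamedTuple


class Action(NamedTuple):
    kind: str       # "click" or "type"
    target: str     # hypothetical CSS selector
    text: str = ""  # payload, only used for "type" actions


def scripted_login(seed: int) -> list[Action]:
    """Hypothetical rule-based generator.

    The seed changes the typed username, but never the order or
    choice of actions.
    """
    rng = random.Random(seed)
    username = f"user{rng.randint(0, 999)}"
    return [
        Action("click", "#username"),
        Action("type", "#username", username),
        Action("click", "#password"),
        Action("type", "#password", "secret"),
        Action("click", "#submit"),
    ]


def skeleton(trajectory: list[Action]) -> tuple:
    """Drop typed payloads so only the action structure remains."""
    return tuple((a.kind, a.target) for a in trajectory)


def structural_diversity(trajectories: list[list[Action]]) -> float:
    """Fraction of trajectories with a distinct action structure."""
    return len({skeleton(t) for t in trajectories}) / len(trajectories)


trajectories = [scripted_login(seed) for seed in range(100)]
print(structural_diversity(trajectories))  # 0.01
```

One hundred generated examples collapse to one structural pattern, which is the "same actions over and over" problem in miniature. The metric is deliberately crude. It is meant as a quick sanity check, not a complete measure of dataset quality.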
What works better for computer use agents
- Simulate realistic workflows with variable user intent and error paths
- Include multi-step sequences that require planning and memory
- Add noise, interruptions, and unexpected UI changes
- Label intermediate actions, intermediate goals, and final outcomes
- Use diverse environments: desktop apps, dashboards, web forms, and SaaS tools
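The points above can be sketched as a small trajectory generator. This is a toy illustration under stated assumptions, not a description of how any particular product builds its data: the `Step` and `Trajectory` schema, the `PLANS` table, the noise events, and the `noise_rate` and `error_rate` parameters are all hypothetical names chosen for this example.

```python
import random
from dataclasses import dataclass, field

# Hypothetical multi-step plans, keyed by user intent.
# Each entry is (action, intermediate goal).
PLANS = {
    "update_billing_address": [
        ("open_settings", "navigate"),
        ("fill_address_form", "enter_data"),
        ("click_save", "submit"),
    ],
    "export_monthly_report": [
        ("open_reports", "navigate"),
        ("select_date_range", "configure"),
        ("click_export", "submit"),
    ],
}
ENVIRONMENTS = ["desktop_app", "dashboard", "web_form", "saas_tool"]
NOISE_EVENTS = ["dismiss_cookie_banner", "close_unexpected_modal", "wait_for_slow_page"]


@dataclass
class Step:
    action: str
    subgoal: str           # label for the intermediate goal
    is_noise: bool = False # True for injected interruptions


@dataclass
class Trajectory:
    intent: str
    environment: str
    steps: list = field(default_factory=list)
    outcome: str = "success"  # label for the final outcome


def generate_trajectory(rng, noise_rate=0.3, error_rate=0.15):
    """Sample one labeled trajectory with optional noise and an error path."""
    intent = rng.choice(sorted(PLANS))
    traj = Trajectory(intent=intent, environment=rng.choice(ENVIRONMENTS))
    for action, subgoal in PLANS[intent]:
        if rng.random() < noise_rate:
            # Interruption the agent has to handle before continuing.
            traj.steps.append(Step(rng.choice(NOISE_EVENTS), subgoal, is_noise=True))
        if rng.random() < error_rate:
            # Error path: the workflow stops here and is labeled as failed.
            traj.outcome = f"failed:{subgoal}"
            break
        traj.steps.append(Step(action, subgoal))
    return traj


# One seeded generator per sample keeps the dataset reproducible.
dataset = [generate_trajectory(random.Random(seed)) for seed in range(200)]
```

Each sample carries labels at three levels: the action, the intermediate goal it serves, and the final outcome. In practice the actions would come from agents running in real environments rather than from a lookup table. The sketch only shows the shape of the labels and where variation is injected.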
The bottleneck for computer use agents is not the model. It is the quality and diversity of interaction data.
How Coasty fits
Coasty runs computer use agents on real desktops and browsers to capture realistic interaction data. This lets teams generate synthetic datasets and trajectories tailored to their own workflows. The process is custom and contact-led. You talk to the Coasty data team about your specific use case, then Coasty builds the right synthetic data for your agents.
If you are building or evaluating computer use agents, you need high-quality synthetic data that reflects real workflows. Get in touch with the Coasty data team to explore what is possible. Book a data call at https://cal.com/coasty/coasty-data-call.