Research

Why Synthetic Data Is the Real Bottleneck for Computer Use Agents

Michael Rodriguez | 7 min read

Computer use agents (AI systems that click, type, and navigate browsers and desktops) are among the most active research topics right now. But many teams find that their models plateau after a few hundred hours of real interaction data. The bottleneck isn't compute or model size. It's the scarcity, risk, and cost of real data.

Real data is expensive and risky at scale

Gathering high-quality interaction data means running agents on real devices and browsers. Even a modest dataset of 10,000 completed tasks can cost tens of thousands of dollars in compute, storage, and infrastructure. More importantly, real-world data carries risk: data leaks, credential exposure, and compliance violations. Many teams cannot simply scale up real-agent data collection, because the cost and risk grow with every additional task and quickly become unmanageable.

Synthetic data isn't magic; it's a mirror

Synthetic data relies on a single assumption: the simulation or generator accurately reflects the real world. If the simulator lacks realistic UI layouts, network conditions, or user intent, the synthetic trajectories will reinforce bad behaviors. Synthetic data tends to work best when it mirrors the aspects of the target domain that the agent actually has to handle. A synthetic dataset that looks like a real web app but lacks real navigation patterns will not generalize well.
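One rough way to check whether a synthetic set mirrors real navigation patterns is to compare how often each action-to-action transition occurs in the two datasets. The sketch below is only an illustration under assumptions: trajectories are represented as lists of action-name strings (a hypothetical format, not one described in this article), and total variation distance is used as one possible gap measure among many.

```python
from collections import Counter


def transition_distribution(trajectories):
    """Normalized frequency of consecutive action pairs across all trajectories."""
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj, traj[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance: 0 means identical, 1 means no overlap."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data with hypothetical action names.
real = [["click", "type", "click", "scroll"], ["click", "scroll", "click"]]
synthetic = [["click", "type", "click"], ["click", "type", "click"]]

gap = total_variation(
    transition_distribution(real),
    transition_distribution(synthetic),
)
print(f"transition gap: {gap:.2f}")
```

A large gap here suggests the generator is missing transitions that real users produce (scrolling, in this toy case), even if individual screens look realistic.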

Bottlenecks show up in training curves

Concrete metrics illustrate the problem. Agents trained on 50,000 real tasks often show a 15-25% performance gap relative to agents trained on 200,000 real tasks. Synthetic augmentation can close part of that gap, but only if the synthetic trajectories cover rare but important scenarios. A synthetic dataset that oversimplifies error handling will leave agents brittle in production.
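The coverage requirement can be made concrete with a simple audit. The following is a minimal sketch, assuming each task carries a scenario tag; the tag names, the `rare_share` threshold, and the `min_synthetic` count are all hypothetical values chosen for illustration, not recommendations.

```python
from collections import Counter


def undercovered_scenarios(real_tags, synthetic_tags, rare_share=0.05, min_synthetic=3):
    """Return scenarios that are rare in real data and thin in synthetic data."""
    real_counts = Counter(real_tags)
    synth_counts = Counter(synthetic_tags)
    total_real = len(real_tags)
    flagged = []
    for tag, n in real_counts.items():
        is_rare = n / total_real < rare_share
        if is_rare and synth_counts[tag] < min_synthetic:
            flagged.append(tag)
    return sorted(flagged)


# Toy data: two rare scenarios in the real set, only one covered synthetically.
real_tags = (
    ["form_submit"] * 60 + ["search"] * 37 + ["login_timeout"] * 2 + ["captcha"] * 1
)
synthetic_tags = ["form_submit"] * 10 + ["login_timeout"] * 5

missing = undercovered_scenarios(real_tags, synthetic_tags)
print(missing)
```

In this toy example `login_timeout` is rare but has enough synthetic examples, while `captcha` is rare and has none, so it is the scenario the generator should target next.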

Key tradeoffs in synthetic data pipelines

  • Coverage vs. fidelity: High-fidelity simulations produce realistic trajectories but are costly to build and maintain, which limits how many apps and scenarios they can cover.
  • Domain gap: Synthetic environments that differ too much from real apps lead to poor generalization.
  • Label quality: Synthetic trajectories often lack accurate reward labels or ground-truth actions, complicating supervised learning.
  • Refresh rate: Real-world UI and workflows evolve quickly; synthetic datasets must be updated frequently to stay relevant.
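The label-quality tradeoff in particular lends itself to an automated gate before training. As a minimal sketch, assuming a hypothetical `Step` record with optional `action` and `reward` fields (the article does not specify a trajectory schema), a pipeline could separate fully labeled trajectories from those that need review:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    observation: str
    action: Optional[str] = None    # ground-truth action, if known
    reward: Optional[float] = None  # reward label, if known


def is_fully_labeled(trajectory: List[Step]) -> bool:
    """True only for non-empty trajectories where every step has both labels."""
    return bool(trajectory) and all(
        s.action is not None and s.reward is not None for s in trajectory
    )


def split_by_label_quality(
    trajectories: List[List[Step]],
) -> Tuple[List[List[Step]], List[List[Step]]]:
    """Partition trajectories into (usable, needs_review)."""
    usable, needs_review = [], []
    for traj in trajectories:
        (usable if is_fully_labeled(traj) else needs_review).append(traj)
    return usable, needs_review


# Toy data: the second trajectory is missing a reward on its last step.
good = [Step("login page", "click_login", 0.0), Step("dashboard", "open_menu", 1.0)]
bad = [Step("login page", "click_login", 0.0), Step("error dialog", "dismiss")]

usable, needs_review = split_by_label_quality([good, bad])
print(len(usable), len(needs_review))
```

Whether incomplete trajectories are dropped, relabeled, or used only for unsupervised objectives is a design choice that depends on the training setup.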

The bottleneck is not the lack of synthetic data. It's the ability to generate high-fidelity, domain-matched interaction data at scale.

How Coasty fits

Coasty runs computer use agents on real desktops and browsers. This approach captures realistic interaction data, including nuanced UI behaviors, error states, and multi-step workflows. The team can use this data to produce custom synthetic datasets and trajectories tailored to your specific domain and use case. Coasty's synthetic data service is custom and contact-led: you discuss your needs, and the team designs a dataset strategy around them.

If you're hitting a performance wall with computer use agents, it's time to rethink your data strategy. Reach out to the Coasty data team to explore whether custom synthetic data can unblock your progress. Book a data call at https://cal.com/coasty/coasty-data-call to get started.
