How to Automate Any Desktop App with the Coasty Computer Use API
Desktop automation usually means brittle selectors, endless maintenance, and scripts that break whenever the UI changes. The Coasty Computer Use API flips that model. Instead of writing selectors, you give the agent a screenshot and a natural-language instruction. It sees what you see, clicks what you click, and types what you type. This lets you automate any application that has a GUI, including custom tools, legacy systems, and complex desktop software. You pay $0.05 per agent step for full control over the desktop or browser.
How it works
The Computer Use API uses vision to understand the screen. You send a base64-encoded screenshot, an instruction, and a CUA version to /v1/predict. The server returns a list of actions such as clicking, typing, scrolling, and moving the cursor. After performing them, you capture a new screenshot, call predict again, and repeat until the status is done. For stateful automation, you create a session first, then send screenshots and instructions to /v1/sessions/{id}/predict. The session maintains a trajectory of actions and improves over time. Coasty also provides /v1/ground to map an element description to x,y coordinates and /v1/parse to convert PyAutoGUI code into structured actions.
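The screenshot, predict, act loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference client: the request fields mirror the curl example, but the response field names `actions` and `status` (and the value `"done"`) are inferred from the prose, and `capture`, `perform`, and `post` are hypothetical callables you would supply yourself.

```python
import base64
import json
import urllib.request
from typing import Callable

API_URL = "https://coasty.ai/v1/predict"  # endpoint used in the curl example


def build_predict_payload(png_bytes: bytes, instruction: str,
                          cua_version: str = "v3") -> dict:
    """Base64-encode a screenshot and wrap it in the request body."""
    return {
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "instruction": instruction,
        "cua_version": cua_version,
    }


def post_json(url: str, payload: dict, api_key: str) -> dict:
    """POST a JSON body with the X-API-Key header and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_until_done(instruction: str,
                   capture: Callable[[], bytes],
                   perform: Callable[[dict], None],
                   post: Callable[[dict], dict],
                   max_steps: int = 20) -> int:
    """Repeat screenshot -> predict -> act until the reply reports 'done'.

    Returns the number of predict calls made.
    """
    for step in range(1, max_steps + 1):
        reply = post(build_predict_payload(capture(), instruction))
        for action in reply.get("actions", []):   # assumed field name
            perform(action)
        if reply.get("status") == "done":         # assumed field name and value
            return step
    raise RuntimeError(f"task not done after {max_steps} steps")
```

To call the real endpoint, `post` could be `lambda p: post_json(API_URL, p, os.environ["COASTY_API_KEY"])`. Passing the transport in as a callable keeps the loop testable without network access, and the `max_steps` cap bounds spend if the agent never reports completion.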
# The single-quoted JSON is closed around $(...) so the shell expands the base64 data.
curl -X POST https://coasty.ai/v1/predict \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "'"$(base64 < screenshot.png | tr -d '\n')"'",
    "instruction": "Click the sign up button in the top right corner",
    "cua_version": "v3"
  }' | jq
Full automation with Task Runs
- POST /v1/runs provisions a cloud VM with a real desktop, browser, or terminal environment
- You provide a task, CUA version, optional instructions, system prompt, max steps, deadline, and webhook URL
- The server drives the agent step by step, billing $0.05 per agent step
- States include queued, running, awaiting_human, succeeded, failed, cancelled, and timed_out
- GET /v1/runs streams events via Server-Sent Events; after a dropped connection, reconnect with the Last-Event-ID header to resume
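Consuming the event stream mostly comes down to parsing the standard Server-Sent Events wire format and remembering the last event id for reconnects. The sketch below parses generic SSE lines and builds reconnect headers. It assumes the stream accepts the same `X-API-Key` header as the curl example, and it makes no assumption about Coasty's event names or payloads (the helper names are illustrative).

```python
from typing import Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of decoded SSE lines."""
    event, data, event_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line dispatches the event
            if data:
                yield {"id": event_id, "event": event, "data": "\n".join(data)}
            event, data = "message", []     # the id persists across events
            continue
        if line.startswith(":"):            # comment or keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):           # one optional space after the colon
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            event_id = value


def resume_headers(api_key: str, last_event_id: Optional[str]) -> dict:
    """Headers for (re)opening the stream; resumes after last_event_id if set."""
    headers = {"X-API-Key": api_key, "Accept": "text/event-stream"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = last_event_id
    return headers
```

A consumer would keep the `id` of the most recent event it handled and pass it to `resume_headers` whenever the connection drops, so it can ask the server to continue from that point rather than from the start.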
POST /v1/runs is the simplest way to automate a desktop workflow without writing a loop yourself.
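As a rough illustration, a task run request might be assembled like this. The JSON field names (`task`, `cua_version`, `max_steps`, `webhook_url`) are guesses based on the parameter list above, the host is assumed to match the curl example, and treating succeeded, failed, cancelled, and timed_out as the terminal states is also an assumption. Check all of these against the docs before relying on them.

```python
import json
import urllib.request
from typing import Optional

RUNS_URL = "https://coasty.ai/v1/runs"   # assumed: same host as /v1/predict
STEP_PRICE_CENTS = 5                     # $0.05 per agent step, per the article
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "timed_out"}  # assumed


def build_run_request(task: str, cua_version: str = "v3",
                      max_steps: Optional[int] = None,
                      webhook_url: Optional[str] = None) -> dict:
    """Assemble a run request body, omitting optional fields left unset."""
    body = {"task": task, "cua_version": cua_version}
    if max_steps is not None:
        body["max_steps"] = max_steps        # hypothetical field name
    if webhook_url is not None:
        body["webhook_url"] = webhook_url    # hypothetical field name
    return body


def max_cost_usd(max_steps: int) -> float:
    """Upper bound on step charges if the run uses every allowed step."""
    return max_steps * STEP_PRICE_CENTS / 100


def is_terminal(state: str) -> bool:
    """True once a run can no longer change state (under the assumption above)."""
    return state in TERMINAL_STATES


def create_run(body: dict, api_key: str) -> dict:
    """POST the run request and return the decoded JSON reply."""
    req = urllib.request.Request(
        RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Setting a step limit doubles as a budget: at $0.05 per step, a run capped at 40 steps costs at most $2.00 in step charges.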
Where this beats brittle automation
Selector-based and API-only tools depend on stable endpoints, element IDs, and selector strategies. When the UI changes, your scripts break and you spend hours patching selectors. With computer use, the agent sees the screen and adapts to layout changes. It understands context, handles dynamic content, and works across browsers, terminals, and desktop apps. You can automate workflows that involve multiple steps, conditional logic, and human-like interactions without maintaining fragile selectors.
Next steps
- Create a task run with POST /v1/runs to automate a real workflow
- Use /v1/ground to locate elements before clicking them
- Read the full docs at https://coasty.ai/docs and get your API key at https://coasty.ai/developers
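For the grounding step, one possible shape is to send a screenshot plus an element description and validate the returned point before clicking. The article only says that /v1/ground maps a description to x,y coordinates, so the request fields (`screenshot`, `description`), the reply fields (`x`, `y`), and the use of pixel units are all assumptions in this sketch.

```python
import base64
from typing import Tuple


def build_ground_request(png_bytes: bytes, description: str) -> dict:
    """Request body for /v1/ground (field names are hypothetical)."""
    return {
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
    }


def point_from_reply(reply: dict, width: int, height: int) -> Tuple[int, int]:
    """Extract (x, y) and reject points that fall outside the screenshot."""
    x, y = int(reply["x"]), int(reply["y"])   # assumed pixel coordinates
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"point ({x}, {y}) is outside {width}x{height}")
    return x, y
```

Checking bounds before acting is a cheap guard against clicking somewhere unintended when the description matches nothing on screen.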
You no longer need to maintain fragile selectors for every desktop app. The Coasty Computer Use API gives you a computer use agent that sees the screen and acts like a human. Build workflows, form fillers, web scrapers, and RPA pipelines that adapt to UI changes. Get your API key at https://coasty.ai/developers and start automating any desktop app today.