Tutorial

How to Automate Any Desktop App with the Coasty Computer Use API

Lisa Chen | 10 min

You can often automate a web app by calling its REST endpoints. Desktop apps are harder: they open new windows, show tooltips, use dynamic IDs, and expect mouse clicks, so hand-crafted selectors break with the next UI update. What you need is a computer use agent that sees the screen, understands context, and acts like a human. The Coasty computer use API provides that. It lets you drive real desktops, browsers, and terminals through vision-based actions, all from a simple API. You start a machine, send a task, and Coasty handles the full loop until the job is done.

How it works

The core flow uses a machine to host a real desktop, then a task run that steps through the UI. You send a POST /v1/runs request with a machine_id, a task description, and optionally a cua_version. Coasty launches an agent on that machine, watches the screen, predicts actions, and executes them. The server streams events back so you can monitor progress. The agent continues until it reaches the success state or fails. You can inspect the full trajectory, cancel the run, or resume it later. For stateful memory, you can use sessions with separate predict calls, but the simplest route is a task run.

bash
curl -X POST https://coasty.ai/v1/runs \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "machine_id": "mch_abc123",
  "task": "Open Chrome, navigate to coasty.ai, click the sign-up button, fill the email field, and submit",
  "cua_version": "v4",
  "instructions": "Use the browser only. Do not open other applications.",
  "max_steps": 150,
  "deadline_seconds": 300,
  "on_awaiting_human": "pause"
}'
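The same request can be assembled from a script. The following is a minimal Python sketch using only the standard library; the helper name build_run_request is hypothetical, and the URL, headers, and body fields are copied from the curl example above. It builds the request without sending it, since the response format is not covered here.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://coasty.ai/v1/runs"


def build_run_request(machine_id, task, api_key, **options):
    """Build (but do not send) a POST /v1/runs request.

    Hypothetical helper: extra keyword arguments are passed through
    unchanged as optional task-run fields.
    """
    if not machine_id or not task:
        raise ValueError("machine_id and task are required")
    payload = {"machine_id": machine_id, "task": task, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    machine_id="mch_abc123",
    task="Open Chrome, navigate to coasty.ai, and click the sign-up button",
    api_key=os.environ.get("COASTY_API_KEY", ""),
    cua_version="v4",
    max_steps=150,
    deadline_seconds=300,
    on_awaiting_human="pause",
)

# Passing `request` to urllib.request.urlopen would perform the call.
# It is not sent here, so the sketch runs without network access.
print(request.get_method(), request.full_url)
```

Keeping construction separate from sending makes the payload easy to log or unit test before any billed agent step happens.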

Task Run fields and options

  • machine_id (required): Cloud VM identifier to host the desktop.
  • task (required): Natural-language description of the job.
  • cua_version (optional, default v3): v3 is guided, v4 is autonomous with a pass/fail verifier.
  • instructions (optional): Additional guidance appended to the base prompt.
  • system_prompt (optional): Custom system instructions for the agent.
  • max_steps (optional): Upper bound on the number of agent steps; leaving it at the default is recommended.
  • deadline_seconds (optional): Timeout for the run in seconds.
  • on_awaiting_human (optional): How to handle human approval events: pause, fail, or cancel.
  • webhook_url (optional): Endpoint to receive run events and status updates.
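The field list above can be turned into a client-side check that catches typos before a request is sent. This is a sketch, not Coasty's own validation: the function name validate_run_payload is hypothetical, the allowed values come from the list above, and the rule that max_steps and deadline_seconds must be positive integers is an assumption.

```python
# Values taken from the field list above.
ALLOWED_CUA_VERSIONS = {"v3", "v4"}
ALLOWED_HUMAN_POLICIES = {"pause", "fail", "cancel"}
KNOWN_FIELDS = {
    "machine_id", "task", "cua_version", "instructions", "system_prompt",
    "max_steps", "deadline_seconds", "on_awaiting_human", "webhook_url",
}


def validate_run_payload(payload):
    """Return a list of problems found in a task-run payload (empty if none)."""
    problems = []

    # machine_id and task are the only required fields.
    for field in ("machine_id", "task"):
        if not payload.get(field):
            problems.append(f"{field} is required")

    # Flag misspelled or unsupported keys.
    unknown = sorted(set(payload) - KNOWN_FIELDS)
    if unknown:
        problems.append("unknown fields: " + ", ".join(unknown))

    # cua_version defaults to v3 when omitted.
    if payload.get("cua_version", "v3") not in ALLOWED_CUA_VERSIONS:
        problems.append("cua_version must be one of v3, v4")

    policy = payload.get("on_awaiting_human")
    if policy is not None and policy not in ALLOWED_HUMAN_POLICIES:
        problems.append("on_awaiting_human must be one of cancel, fail, pause")

    # Assumption: both limits are positive integers.
    for field in ("max_steps", "deadline_seconds"):
        value = payload.get(field)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{field} must be a positive integer")

    return problems


print(validate_run_payload({"machine_id": "mch_abc123", "task": "Open Chrome"}))
print(validate_run_payload({"task": "Open Chrome", "on_awaiting_human": "retry"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a payload at once.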

Pricing: POST /v1/runs costs $0.05 per agent step, billed from a prepaid USD wallet.
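Because billing is per step, max_steps doubles as a spending cap. The sketch below works through that arithmetic, assuming a flat $0.05 for every step and no other charges (the source does not say whether machines are billed separately). The function names are hypothetical.

```python
from decimal import Decimal

# Per-step price from the pricing line above.
PRICE_PER_STEP_USD = Decimal("0.05")


def max_run_cost(max_steps):
    """Upper bound on the cost of one run that stops at max_steps."""
    return PRICE_PER_STEP_USD * max_steps


def affordable_steps(wallet_balance_usd):
    """Number of whole agent steps a prepaid balance can cover."""
    return int(Decimal(str(wallet_balance_usd)) // PRICE_PER_STEP_USD)


print(max_run_cost(150))         # 7.50
print(affordable_steps("20.00")) # 400
```

With max_steps set to 150, as in the earlier request, a single run costs at most $7.50 under these assumptions. Decimal is used instead of float so the cent amounts stay exact.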

Where this beats brittle automation

Traditional automation tools rely on static selectors, XPath, or element IDs. When a UI changes, scripts break. The Coasty computer use agent sees the screen each step, understands context, and plans actions based on what is visually present. It can handle dynamic labels, tooltips, and multiple windows without extra glue code. It also uses real desktops and browsers, not mock environments, so you can automate complex, multi-step workflows like uploading files, filling forms, and navigating nested menus. The stateful session model keeps trajectory memory for long-running workflows, while task runs give you a simple, server-driven completion model.

Next steps

  • Follow the guides at https://coasty.ai/docs to set up machines and task runs.
  • Explore workflows for multi-step pipelines with assert, if, loop, and parallel steps.
  • Use vision endpoints like /v1/predict, /v1/sessions/{id}/predict, and /v1/ground for fine-grained control.
  • Integrate with your own systems via webhooks and HMAC signatures.
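For the webhook integration mentioned in the last item, a receiver would typically recompute the HMAC over the raw request body and compare it with the signature sent alongside the event. The sketch below shows that pattern under stated assumptions: HMAC-SHA256, a hex-encoded signature, and a shared secret are all assumptions, since the signing scheme and header name are not described here. Check the Coasty docs for the actual details.

```python
import hashlib
import hmac


def verify_webhook_signature(secret, raw_body, received_signature):
    """Return True if received_signature matches the HMAC of raw_body.

    Assumptions (not confirmed by the source): HMAC-SHA256 over the raw
    request bytes, hex-encoded.
    """
    expected = hmac.new(
        secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, received_signature)


# Local demonstration with a made-up secret and a hypothetical event body.
demo_secret = "whsec_example"
demo_body = b'{"status": "succeeded"}'
demo_signature = hmac.new(
    demo_secret.encode("utf-8"), demo_body, hashlib.sha256
).hexdigest()

print(verify_webhook_signature(demo_secret, demo_body, demo_signature))
print(verify_webhook_signature(demo_secret, demo_body + b" ", demo_signature))
```

Verification should run on the exact bytes received, before any JSON parsing, because re-serializing the body can change whitespace and invalidate the signature.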

You now have a clear path to automate any desktop app with the Coasty computer use API. Send a task, watch events, and let the agent drive the UI. Start building autonomous workflows with real desktops and browsers. Get your API key at https://coasty.ai/developers.
