How to Automate Any Desktop App with the Coasty Computer Use API
Desktop automation traditionally relies on brittle selectors or limited APIs. The Coasty computer use API flips that by giving you an agent that sees your screen, decides where to click and type, and executes the actions. You can drive any app, including browsers, terminals, and spreadsheets, without writing selectors: you just describe what to do.
How it works
A computer use agent executes task runs. You POST to /v1/runs with a machine_id, a task description, and optional instructions. The agent starts the machine, watches its screen, and takes actions until the task succeeds or fails. You can stream events with GET /v1/runs/{id}/events, and you can cancel or resume runs. Billing is $0.05 per agent step.
# Example: POST a task run and stream events
export COASTY_API_KEY="$(cat ~/.coasty_key)"
# Create a task run
RUN_ID=$(curl -s -X POST https://coasty.ai/v1/runs \
-H "X-API-Key: $COASTY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"machine_id": "aws-us-east-1-1234",
"task": "Open Chrome, navigate to https://example.com, and take a screenshot",
"cua_version": "v3",
"on_awaiting_human": "pause"
}' | jq -r '.id')
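# Stream events. LAST_ID is the id of the last event already received,
# used when reconnecting to a dropped stream; omit the Last-Event-ID header on a first connection.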
curl -s -N https://coasty.ai/v1/runs/$RUN_ID/events \
-H "X-API-Key: $COASTY_API_KEY" \
--header "Last-Event-ID: $LAST_ID"
Key fields for task runs
- machine_id: the cloud VM you want the agent to drive
- task: a natural language description of what to do
- cua_version: 'v3' for standard agent behavior, 'v4' for autonomous mode with a pass/fail verifier
- instructions: optional text appended to the base prompt
- system_prompt: optional system prompt for the agent
- max_steps: optional maximum number of agent steps
- deadline_seconds: optional timeout for the run
- on_awaiting_human: 'pause', 'fail', or 'cancel' when the agent needs human input
- webhook_url: optional URL to send run state updates
- Billed $0.05 per agent step
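Before creating a run, it can help to check the enumerated fields and work out the most a run could cost. The sketch below assumes that the values listed above are the only ones accepted, that a run never takes more than max_steps steps, and that the flat $0.05 per step is the only charge. The function names are made up for illustration and are not part of the Coasty API.

```shell
# Price of one agent step in cents ($0.05, the rate quoted above).
STEP_PRICE_CENTS=5

# Succeed only if cua_version ($1) and on_awaiting_human ($2) hold
# one of the values listed above.
valid_run_options() {
  case "$1" in v3|v4) ;; *) return 1 ;; esac
  case "$2" in pause|fail|cancel) ;; *) return 1 ;; esac
}

# Print the worst-case cost in USD of a run capped at $1 steps.
# Working in whole cents keeps the arithmetic integer-only.
max_run_cost_usd() {
  cents=$(( $1 * STEP_PRICE_CENTS ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

if valid_run_options v3 pause; then
  echo "max_steps=40 costs at most \$$(max_run_cost_usd 40)"
fi
# prints: max_steps=40 costs at most $2.00
```

Passing the same max_steps value in the request body is what would make this ceiling hold for the run itself.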
POST to /v1/runs to create a task run, then GET /v1/runs/{id}/events to watch the agent in action.
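The Last-Event-ID header in the streaming example suggests that the events endpoint speaks Server-Sent Events. Assuming it emits standard "id:" lines (an assumption, not something confirmed above), a dropped stream could be resumed roughly as follows. The helper names and the events.log file are hypothetical.

```shell
# Assumption: /v1/runs/{id}/events emits standard Server-Sent Events,
# where each event may carry a line of the form "id: <value>".

# Print the id of the last event in an SSE stream read from stdin
# (prints nothing if no event carried an id).
last_event_id() {
  tr -d '\r' | sed -n 's/^id: \{0,1\}//p' | tail -n 1
}

# Hypothetical helper: stream a run's events into a log file, resuming
# from the last id already in that log if there is one.
stream_run_events() {
  run_id=$1
  log=${2:-events.log}
  last_id=""
  if [ -f "$log" ]; then
    last_id=$(last_event_id < "$log")
  fi
  # Build the curl arguments, adding Last-Event-ID only when resuming.
  set -- -s -N -H "X-API-Key: $COASTY_API_KEY"
  if [ -n "$last_id" ]; then
    set -- "$@" -H "Last-Event-ID: $last_id"
  fi
  curl "$@" "https://coasty.ai/v1/runs/$run_id/events" | tee -a "$log"
}

# Offline check of the parser on a made-up two-event stream.
printf 'id: 7\ndata: first\n\nid: 8\ndata: second\n\n' | last_event_id   # prints 8
```

Calling stream_run_events "$RUN_ID" again after a disconnect would then send the last id found in the log, so only the parser needs to be correct for the resume to pick up at the right event.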
Where this beats brittle automation
Traditional automation requires stable selectors, explicit waits, and frequent maintenance as UIs change. A computer use agent reasons from the visual state of your screen. It can handle dynamic text, missing elements, and layout shifts because it observes the real UI. You also get stateful sessions with /v1/sessions for long-running workflows and vision tools like /v1/predict and /v1/ground to locate elements by description.
Beyond task runs
- Use /v1/workflows to model multi-step processes with conditions, loops, and parallel tasks. Each task step is $0.05.
- Create stateful sessions with POST /v1/sessions, then call /v1/sessions/{id}/predict to build persistent agents.
- Parse existing Python automation with /v1/parse (free) to convert pyautogui code into structured actions.
- Use the MCP server to integrate Coasty into Cursor, Claude Desktop, or other MCP clients.
- Manage machines with POST /v1/machines to provision cloud VMs that the agent can drive.
You can now automate any desktop app by describing your goal in natural language. Build multi-step workflows, integrate with your CI/CD pipelines, or extend your existing AI agents. Get your API key at https://coasty.ai/developers and start building.