Tutorial

From Prototype to Production with the Coasty Computer Use API

Alex Thompson | 8 min read

You need an agent that drives real desktops, browsers, and terminals the way a human does. Coasty’s computer use API lets you iterate fast on a prototype and then scale with stable endpoints. You pay per step, not per seat, and you can drive cloud machines directly. This guide shows the full path from a simple task loop to a production-ready workflow.

How it works

The core loop is capture, predict, act. You send a base64-encoded screenshot and an instruction to POST /v1/predict, and the response returns actions and a status. You execute the actions, capture a new screenshot, and repeat until the status is done.

For stateful trajectories, use sessions: POST /v1/sessions creates a session, and POST /v1/sessions/{id}/predict uses the memory of prior steps. For element targeting, POST /v1/ground takes a screenshot and an element description and returns x,y coordinates. While prototyping, you can also convert pyautogui code into structured actions with POST /v1/parse at no cost.

Task Runs and Workflows move the loop to the server. Task Runs with cua_version v4 use an autonomous verifier and finish as succeeded or failed. Workflows are a versioned JSON DSL with task, assert, if, loop, parallel, retry, succeed, and fail steps.

Billing is prepaid: 1 credit equals $0.01, and one agent step costs $0.05 (5 credits). POST /v1/runs takes machine_id, task, cua_version, max_steps, deadline_seconds, and optional instructions, starts a run, and bills per agent step. GET /v1/runs, GET /v1/runs/{id}, POST /v1/runs/{id}/cancel, and POST /v1/runs/{id}/resume manage runs. GET /v1/runs/{id}/events streams Server-Sent Events and supports reconnection via Last-Event-ID. A run is in one of these states: queued, running, awaiting_human, succeeded, failed, cancelled, or timed_out.

POST /v1/machines provisions machines so the agent can drive real desktops, browsers, and terminals. You can also drive Coasty from Cursor, Claude Desktop, or other MCP clients through the MCP server.
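Because runs bill per agent step, max_steps also bounds what a run can cost. The sketch below builds a POST /v1/runs body from the fields listed above, computes the worst-case spend at 5 credits per step, and classifies run states. It assumes max_steps is a hard cap on billed steps and that succeeded, failed, cancelled, and timed_out are final states (queued, running, and awaiting_human are not); the helper names and the placeholder machine ID are hypothetical.

```python
from typing import Optional

CREDITS_PER_STEP = 5  # one agent step = $0.05 = 5 credits (1 credit = $0.01)

# Assumption: these run states are final; queued, running, and awaiting_human are not.
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "timed_out"}


def build_run_request(machine_id: str, task: str, max_steps: int,
                      deadline_seconds: int, cua_version: str = "v4",
                      instructions: Optional[str] = None) -> dict:
    """Assemble a JSON body for POST /v1/runs (hypothetical helper)."""
    body = {
        "machine_id": machine_id,
        "task": task,
        "cua_version": cua_version,
        "max_steps": max_steps,
        "deadline_seconds": deadline_seconds,
    }
    if instructions is not None:
        body["instructions"] = instructions  # optional field
    return body


def worst_case_credits(max_steps: int) -> int:
    """Credits consumed if the run uses every allowed step (assumes a hard cap)."""
    return max_steps * CREDITS_PER_STEP


def is_terminal(state: str) -> bool:
    """True once a run can no longer change state (under the assumption above)."""
    return state in TERMINAL_STATES


# "machine-123" is a placeholder, e.g. for a machine provisioned with POST /v1/machines.
body = build_run_request("machine-123", "Export the monthly report as PDF",
                         max_steps=40, deadline_seconds=900)
credits = worst_case_credits(body["max_steps"])
print(f"worst case: {credits} credits = ${credits / 100:.2f}")
```

With max_steps=40, the run can spend at most 200 credits ($2.00). Checking this before you submit a run is a cheap client-side guard that complements server-side deadlines.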

A single predict step in Python:

```python
import base64
import os

import requests

# Read the API key from the environment; fail early if it is missing.
COASTY_API_KEY = os.environ["COASTY_API_KEY"]
BASE_URL = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": COASTY_API_KEY}


def encode_image(path):
    """Return the file at `path` as a base64-encoded UTF-8 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def main():
    payload = {
        "screenshot": encode_image("screenshot.png"),
        "instruction": "Click the 'Save' button at the top right.",
        "cua_version": "v3",
    }
    # One predict step: the response carries the next actions and a status.
    resp = requests.post(f"{BASE_URL}/predict", json=payload, headers=HEADERS, timeout=60)
    resp.raise_for_status()
    result = resp.json()
    print("Actions:", result.get("actions"))
    print("Status:", result.get("status"))


if __name__ == "__main__":
    main()
```
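That example makes one predict call. A minimal sketch of the full capture → predict → act loop follows. It assumes the response body has `actions` (a list) and `status` (which becomes "done" when the task is finished). Capture, action execution, and HTTP are passed in as functions so the loop can be tested without a desktop. The function names, the `max_steps` guard, and the use of urllib instead of requests are choices made for this sketch and are not part of Coasty's API.

```python
import base64
import json
import urllib.request
from typing import Callable

BASE_URL = "https://coasty.ai/v1"


def http_post(path: str, payload: dict, api_key: str) -> dict:
    """POST JSON to the Coasty API and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def run_predict_loop(instruction: str,
                     capture: Callable[[], bytes],
                     execute: Callable[[dict], None],
                     post: Callable[[str, dict], dict],
                     max_steps: int = 20) -> str:
    """Capture -> POST /predict -> act, until status is "done" or steps run out."""
    for _ in range(max_steps):
        screenshot_b64 = base64.b64encode(capture()).decode("utf-8")
        result = post("/predict", {"screenshot": screenshot_b64,
                                   "instruction": instruction})
        for action in result.get("actions", []):
            execute(action)  # e.g. translate into a local click or keystroke
        if result.get("status") == "done":
            return "done"
    return "max_steps_reached"  # client-side guard against an endless loop
```

To call the real endpoint, pass `post=lambda path, body: http_post(path, body, api_key)`. Each iteration is one billed predict call, so `max_steps` also caps spend at `max_steps` × $0.05.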

Prototype with predict and sessions

  • POST /v1/predict costs $0.05 per call and returns actions and status.
  • POST /v1/sessions costs $0.10 and creates a session that stores trajectory memory across steps.
  • POST /v1/sessions/{id}/predict costs $0.04 and uses the session’s history.
  • POST /v1/ground costs $0.03 and maps element descriptions to x,y coordinates.
  • POST /v1/parse is free and turns pyautogui snippets into structured actions.
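These prices let you compute when a session pays for itself. The sketch below compares n steps of POST /v1/predict (5n cents) with one POST /v1/sessions plus n session predicts (10 + 4n cents). Because 1 credit is $0.01, cents and credits are the same number. It uses only the prices listed above and ignores grounding and other calls.

```python
# Prices in cents (equal to credits), taken from the list above.
PRICE_CENTS = {
    "predict": 5,           # POST /v1/predict
    "session_create": 10,   # POST /v1/sessions
    "session_predict": 4,   # POST /v1/sessions/{id}/predict
    "ground": 3,            # POST /v1/ground
    "parse": 0,             # POST /v1/parse
}


def stateless_cost(steps: int) -> int:
    """Cost of `steps` independent predict calls."""
    return steps * PRICE_CENTS["predict"]


def session_cost(steps: int) -> int:
    """Cost of creating one session and making `steps` session predicts."""
    return PRICE_CENTS["session_create"] + steps * PRICE_CENTS["session_predict"]


for steps in (5, 10, 25):
    print(f"{steps:>3} steps: predict {stateless_cost(steps)} cents, "
          f"session {session_cost(steps)} cents")
```

Both paths cost 50 cents at 10 steps. Past that point, the session path is 1 cent per step cheaper (110 vs 125 cents at 25 steps), and it also gives the agent memory of earlier steps.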

The stable pattern: capture → POST /v1/predict → act → repeat until status is done.

Where this beats brittle automation

Traditional automation depends on brittle selectors and hard-coded integration points that break when the UI changes. Coasty's computer use API works from visual context instead: it reads the screen, interprets natural-language instructions, and issues actions such as clicks, drags, keystrokes, and text entry. This makes it more tolerant of layout shifts, dynamic IDs, and elements that are hard to reach with selectors. When you need a specific element, POST /v1/ground takes an element description and returns its coordinates. For production, you can move from a custom loop to Task Runs or Workflows and let the server handle retries, deadlines, and human approval. This removes loop code from your client and makes it simpler to scale across multiple machines.
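Grounding pairs naturally with a local executor: ask POST /v1/ground where an element is, then click there. The sketch below assumes the request takes `screenshot` and `description` fields and that the response has top-level `x` and `y` keys. Those field names, the `ground` and `ground_and_click` helpers, and the injected `post` and `click` functions are hypothetical, so check the API reference for the actual schema.

```python
import base64
from typing import Callable, Tuple

PostFn = Callable[[str, dict], dict]


def ground(post: PostFn, screenshot_png: bytes, description: str) -> Tuple[int, int]:
    """Ask POST /v1/ground for the coordinates of a described element.

    Assumed schema: request {"screenshot", "description"}, response {"x", "y"}.
    """
    result = post("/ground", {
        "screenshot": base64.b64encode(screenshot_png).decode("utf-8"),
        "description": description,
    })
    return int(result["x"]), int(result["y"])


def ground_and_click(post: PostFn, click: Callable[[int, int], None],
                     screenshot_png: bytes, description: str) -> Tuple[int, int]:
    """Locate an element by description, click it, and return the coordinates."""
    x, y = ground(post, screenshot_png, description)
    click(x, y)  # e.g. pyautogui.click(x, y) on the machine being driven
    return x, y
```

Each grounding call costs $0.03, so using it only for elements the agent keeps missing is cheaper than grounding every click.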

Production-ready patterns

  • POST /v1/machines provisions cloud VMs for the agent to drive directly.
  • POST /v1/runs with cua_version v4 uses an autonomous verifier and returns succeeded or failed.
  • GET /v1/runs/{id}/events streams Server-Sent Events; send Last-Event-ID to resume after a dropped connection without missing events.
  • POST /v1/workflows defines a versioned JSON DSL with task, assert, if, loop, parallel, retry, succeed, and fail steps.
  • Budget guards like budget_cents, max_iterations, and deadline_seconds in workflows prevent runaway costs.
  • The Idempotency-Key header makes retries safe, and HMAC-signed webhooks (Coasty-Signature) let you verify that webhook deliveries are authentic.
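Server-Sent Events use a simple line format: `id:`, `event:`, and `data:` fields, with a blank line ending each event. To resume after a dropped connection, a client remembers the last `id` it saw and sends it back in the Last-Event-ID header when it reopens GET /v1/runs/{id}/events. The sketch below implements a minimal subset of the SSE parsing rules (no `retry:` handling) and builds the reconnect headers. It assumes nothing about the JSON inside `data`, reuses the X-API-Key header from the earlier example, and its helper names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from an SSE stream given as text lines (minimal subset)."""
    event_type, data_lines, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the buffered event, if it has data.
            if data_lines:
                yield SSEEvent(event_type, "\n".join(data_lines), last_id)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event_type = value
        elif name == "data":
            data_lines.append(value)
        elif name == "id":
            last_id = value  # persists across events, as in the SSE spec


def reconnect_headers(api_key: str, last_event_id: Optional[str]) -> dict:
    """Headers for (re)opening the event stream, resuming after last_event_id."""
    headers = {"X-API-Key": api_key, "Accept": "text/event-stream"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = last_event_id
    return headers
```

A consumer keeps `last_id = event.id` as it handles each event and, on a network error, reopens the stream with `reconnect_headers(api_key, last_id)` so the server can pick up where it left off.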

Start with a predict loop, add session memory, then move to Task Runs and Workflows. Use POST /v1/machines for real desktops and the MCP server to drive Coasty from your editor. Get a key at https://coasty.ai/developers and ship your first computer use agent.
