API reference
Build agents that see and act. The full Computer Use API: stateless prediction, stateful sessions, autonomous task runs, and multi-step workflows.
Introduction
The Coasty Computer Use API gives your code the ability to see a screen and act on it. You send a screenshot and a plain-language instruction; the model returns a precise list of actions — clicks, keystrokes, scrolls, and drags — with exact pixel coordinates. Your program performs those actions, captures a new screenshot, and asks again. That loop is how an agent drives any interface, real or virtual, without brittle selectors or per-app scripts.
Everything is a normal HTTPS request to https://coasty.ai/v1. There is no SDK to install and no websocket to manage for the core endpoints: each call is stateless unless you opt into a session. Core responses are plain JSON rather than streams, so any HTTP client works; the one exception is the run events endpoint, which streams Server-Sent Events.
Authentication
Every request must include your secret key. The canonical way is the X-API-Key header, but Authorization: Bearer <key> works too: a blank X-API-Key falls through to the Bearer header. Pick one form and send the raw key. Do not paste the literal text Bearer inside X-API-Key; that is the single most common first-day mistake and it returns 401 INVALID_API_KEY. Keys are created and revoked from the API keys page. Treat a key like a password: keep it server-side, store it in an environment variable, and never commit it or ship it in client-side code.
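The header rules above can be captured in a small guard. This is a minimal sketch, not part of the API: the helper name auth_headers and its use_bearer flag are hypothetical, and it only encodes the two documented forms plus the documented mistake of pasting "Bearer" into X-API-Key.

```python
def auth_headers(api_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Build the auth header for a request from a raw secret key (hypothetical helper)."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    if key.lower().startswith("bearer "):
        # The documented first-day mistake: the literal text Bearer inside the key.
        raise ValueError("pass the raw key, without a 'Bearer ' prefix")
    if use_bearer:
        return {"Authorization": f"Bearer {key}"}
    return {"X-API-Key": key}
```

Failing locally with a clear message is cheaper than discovering the same problem as a 401 INVALID_API_KEY at request time.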
X-API-Key: sk-coasty-live-your_key_here
An sk-coasty-test- key never bills and runs against mock VMs, yet exercises the exact same request and response shapes (its X-Credits-Charged and usage.cost_cents are always 0), so you can build and run CI confidently before flipping to a live key.
Quickstart
Your first prediction is four steps: export a key, capture a screenshot, base64-encode it, and POST it with an instruction. Grab a test key from the API keys page (it never bills) and set it in your shell:
export COASTY_API_KEY="sk-coasty-test-your_key_here"
Now send the prediction. The call below uploads a screenshot, asks the model to click a button, and prints the actions it returns.
import base64, os, requests
API_KEY = os.environ["COASTY_API_KEY"]
# Read the screenshot and base64-encode it for the JSON body
with open("screen.png", "rb") as f:
    screenshot = base64.b64encode(f.read()).decode()
res = requests.post(
    "https://coasty.ai/v1/predict",
    headers={"X-API-Key": API_KEY},
    json={
        "screenshot": screenshot,
        "instruction": "Click the login button",
        "screen_width": 1920,
        "screen_height": 1080,
    },
    timeout=60,
)
res.raise_for_status()
data = res.json()
print(data["status"])  # "continue" | "done" | "fail"
for action in data["actions"]:
    print(action["action_type"], action["params"])
A successful response contains an actions array and a status of continue, done, or fail. Execute each action in order, take a new screenshot, and call again while the status is continue. That loop is the whole API in miniature.
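The loop described above can be sketched as a function that takes its three moving parts as callables. The names run_predict_loop, predict, capture, and perform are hypothetical, as is the "max_steps_reached" sentinel, which is not an API status; in real use predict would wrap the POST /v1/predict call shown earlier.

```python
from typing import Callable


def run_predict_loop(
    predict: Callable[[str], dict],
    capture: Callable[[], str],
    perform: Callable[[dict], None],
    max_steps: int = 20,
) -> str:
    """Repeat capture -> predict -> perform until the status leaves "continue"."""
    for _ in range(max_steps):
        response = predict(capture())   # fresh screenshot on every step
        for action in response["actions"]:
            perform(action)             # execute actions in the order returned
        if response["status"] != "continue":
            return response["status"]   # "done" or "fail"
    return "max_steps_reached"          # hypothetical sentinel for the safety cap
```

Passing the callables in keeps the loop testable without a network or a real screen, and the step cap guards against a task that never reaches done or fail.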
Predict
POST /v1/predict is the stateless workhorse. Each call is independent: you provide the full context every time, which makes it simple to reason about and trivial to scale horizontally. Use it for one-shot decisions and for loops where you manage history yourself. When a task needs the model to remember prior steps automatically, reach for sessions instead.
The response is the standard prediction shape, covered in Response format.
Sessions
A session keeps the trajectory — the running history of screenshots and actions — on our side, so each step only needs the latest screenshot and instruction. This produces better multi-step behaviour on long tasks and keeps your request bodies small. Create a session once, step through the task, then delete it to release your concurrency quota.
import base64, os, requests
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
def screenshot() -> str:
    with open("screen.png", "rb") as f:
        return base64.b64encode(f.read()).decode()
# 1. Open a session; it remembers the trajectory across steps
session = requests.post(f"{BASE}/sessions", headers=HEADERS, json={
    "screen_width": 1920,
    "screen_height": 1080,
}, timeout=60).json()
session_id = session["session_id"]
# 2. Drive the task one step at a time
try:
    for _ in range(20):  # safety cap
        res = requests.post(
            f"{BASE}/sessions/{session_id}/predict",
            headers=HEADERS,
            json={
                "screenshot": screenshot(),
                "instruction": "Book a meeting tomorrow at 3pm",
            },
            timeout=60,
        ).json()
        for action in res["actions"]:
            perform(action)  # your action executor (not defined here)
        if res["status"] != "continue":
            break
finally:
    # 3. Always release the session to free your concurrency quota
    requests.delete(f"{BASE}/sessions/{session_id}", headers=HEADERS, timeout=30)
Always delete the session in a finally block. Sessions count against your tier's concurrent-session limit, and orphaned sessions only expire after 24 hours of inactivity.
Grounding
Grounding answers a narrower question than predict: “where is this element?” Give it a screenshot and a description and it returns the exact x, y coordinate to target. It is faster and cheaper than a full prediction, which makes it ideal when you already know what to do and only need a pixel to click.
import os, requests
res = requests.post(
    "https://coasty.ai/v1/ground",
    headers={"X-API-Key": os.environ["COASTY_API_KEY"]},
    json={
        "screenshot": screenshot,  # base64 PNG (see Quickstart)
        "element": "the blue Submit button below the form",
    },
    timeout=60,
).json()
print(res["x"], res["y"])  # exact click coordinates
The response is { x, y, usage, request_id }. Coordinates are in the same pixel space as the screenshot you sent.
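Because coordinates come back in the pixel space of the uploaded image, a client that downscales screenshots before sending them has to map the result back to the real display. A minimal sketch, assuming a plain linear resize with no cropping or letterboxing; the helper name to_screen_coords is hypothetical.

```python
def to_screen_coords(
    x: float,
    y: float,
    shot_size: tuple[int, int],
    screen_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a point from screenshot pixels to physical screen pixels."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    # Scale each axis independently by the ratio of screen to screenshot size.
    return round(x * screen_w / shot_w), round(y * screen_h / shot_h)
```

If you send screenshots at native resolution the two sizes are equal and the mapping is the identity.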
OCR
OCR extracts every piece of visible text from a screenshot, each with its bounding box. Use it to assert that a page reached the expected state, to scrape values, or to feed text into your own logic. Returns a flat full_text string plus an elements array of { text, left, top, width, height }.
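Once you have an elements array in the shape described above, a common next step is locating a piece of text so you can assert on it or click it. The sketch below is illustrative only: find_text_center is a hypothetical helper, and it assumes left, top, width, and height are pixel values in the screenshot's coordinate space.

```python
from typing import Optional


def find_text_center(elements: list[dict], needle: str) -> Optional[tuple[float, float]]:
    """Return the centre (x, y) of the first element whose text contains needle."""
    wanted = needle.casefold()  # case-insensitive match
    for el in elements:
        if wanted in el["text"].casefold():
            return (el["left"] + el["width"] / 2, el["top"] + el["height"] / 2)
    return None  # the text is not on screen
```

A None result doubles as a cheap state assertion: if the text you expect is missing, the page has probably not reached the state you wanted.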
import os, requests
res = requests.post(
    "https://coasty.ai/v1/ocr",
    headers={"X-API-Key": os.environ["COASTY_API_KEY"]},
    json={"screenshot": screenshot},  # base64 PNG (see Quickstart)
    timeout=60,
).json()
print(res["full_text"])
for el in res["elements"]:
    print(repr(el["text"]), "at", (el["left"], el["top"]))
Parse
Parse converts a block of pyautogui code into the same structured action objects the model returns. It is deterministic, runs no model, and is free. Use it to migrate existing automation scripts onto Coasty's executor, or to normalise hand-written steps into the canonical action schema.
import os, requests
res = requests.post(
    "https://coasty.ai/v1/parse",
    headers={"X-API-Key": os.environ["COASTY_API_KEY"]},
    json={"code": "pyautogui.click(100, 200)\npyautogui.typewrite('hello')"},
    timeout=30,
).json()
for action in res["actions"]:
    print(action["action_type"], action["params"])
Task runs
A run hands the agent a task and a machine, then drives it to completion on our side. The agent loops autonomously, verifies its own work (pass or fail), can pause for a human when it hits a wall, bills per step from your dollar API wallet, and streams every event live. You start one call and watch, instead of running the predict loop yourself.
Create a run with POST /v1/runs. The two required fields are machine_id and task. The response is an agent.run object with status of queued, plus a one-time webhook_secret you store to verify webhooks. Send an Idempotency-Key header to make a retried create safe.
import os, time, requests
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
TERMINAL = {"succeeded", "failed", "cancelled", "timed_out"}
# 1. Start a run. Idempotency-Key makes a retried create safe.
run = requests.post(
    f"{BASE}/runs",
    headers={**HEADERS, "Idempotency-Key": "order-4821"},
    json={
        "machine_id": "m_9f2c",
        "task": "Open the billing page and download the latest invoice as PDF",
        "cua_version": "v3",  # "v4" needs professional tier or above
        "max_steps": 40,
        "on_awaiting_human": "pause",
    },
    timeout=30,
).json()
run_id = run["id"]
print(run["status"])  # "queued"
webhook_secret = run.get("webhook_secret")  # shown once; store it now
# 2. Poll until terminal.
while True:
    run = requests.get(f"{BASE}/runs/{run_id}", headers=HEADERS, timeout=30).json()
    print(run["status"], run["steps_completed"], "steps")
    if run["status"] in TERMINAL:
        break
    time.sleep(2)
print(run["result"])  # {"passed": ..., "status": ..., "summary": ...}
{
  "id": "run_7a1b2c3d",
  "object": "agent.run",
  "status": "queued",
  "machine_id": "m_9f2c",
  "task": "Open the billing page and download the latest invoice as PDF",
  "cua_version": "v3",
  "instructions": null,
  "max_steps": 40,
  "on_awaiting_human": "pause",
  "steps_completed": 0,
  "credits_charged": 0,
  "cost_cents": 0,
  "result": null,
  "error": null,
  "awaiting_human_reason": null,
  "metadata": {
    "team": "finance"
  },
  "webhook_url": "https://example.com/hooks/coasty",
  "created_at": "2026-06-01T12:00:00Z",
  "started_at": null,
  "awaiting_human_since": null,
  "finished_at": null,
  "request_id": "req_4f9a2b1c",
  "webhook_secret": "whsec_one_time_value_shown_here"
}
A run moves from queued to running, can bounce between running and awaiting_human, and ends in one of succeeded, failed, cancelled, or timed_out. Terminal states are immutable, so it is always safe to stop polling once you reach one. Runs need the runs:read and runs:write scopes, granted to new keys by default.
Streaming events
GET /v1/runs/{id}/events returns a Server-Sent Events stream so you can follow a run as it happens, instead of polling. Each event has a type and a numeric id (the sequence number). If your connection drops, reconnect and replay everything you missed by sending the last sequence you saw as a Last-Event-ID header, or as the ?after= query parameter. The stream closes after the done event.
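To make the event framing concrete, here is a minimal sketch of a parser for standard Server-Sent Events lines. It assumes the stream follows the usual SSE wire format (id:, event:, and data: fields, with a blank line ending each event); the function name parse_sse and the dict shape it yields are hypothetical.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of decoded lines."""
    event_id, event_type, data = None, "message", []
    for line in lines:
        if line == "":  # a blank line terminates the current event
            if data:
                yield {"id": event_id, "type": event_type, "data": "\n".join(data)}
            event_type, data = "message", []  # the id persists, the type resets
            continue
        if line.startswith(":"):  # comment or keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space
        if field == "id":
            event_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
```

The id of the last event you processed is the value to persist and send back as Last-Event-ID (or ?after=) when you reconnect.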
import os, httpx
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
run_id = "run_7a1b"
last_seq = 0  # persist this so a reconnect can replay
# httpx streams the SSE body line by line. Reconnect with Last-Event-ID.
with httpx.stream(
    "GET",
    f"{BASE}/runs/{run_id}/events",
    headers={**HEADERS, "Last-Event-ID": str(last_seq)},
    timeout=None,
) as resp:
    event_type = "message"
    for line in resp.iter_lines():
        if line.startswith("id:"):
            last_seq = int(line[3:].strip())
        elif line.startswith("event:"):
            event_type = line[6:].strip()
        elif line.startswith("data:"):
            data = line[5:].strip()
            print(event_type, data)
            if event_type == "done":
                break
        elif not line:
            event_type = "message"  # a blank line ends an event; reset to the SSE default type
Human takeover
Some steps need a person: a captcha, a one-time code, a judgment call. When the agent reaches one and on_awaiting_human is pause, the run moves to awaiting_human and emits an awaiting_human event with a reason. A human completes the blocking step (in the same machine session), then you hand control back with POST /v1/runs/{id}/resume and an optional note. Resume is only valid while the status is awaiting_human.
import os, requests
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
run_id = "run_7a1b"
run = requests.get(f"{BASE}/runs/{run_id}", headers=HEADERS, timeout=30).json()
# resume is only valid while status == "awaiting_human".
if run["status"] == "awaiting_human":
    print("paused:", run["awaiting_human_reason"])
    # ... a human completes the blocking step out of band ...
    resumed = requests.post(
        f"{BASE}/runs/{run_id}/resume",
        headers=HEADERS,
        json={"note": "Solved the captcha; continue"},
        timeout=30,
    ).json()
    print(resumed["status"])  # back to "running"
Detect a pause by polling the run (status == awaiting_human with awaiting_human_reason set), from the SSE awaiting_human event, or from the run.awaiting_human webhook. After resume, the run returns to running and emits a resumed event. Set on_awaiting_human to fail or cancel at create time if you would rather the run stop than wait for a human.
Webhooks
Pass a webhook_url (https only) when you create a run and we POST a signed callback at each lifecycle transition. The response to your create call includes a webhook_secret exactly once: store it, because every callback is signed with it. Each request carries a Coasty-Signature header of the form t=<unix_ts>,v1=<hex>.
To verify, build the signed payload as "<t>." + raw_request_body, compute HMAC-SHA256 over it keyed by the webhook_secret, and compare against v1 with a constant-time check. Always hash the raw body bytes, before any JSON re-serialisation.
import hashlib, hmac, os, requests
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
# 1. Create a run with a webhook_url. webhook_secret is returned exactly once.
run = requests.post(
    f"{BASE}/runs",
    headers=HEADERS,
    json={
        "machine_id": "m_9f2c",
        "task": "Reconcile the invoice against the order",
        "webhook_url": "https://example.com/hooks/coasty",
    },
    timeout=30,
).json()
webhook_secret = run["webhook_secret"]  # persist this securely
# 2. In your webhook handler, verify the Coasty-Signature header.
def verify(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Split "t=<unix_ts>,v1=<hex>" into {"t": ..., "v1": ...}
    parts = dict(p.split("=", 1) for p in signature_header.split(","))
    signed = f"{parts['t']}.".encode() + raw_body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, parts["v1"])  # constant-time comparison
# Example (your framework supplies the raw body + header):
# ok = verify(request.body, request.headers["Coasty-Signature"], webhook_secret)
Workflows
A workflow composes many runs into one versioned program, with branching, loops, and guards expressed as a JSON DSL. Each task step is itself an agent run, so a workflow is the way to chain tasks, gate them on conditions, and pass results between them. Workflows are versioned: re-creating the same slug bumps the version, and a PUT does too.
Create one with POST /v1/workflows. The slug must match [a-z0-9_-]. The response is a Workflow carrying an id, a version, and the current dsl_version (2026-06-01).
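A client-side check can catch a bad slug before the request is sent. This is a sketch under two assumptions that the text above does not spell out: that the character class applies to the whole slug, and that a slug needs at least one character. Any length limit the server enforces is not modelled, and is_valid_slug is a hypothetical name.

```python
import re

# Assumption: the whole slug must consist of one or more of these characters.
SLUG_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_slug(slug: str) -> bool:
    """Return True when every character of slug is in [a-z0-9_-]."""
    return SLUG_RE.fullmatch(slug) is not None
```

For example, "invoice-reconcile" passes, while "Invoice Reconcile" fails on both the capital letters and the space.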
import os, requests
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
definition = {
    "steps": [
        {
            "id": "fetch",
            "type": "task",
            "task": "Open order {{inputs.order_id}} and read the invoice total",
            "save_as": "invoice",
        },
        {
            "id": "check",
            "type": "assert",
            "condition": {"op": "truthy", "value": "{{invoice.passed}}"},
            "message": "Agent failed to read the invoice",
        },
        {
            "id": "branch",
            "type": "if",
            "condition": {"op": "contains", "left": "{{invoice.result}}", "right": "PAID"},
            "then": [{"id": "ok", "type": "succeed", "output": {"state": "paid"}}],
            "else": [{"id": "no", "type": "fail", "message": "Invoice not marked paid"}],
        },
    ],
}
# 1. Create the workflow. Re-using the same slug bumps its version.
wf = requests.post(
    f"{BASE}/workflows",
    headers=HEADERS,
    json={
        "name": "Invoice reconciliation",
        "slug": "invoice-reconcile",
        "inputs_schema": {"type": "object", "properties": {"order_id": {"type": "string"}}},
        "definition": definition,
    },
    timeout=30,
).json()
print(wf["id"], "v", wf["version"], wf["dsl_version"])
# 2. Start a run of the saved workflow.
run = requests.post(
    f"{BASE}/workflows/{wf['id']}/runs",
    headers=HEADERS,
    json={"inputs": {"order_id": "ord_4821"}, "machine_id": "m_9f2c", "budget_cents": 500},
    timeout=30,
).json()
print(run["id"], run["status"])
Workflows need the workflows:read and workflows:write scopes, granted to new keys by default. See the Workflow DSL for the full step and condition catalogue.
Workflow DSL
The DSL (dsl_version 2026-06-01) is a JSON object with a steps array and an optional output. Each step has an id and a type. A task step runs the agent and binds its result ({ status, passed, result, run_id, steps, error }) under both its save_as name and its step id, so later steps can read it.
{
  "dsl_version": "2026-06-01",
  "definition": {
    "steps": [
      {
        "id": "fetch",
        "type": "task",
        "task": "Open order {{inputs.order_id}} and read the invoice total",
        "save_as": "invoice"
      },
      {
        "id": "check",
        "type": "assert",
        "condition": {
          "op": "truthy",
          "value": "{{invoice.passed}}"
        },
        "message": "Agent failed to read the invoice"
      },
      {
        "id": "branch",
        "type": "if",
        "condition": {
          "op": "contains",
          "left": "{{invoice.result}}",
          "right": "PAID"
        },
        "then": [
          {
            "id": "ok",
            "type": "succeed",
            "output": {
              "state": "paid"
            }
          }
        ],
        "else": [
          {
            "id": "no",
            "type": "fail",
            "message": "Invoice not marked paid"
          }
        ]
      }
    ],
    "output": {
      "paid": "{{invoice.result}}"
    }
  }
}
Conditions are structured rather than expression strings, which keeps them injection-safe. Each left, right, or value is either a literal or a {{path}} reference. Paths are dotted lookups into inputs.*, vars.*, and any step id or save_as name.
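To illustrate how structured conditions and {{path}} references might be read, here is a small local evaluator. It is a sketch of the idea, not Coasty's implementation: resolve and evaluate are hypothetical names, only the truthy and contains operators from the example definition are covered, and treating contains as Python's in operator (case-sensitive substring or membership) is an assumption.

```python
import re

# A value is a reference only when the whole string is a single {{dotted.path}}.
PATH_RE = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def resolve(value, context: dict):
    """Return a literal unchanged, or look up a {{dotted.path}} in context."""
    if not isinstance(value, str):
        return value
    match = PATH_RE.fullmatch(value.strip())
    if match is None:
        return value  # plain literal
    current = context
    for key in match.group(1).split("."):
        current = current[key]  # dotted lookup, one segment at a time
    return current


def evaluate(condition: dict, context: dict) -> bool:
    """Evaluate the two operators used in the example definition."""
    op = condition["op"]
    if op == "truthy":
        return bool(resolve(condition["value"], context))
    if op == "contains":
        return resolve(condition["right"], context) in resolve(condition["left"], context)
    raise ValueError(f"operator not covered by this sketch: {op}")
```

Because a condition is data rather than an expression string, nothing in it is ever passed to eval, which is the injection-safety property the text describes.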
Three guards bound every workflow run: budget_cents (spend cap in USD cents; 0 means unlimited), max_iterations (loop cap), and deadline_seconds (wall-clock limit). A breach ends the run as failed or timed_out.
A definition is validated before it is accepted. The limits below are enforced at create and ad-hoc time, so an invalid definition is rejected with 422 VALIDATION_ERROR rather than failing mid-run.
When a run starts, the workflow definition is snapshotted into that run, so editing or replacing the workflow (which bumps its version) never changes runs already in flight. Each run records the workflow_version it executed.
Running workflows
Start a saved workflow with POST /v1/workflows/{id}/runs, or run a definition inline (without saving) with POST /v1/workflows/runs by adding a definition (and optional inputs_schema) to the same body. Both return a workflow.run. The body accepts inputs, a default machine_id for task steps, and the budget_cents, max_iterations, and deadline_seconds guards. An Idempotency-Key header is honoured here too.
import os, requests
BASE = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": os.environ["COASTY_API_KEY"]}
# POST /v1/workflows/runs runs a definition inline, without saving a workflow.
run = requests.post(
    f"{BASE}/workflows/runs",
    headers=HEADERS,
    json={
        "machine_id": "m_9f2c",
        "inputs": {"url": "https://status.example.com"},
        "max_iterations": 5,
        "definition": {
            "steps": [
                {
                    "id": "open",
                    "type": "task",
                    "save_as": "page",
                    "task": "Open {{inputs.url}} and report whether all systems are operational",
                },
                {
                    "id": "gate",
                    "type": "assert",
                    "condition": {"op": "truthy", "value": "{{page.passed}}"},
                },
            ],
        },
    },
    timeout=30,
).json()
print(run["id"], run["status"])  # object == "workflow.run"
{
  "id": "wfr_5e6f7a8b",
  "object": "workflow.run",
  "status": "running",
  "workflow_id": "wf_1a2b3c",
  "workflow_version": 3,
  "machine_id": "m_9f2c",
  "inputs": {
    "order_id": "ord_4821"
  },
  "output": null,
  "error": null,
  "awaiting_human_reason": null,
  "awaiting_step_id": null,
  "iterations_used": 0,
  "spent_cents": 0,
  "budget_cents": 500,
  "created_at": "2026-06-01T12:00:00Z",
  "started_at": "2026-06-01T12:00:01Z",
  "finished_at": null,
  "request_id": "req_9c8b7a6d"
}
Action types
Every action the model can return uses an action_type from the table below, paired with a params object. Your executor switches on the type and applies the parameters. The terminal types — done and fail — set the response status and signal you to stop looping.
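An executor that switches on action_type can be sketched as a handler registry. The names make_executor, TERMINAL_TYPES, and log are hypothetical, and only click and type_text are wired up, with the parameter names taken from the sample response in Response format; here the handlers just record what they would do, whereas in real use they would call your input-automation library.

```python
from typing import Callable

TERMINAL_TYPES = {"done", "fail"}  # the terminal types named above


def make_executor(handlers: dict[str, Callable[..., None]]) -> Callable[[dict], bool]:
    """Build a perform(action) function from a type -> handler mapping."""
    def perform(action: dict) -> bool:
        """Apply one action; return False when the action is terminal."""
        action_type = action["action_type"]
        if action_type in TERMINAL_TYPES:
            return False
        handler = handlers.get(action_type)
        if handler is None:
            raise ValueError(f"no handler for action_type {action_type!r}")
        handler(**action.get("params", {}))  # params become keyword arguments
        return True
    return perform


log: list[tuple] = []
perform = make_executor({
    # **_ swallows any extra params a handler does not care about
    "click": lambda x, y, **_: log.append(("click", x, y)),
    "type_text": lambda text, **_: log.append(("type_text", text)),
})
```

Raising on an unknown type is a deliberate choice: silently skipping an action would let the screen drift out of step with what the model believes it did.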
Response format
Predict and session-predict return the same shape. actions is the ordered list to execute; status tells you whether to keep going (continue), stop successfully (done), or stop because the task is impossible (fail). usage reports tokens and the dollar cost of the call (cost_cents).
Billed success responses also carry two headers you can read without parsing the body: X-Credits-Charged (what this call cost) and X-Credits-Remaining (your wallet balance after it). In the body, the same numbers appear as usage.credits_charged and usage.cost_cents. On an sk-coasty-test- key both are always 0. Every response (success or error) additionally carries an X-Coasty-Request-Id header that mirrors request_id; quote it when contacting support.
{
  "request_id": "req_8f2c1e9a",
  "status": "continue",
  "reasoning": "The login form is visible. I'll click the email field, then type the address.",
  "actions": [
    {
      "action_type": "click",
      "params": {
        "x": 512,
        "y": 340
      },
      "description": "Click the email field"
    },
    {
      "action_type": "type_text",
      "params": {
        "text": "[email protected]"
      },
      "description": "Type the email address"
    }
  ],
  "raw_code": [
    "pyautogui.click(512, 340)",
    "pyautogui.typewrite('[email protected]')"
  ],
  "usage": {
    "input_tokens": 1523,
    "output_tokens": 245,
    "credits_charged": 5,
    "cost_cents": 45
  }
}
Errors
Errors return a non-2xx status and a JSON envelope under an error key. The code is stable and safe to branch on; message is human-readable and may change. Every error also carries an error.request_id (mirrored in the X-Coasty-Request-Id response header), plus error.suggestion and error.docs_url for self-service. A Link: <url>; rel="help" header mirrors docs_url. Always log the request id: it is the fastest way for us to trace a failed call.
Some codes attach machine-readable context to the body. A 402 (INSUFFICIENT_CREDITS) reports required and balance; a 403 reports required_scope and current_scopes; a 422 VALIDATION_ERROR lists the offending field path under error.details; and a 409 state conflict carries current_state with allowed_from or required_state.
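Since code is stable and the context fields are machine-readable, an error body can be turned into a log line mechanically. A minimal sketch: summarize_error is a hypothetical helper, and it assumes the context fields for 403 and 409 responses sit directly inside the error object, as required and balance do in the 402 example.

```python
def summarize_error(body: dict) -> str:
    """Condense an error envelope into one loggable line."""
    err = body["error"]
    parts = [f"{err['code']} (request {err.get('request_id', 'unknown')})"]
    if err["code"] == "INSUFFICIENT_CREDITS":
        parts.append(f"need {err.get('required')}, have {err.get('balance')}")
    elif "required_scope" in err:   # 403: the key lacks a scope
        parts.append(f"missing scope {err['required_scope']}")
    elif "current_state" in err:    # 409: state conflict
        parts.append(f"resource is {err['current_state']}")
    if err.get("suggestion"):
        parts.append(err["suggestion"])
    return "; ".join(parts)
```

Branching on code rather than message keeps the logic stable, since message text may change.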
{
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "message": "Your API wallet does not have enough funds to complete this request.",
    "type": "payment_required",
    "suggestion": "Add funds in the dashboard, or use an sk-coasty-test- key while building (test keys never bill).",
    "docs_url": "https://coasty.ai/developers/docs#errors",
    "required": 45,
    "balance": 12,
    "request_id": "req_8f2c1e9a"
  }
}
Treat 429, 503 (UPSTREAM_UNAVAILABLE), and 504 (UPSTREAM_TIMEOUT) as retryable: honor Retry-After on a 429, and use an Idempotency-Key with exponential backoff on the upstream codes. A 500 model failure (PREDICTION_FAILED, GROUNDING_FAILED, OCR_FAILED) auto-refunds the charge, so retrying is free.
Troubleshooting
Five mistakes account for almost every first-week support ticket. Each maps to one status and one fix:
Rate limits
Limits apply per key and, in aggregate, per user. Every response carries X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (a Unix timestamp) so you can pace requests precisely rather than guessing. When you exceed a limit you get 429 RATE_LIMIT_EXCEEDED with a Retry-After header: honor it before retrying. The per_user cap is shared across all your keys, so minting more keys does not raise it. A separate 429 TOO_MANY_RUNS guards the concurrent-run cap for agent runs.
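The retry guidance in Errors and the Retry-After rule here can be folded into one decision function. This is a sketch under stated assumptions: retry_delay and its base and cap parameters are hypothetical, Retry-After is assumed to carry a number of seconds rather than an HTTP date, and full-jitter exponential backoff is one common choice, not something the API prescribes.

```python
import random
from typing import Mapping, Optional

RETRYABLE = {429, 503, 504}  # the status codes documented as retryable


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if not retryable."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        return float(retry_after)  # the server told us exactly how long to wait
    # Otherwise: exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

When retrying a create call, reuse the same Idempotency-Key on every attempt so a request that succeeded server-side but timed out client-side is not executed twice.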
Pricing
Requests are billed in US dollars from your API wallet. The charge is taken before the model runs and automatically refunded if a request fails server-side. Internally each request unit is $0.01 (the granularity behind every price below), but everything you pay and see is dollars. High-resolution screenshots (above 1280×720) and longer trajectories add a small surcharge; test keys are always free.