Computer Use API

Give your code eyes and hands

Send a screenshot, get structured mouse and keyboard actions back. Build automation, testing, and AI agents that interact with any GUI.

One API call. Full control.

Screenshot in, structured actions out. No browser drivers, no DOM parsing, no selectors to maintain.

1. Send screenshot: Base64 PNG/JPEG + instruction

2. AI reasons: Vision model identifies UI elements

3. Get actions: click(512, 340), type('hello')

Vision-First

Understands any UI — web apps, desktop software, mobile screens. No selectors or DOM access needed.

Multi-Step Sessions

Stateful sessions maintain trajectory history across steps. The AI remembers what it's already done.

Two Engines

V3 for speed (3.5s/step, multi-action). V1 for accuracy (reflection, single-action). Choose per request.

Any Screen

Works with browser screenshots, desktop apps, mobile emulators, VNC streams — anything visual.

10 Action Types

click, type, scroll, drag, key combos, and more. Exact coordinates returned for every action.

Any Language

Simple REST API. Works with Python, JavaScript, Go, Ruby, PHP, Java, C#, cURL — anything with HTTP.

Simple, per-request pricing

Pay only for what you use. Credits deducted from your shared balance. No separate API subscription.

POST /predict                     5 cr
POST /sessions                    10 cr
POST /sessions/{id}/predict       4 cr
POST /ground                      3 cr
POST /ocr                         3 cr
POST /parse                       Free
GET  /models, /usage, /sessions   Free

Surcharges

Trajectory screenshot     +2 cr each
HD image (>1280x720)      +1 cr/image
V1 engine                 +3 cr/request
Custom system prompt      +1 cr
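
As a worked example of the tables above, the sketch below adds surcharges to a base price. It assumes surcharges are simply additive and that the caller already knows how many images count as HD; neither detail is spelled out here. The names BASE_COST, RequestShape, and estimate_credits are hypothetical helpers for illustration, not part of the API.

```python
from dataclasses import dataclass

# Base prices in credits, copied from the pricing table above.
BASE_COST = {
    "predict": 5,
    "session_create": 10,
    "session_predict": 4,
    "ground": 3,
    "ocr": 3,
    "parse": 0,
}


@dataclass
class RequestShape:
    """Hypothetical description of one request, used only for estimation."""
    endpoint: str                     # a key of BASE_COST
    trajectory_screenshots: int = 0   # +2 cr each
    hd_images: int = 0                # images larger than 1280x720, +1 cr each
    use_v1: bool = False              # +3 cr per request
    custom_system_prompt: bool = False  # +1 cr


def estimate_credits(req: RequestShape) -> int:
    """Estimate the cost of one request, assuming surcharges are additive."""
    cost = BASE_COST[req.endpoint]
    cost += 2 * req.trajectory_screenshots
    cost += 1 * req.hd_images
    cost += 3 if req.use_v1 else 0
    cost += 1 if req.custom_system_prompt else 0
    return cost


# A V1 predict call with two trajectory screenshots, one HD image,
# and a custom system prompt: 5 + 4 + 1 + 3 + 1 = 14 credits.
example = RequestShape("predict", trajectory_screenshots=2, hd_images=1,
                       use_v1=True, custom_system_prompt=True)
print(estimate_credits(example))
```

Treat the result as an estimate only; the credits_charged field returned with each prediction is the number actually billed.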

Computer Use API

Send a screenshot, get actions back

The CUA API gives your code the ability to see and interact with any screen. Send a screenshot and a natural language instruction — receive structured mouse clicks, keyboard inputs, and scroll commands with exact coordinates.

Authentication

Every request needs an X-API-Key header. Sign up to create API keys. Credits are deducted per request from your shared balance.

header
X-API-Key: cua_sk_your_key_here
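
To keep the key out of source control, one option is to read it from the environment and build the header in a single place. This is a minimal sketch: the variable name CUA_API_KEY and the helper auth_headers are assumptions made for this example, not names defined by the API.

```python
import os


def auth_headers(env_var: str = "CUA_API_KEY") -> dict:
    """Build the X-API-Key header from an environment variable.

    The default variable name is an assumption for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your API key before calling the API")
    return {"X-API-Key": key}
```

The returned dict can be passed as the headers argument of any HTTP client.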

How it Works

1. Capture a screenshot of the target screen
2. Send it with a natural language instruction
3. Receive structured actions (click, type, scroll, ...)
4. Execute the actions in your environment

Quick Start

The examples below use Python. The predict endpoint is the core of the API; everything else builds on it.

install
pip install requests
predict — single screenshot
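import base64

import requests

API_KEY = "cua_sk_..."

# Read the screenshot and base64-encode it for the JSON body
with open("screen.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

r = requests.post(
    "https://coasty.ai/api/v1/cua/predict",
    headers={"X-API-Key": API_KEY},
    json={
        "screenshot": img,
        "instruction": "Click the search bar and type 'hello'",
    },
    timeout=60,  # arbitrary client-side limit; adjust as needed
)
r.raise_for_status()  # surface 4xx/5xx errors before reading the body

# Each action carries its type and its parameters
for action in r.json()["actions"]:
    print(action["action_type"], action["params"])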
sessions — multi-step tasks
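# Create a session for multi-step tasks (reuses requests and API_KEY from above)
s = requests.post(
    "https://coasty.ai/api/v1/cua/sessions",
    headers={"X-API-Key": API_KEY},
    json={"cua_version": "v3", "screen_width": 1920, "screen_height": 1080},
    timeout=60,
)
s.raise_for_status()
session_id = s.json()["session_id"]

# Send screenshots in a loop, capped so a stuck task cannot keep spending credits
MAX_STEPS = 30  # arbitrary safety limit; tune it for your task
for _ in range(MAX_STEPS):
    screenshot = capture_screenshot()  # your screenshot function (base64 string)
    resp = requests.post(
        f"https://coasty.ai/api/v1/cua/sessions/{session_id}/predict",
        headers={"X-API-Key": API_KEY},
        json={"screenshot": screenshot, "instruction": "Complete the form"},
        timeout=60,
    )
    resp.raise_for_status()
    r = resp.json()

    for action in r["actions"]:
        execute_action(action)  # your action executor

    # The status field signals when the task has finished or cannot be completed
    if r["status"] in ("done", "fail"):
        break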

Response Format

Every prediction returns structured actions with exact coordinates, a status signal, and token usage.

response
{
  "request_id": "req_abc123",
  "actions": [
    {
      "action_type": "click",
      "params": { "x": 512, "y": 340, "button": "left", "clicks": 1 }
    },
    {
      "action_type": "type_text",
      "params": { "text": "hello world" }
    }
  ],
  "reasoning": "I see a search bar at (512, 340)...",
  "status": "continue",
  "usage": {
    "input_tokens": 1523,
    "output_tokens": 245,
    "credits_charged": 5
  }
}
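
A typed wrapper can make the response easier to work with than raw dictionaries. The sketch below parses the sample response shown above into dataclasses. The class and function names (Action, Prediction, parse_prediction) are hypothetical, and treating every field except status as optional is an assumption of this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Action:
    action_type: str
    params: dict = field(default_factory=dict)


@dataclass
class Prediction:
    request_id: str
    actions: list
    status: str
    reasoning: str = ""
    credits_charged: int = 0


def parse_prediction(body: dict) -> Prediction:
    """Convert a decoded JSON response into a Prediction.

    Only "status" is treated as mandatory here; that choice is an assumption.
    """
    actions = [
        Action(a["action_type"], a.get("params", {}))
        for a in body.get("actions", [])
    ]
    usage: dict[str, Any] = body.get("usage", {})
    return Prediction(
        request_id=body.get("request_id", ""),
        actions=actions,
        status=body["status"],
        reasoning=body.get("reasoning", ""),
        credits_charged=usage.get("credits_charged", 0),
    )


# The sample response from above, abbreviated to the fields used here.
sample = json.loads("""
{
  "request_id": "req_abc123",
  "actions": [
    {"action_type": "click",
     "params": {"x": 512, "y": 340, "button": "left", "clicks": 1}},
    {"action_type": "type_text", "params": {"text": "hello world"}}
  ],
  "reasoning": "I see a search bar at (512, 340)...",
  "status": "continue",
  "usage": {"input_tokens": 1523, "output_tokens": 245, "credits_charged": 5}
}
""")

prediction = parse_prediction(sample)
print(prediction.status, [a.action_type for a in prediction.actions])
```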

Action Types

click        Mouse click at (x, y)
type_text    Type a string
key_press    Press a key (enter, tab, ...)
key_combo    Key combination (ctrl+c, cmd+v, ...)
scroll       Scroll at a position
drag         Drag between two points
move         Move cursor
wait         Pause execution
done         Task completed
fail         Task impossible
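
One way to execute these actions is a small dispatcher that maps each action_type to a handler you supply. The sketch below is an assumption-heavy illustration: run_actions and the handler registry are hypothetical names, only the click and type_text parameters are taken from the sample response, and it assumes done and fail can appear inside the actions array. The handlers here just record calls; in practice they would drive your own input library.

```python
from typing import Any, Callable, Optional

Handler = Callable[[dict], None]

# Action types that end the task rather than touching the screen.
TERMINAL = {"done", "fail"}


def run_actions(actions: list, handlers: dict) -> Optional[str]:
    """Run actions in order.

    Returns "done" or "fail" if a terminal action is reached, otherwise None.
    Raises ValueError for an action type with no registered handler.
    """
    for action in actions:
        kind = action["action_type"]
        if kind in TERMINAL:
            return kind
        if kind not in handlers:
            raise ValueError(f"No handler registered for action type {kind!r}")
        handlers[kind](action.get("params", {}))
    return None


# Recording handlers stand in for a real mouse/keyboard backend.
log: list = []
handlers: dict = {
    "click": lambda p: log.append(("click", p["x"], p["y"])),
    "type_text": lambda p: log.append(("type_text", p["text"])),
}

result = run_actions(
    [
        {"action_type": "click",
         "params": {"x": 512, "y": 340, "button": "left", "clicks": 1}},
        {"action_type": "type_text", "params": {"text": "hello world"}},
    ],
    handlers,
)
print(log, result)
```

Failing loudly on an unregistered action type is a deliberate choice in this sketch: silently skipping an action would leave the screen in a state the next prediction does not expect.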

Request Options

Only screenshot and instruction are required.

screenshot       string          required
instruction      string          required
cua_version      "v3" | "v1"
screen_width     int
screen_height    int
max_actions      int (1-10)
trajectory       array
system_prompt    string
tools            string[]
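
Validating options on the client avoids spending a round trip on a request that is bound to be rejected. The sketch below builds a request body from the options listed above and checks the two constraints the table states (the cua_version values and the max_actions range). The function name build_predict_payload is hypothetical, and omitting unset options from the body is an assumption about what the server accepts.

```python
from typing import Any, Optional


def build_predict_payload(
    screenshot: str,
    instruction: str,
    *,
    cua_version: Optional[str] = None,
    screen_width: Optional[int] = None,
    screen_height: Optional[int] = None,
    max_actions: Optional[int] = None,
    trajectory: Optional[list] = None,
    system_prompt: Optional[str] = None,
    tools: Optional[list] = None,
) -> dict:
    """Build a JSON-serializable body, leaving out options that were not set."""
    if not screenshot or not instruction:
        raise ValueError("screenshot and instruction are required")
    if cua_version is not None and cua_version not in ("v3", "v1"):
        raise ValueError('cua_version must be "v3" or "v1"')
    if max_actions is not None and not 1 <= max_actions <= 10:
        raise ValueError("max_actions must be between 1 and 10")

    payload: dict[str, Any] = {"screenshot": screenshot, "instruction": instruction}
    optional = {
        "cua_version": cua_version,
        "screen_width": screen_width,
        "screen_height": screen_height,
        "max_actions": max_actions,
        "trajectory": trajectory,
        "system_prompt": system_prompt,
        "tools": tools,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


body = build_predict_payload("<base64>", "Open settings",
                             cua_version="v1", max_actions=3)
print(body)
```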

All Endpoints

All endpoints require the X-API-Key header. Credits deducted from your shared balance.

Prediction
POST     /api/v1/cua/predict                  5 cr
POST     /api/v1/cua/sessions                 10 cr
POST     /api/v1/cua/sessions/{id}/predict    4 cr
POST     /api/v1/cua/sessions/{id}/reset      Free
DELETE   /api/v1/cua/sessions/{id}            Free

Utilities
POST     /api/v1/cua/ground                   3 cr
POST     /api/v1/cua/ocr                      3 cr
POST     /api/v1/cua/parse                    Free

Management
GET      /api/v1/cua/models                   Free
GET      /api/v1/cua/usage                    Free
GET      /api/v1/cua/sessions                 Free

Error Handling

All errors return a JSON body with error.code and error.message fields.

401   INVALID_API_KEY        Missing or invalid X-API-Key
402   INSUFFICIENT_CREDITS   Not enough credits for this request
403   INSUFFICIENT_SCOPE     API key lacks the required scope
429   RATE_LIMIT_EXCEEDED    Too many requests; check the Retry-After header
400   INVALID_SCREENSHOT     Bad base64 or unsupported image format
404   SESSION_NOT_FOUND      Session expired or does not exist
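
The error body and the Retry-After header can be folded into a single exception type so that callers decide in one place whether to retry. This is a minimal sketch under stated assumptions: CuaApiError, error_from_response, and should_retry are hypothetical names, the body is assumed to be shaped like {"error": {"code": ..., "message": ...}}, and Retry-After is only parsed when it is a whole number of seconds.

```python
from collections.abc import Mapping
from typing import Any, Optional


class CuaApiError(Exception):
    """Hypothetical exception carrying the fields of an API error response."""

    def __init__(self, status: int, code: str, message: str,
                 retry_after: Optional[int] = None) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.retry_after = retry_after  # seconds to wait, if the server said so


def error_from_response(status: int, body: Any, headers: Mapping) -> CuaApiError:
    """Build a CuaApiError from a status code, decoded JSON body, and headers.

    A plain dict is case-sensitive, so pass headers with the exact key
    "Retry-After" or use your HTTP client's case-insensitive mapping.
    """
    err = body.get("error") if isinstance(body, Mapping) else None
    if not isinstance(err, Mapping):
        err = {}
    retry_after = None
    if status == 429:
        raw = str(headers.get("Retry-After", "")).strip()
        if raw.isdigit():
            retry_after = int(raw)
    return CuaApiError(status, err.get("code", "UNKNOWN"),
                       err.get("message", ""), retry_after)


def should_retry(err: CuaApiError) -> bool:
    """Only rate limiting is treated as retryable in this sketch."""
    return err.code == "RATE_LIMIT_EXCEEDED"


rate_limited = error_from_response(
    429,
    {"error": {"code": "RATE_LIMIT_EXCEEDED", "message": "Too many requests"}},
    {"Retry-After": "7"},
)
print(rate_limited, should_retry(rate_limited), rate_limited.retry_after)
```

Errors such as INVALID_API_KEY or INSUFFICIENT_CREDITS will not resolve on their own, which is why this sketch retries only on rate limiting.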

Start building in minutes

Create a free account, generate an API key, and send your first screenshot. No credit card required.
