Send a screenshot, get structured mouse and keyboard actions back. Build automation, testing, and AI agents that interact with any GUI.
Screenshot in, structured actions out. No browser drivers, no DOM parsing, no selectors to maintain.
1. Send screenshot: Base64 PNG/JPEG + instruction
2. AI reasons: Vision model identifies UI elements
3. Get actions: click(512, 340), type('hello')
Understands any UI — web apps, desktop software, mobile screens. No selectors or DOM access needed.
Stateful sessions maintain trajectory history across steps. The AI remembers what it's already done.
V3 for speed (3.5s/step, multi-action). V1 for accuracy (reflection, single-action). Choose per request.
Works with browser screenshots, desktop apps, mobile emulators, VNC streams — anything visual.
click, type, scroll, drag, key combos, and more. Exact coordinates returned for every action.
Simple REST API. Works with Python, JavaScript, Go, Ruby, PHP, Java, C#, cURL — anything with HTTP.
Pay only for what you use. Credits deducted from your shared balance. No separate API subscription.
Endpoint                          Cost
POST /predict                     5 cr
POST /sessions                    10 cr
POST /sessions/{id}/predict       4 cr
POST /ground                      3 cr
POST /ocr                         3 cr
POST /parse                       Free
GET /models, /usage, /sessions    Free
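To make the pricing concrete, here is a rough sketch of how the per-request costs listed above add up for a multi-step task. It compares one POST /predict call per step with creating a session and calling POST /sessions/{id}/predict per step. The function names are illustrative only, the sketch assumes exactly one prediction per step, and it covers base costs only (no surcharges).

```python
# Base credit costs taken from the pricing list above.
PREDICT_COST = 5          # POST /predict
SESSION_CREATE_COST = 10  # POST /sessions
SESSION_PREDICT_COST = 4  # POST /sessions/{id}/predict


def stateless_cost(steps: int) -> int:
    """Credits for `steps` independent POST /predict calls."""
    return steps * PREDICT_COST


def session_cost(steps: int) -> int:
    """Credits for creating one session plus `steps` session predictions."""
    return SESSION_CREATE_COST + steps * SESSION_PREDICT_COST


if __name__ == "__main__":
    # Print steps, stateless cost, and session cost side by side.
    for steps in (1, 5, 10, 11, 20):
        print(steps, stateless_cost(steps), session_cost(steps))
```

Under these assumptions the two approaches cost the same at 10 steps (50 credits each), and the session route is cheaper from 11 steps onward.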
The CUA API gives your code the ability to see and interact with any screen. Send a screenshot and a natural language instruction — receive structured mouse clicks, keyboard inputs, and scroll commands with exact coordinates.
Every request needs an X-API-Key header. Sign up to create API keys. Credits are deducted per request from your shared balance.
X-API-Key: cua_sk_your_key_here

The predict endpoint is the core of the API; everything else builds on it.
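pip install requests

import base64
import requests

API_KEY = "cua_sk_..."

# Read the screenshot and base64-encode it for the JSON payload
with open("screen.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

r = requests.post(
    "https://coasty.ai/api/v1/cua/predict",
    headers={"X-API-Key": API_KEY},
    json={
        "screenshot": img,
        "instruction": "Click the search bar and type 'hello'",
    },
)
r.raise_for_status()  # stop early on HTTP errors such as an invalid key

# Each returned action carries an action_type and its params
for action in r.json()["actions"]:
    print(action["action_type"], action["params"])

# Create a session for multi-step tasks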
s = requests.post(
    "https://coasty.ai/api/v1/cua/sessions",
    headers={"X-API-Key": API_KEY},
    json={"cua_version": "v3", "screen_width": 1920, "screen_height": 1080},
).json()
session_id = s["session_id"]

# Send screenshots in a loop
while True:
    screenshot = capture_screenshot()  # your screenshot function
    r = requests.post(
        f"https://coasty.ai/api/v1/cua/sessions/{session_id}/predict",
        headers={"X-API-Key": API_KEY},
        json={"screenshot": screenshot, "instruction": "Complete the form"},
    ).json()
    for action in r["actions"]:
        execute_action(action)  # your action executor
    if r["status"] in ("done", "fail"):
        break

Every prediction returns structured actions with exact coordinates, a status signal, and token usage.
{
  "request_id": "req_abc123",
  "actions": [
    {
      "action_type": "click",
      "params": { "x": 512, "y": 340, "button": "left", "clicks": 1 }
    },
    {
      "action_type": "type_text",
      "params": { "text": "hello world" }
    }
  ],
  "reasoning": "I see a search bar at (512, 340)...",
  "status": "continue",
  "usage": {
    "input_tokens": 1523,
    "output_tokens": 245,
    "credits_charged": 5
  }
}

Action type   Description
click         Mouse click at (x, y)
type_text     Type a string
key_press     Press a key (enter, tab...)
key_combo     Combo (ctrl+c, cmd+v...)
scroll        Scroll at a position
drag          Drag between two points
move          Move cursor
wait          Pause execution
done          Task completed
fail          Task impossible

Only screenshot and instruction are required.
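As an illustration of the required-versus-optional split, the sketch below builds a predict request body that always includes screenshot and instruction and adds optional fields only when they are set. The helper name build_predict_payload is hypothetical; the checks on cua_version and max_actions mirror the values shown in the parameter list ("v3" or "v1", and 1-10) and are an assumption about what the server accepts, not confirmed validation rules.

```python
from typing import Any, Dict, Optional

# Assumption: these are the only accepted version strings.
ALLOWED_VERSIONS = {"v3", "v1"}


def build_predict_payload(
    screenshot_b64: str,
    instruction: str,
    cua_version: Optional[str] = None,
    max_actions: Optional[int] = None,
    screen_width: Optional[int] = None,
    screen_height: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a JSON-ready dict, omitting optional fields left as None."""
    if not screenshot_b64 or not instruction:
        raise ValueError("screenshot and instruction are required")
    if cua_version is not None and cua_version not in ALLOWED_VERSIONS:
        raise ValueError(f"cua_version must be one of {sorted(ALLOWED_VERSIONS)}")
    if max_actions is not None and not 1 <= max_actions <= 10:
        raise ValueError("max_actions must be between 1 and 10")

    payload: Dict[str, Any] = {
        "screenshot": screenshot_b64,
        "instruction": instruction,
    }
    optional = {
        "cua_version": cua_version,
        "max_actions": max_actions,
        "screen_width": screen_width,
        "screen_height": screen_height,
    }
    # Only send optional fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be passed directly as the json= argument of requests.post, as in the quick-start call.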
Parameter       Type           Required
screenshot      string         required
instruction     string         required
cua_version     "v3" | "v1"    optional
screen_width    int            optional
screen_height   int            optional
max_actions     int (1-10)     optional
trajectory      array          optional
system_prompt   string         optional
tools           string[]       optional

All endpoints require the X-API-Key header. Credits are deducted from your shared balance.
Endpoint                             Cost
/api/v1/cua/predict                  5 cr
/api/v1/cua/sessions (POST)          10 cr
/api/v1/cua/sessions/{id}/predict    4 cr
/api/v1/cua/sessions/{id}/reset      Free
/api/v1/cua/sessions/{id}            Free
/api/v1/cua/ground                   3 cr
/api/v1/cua/ocr                      3 cr
/api/v1/cua/parse                    Free
/api/v1/cua/models                   Free
/api/v1/cua/usage                    Free
/api/v1/cua/sessions (GET)           Free

All errors return a JSON body with error.code and error.message fields.
Code                   Meaning
INVALID_API_KEY        Missing or invalid X-API-Key
INSUFFICIENT_CREDITS   Not enough credits for this request
INSUFFICIENT_SCOPE     API key lacks the required scope
RATE_LIMIT_EXCEEDED    Too many requests; check the Retry-After header
INVALID_SCREENSHOT     Bad base64 or unsupported image format
SESSION_NOT_FOUND      Session expired or does not exist

Create a free account, generate an API key, and send your first screenshot. No credit card required.
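As one way to act on the error codes listed above, the sketch below turns a decoded error body into an exception and records how long to wait when the request was rate limited. It assumes the body has the shape {"error": {"code": ..., "message": ...}}, that Retry-After carries a number of seconds, and that only RATE_LIMIT_EXCEEDED is worth retrying automatically. CuaError and parse_error are hypothetical names, not part of the API.

```python
from typing import Any, Mapping, Optional

# Assumption: only rate limiting is retried automatically.
RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED"}


class CuaError(Exception):
    """Error reported by the API, with an optional retry delay in seconds."""

    def __init__(self, code: str, message: str, retry_after: Optional[float] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.retry_after = retry_after


def parse_error(body: Mapping[str, Any], headers: Mapping[str, str]) -> CuaError:
    """Build a CuaError from a decoded JSON error body and response headers."""
    err = body.get("error", {})
    code = err.get("code", "UNKNOWN")
    message = err.get("message", "")
    retry_after = None
    if code in RETRYABLE_CODES:
        try:
            # Assumption: Retry-After is given in seconds, not as an HTTP date.
            retry_after = float(headers.get("Retry-After", "1"))
        except ValueError:
            retry_after = 1.0  # fall back to a short fixed delay
    return CuaError(code, message, retry_after)
```

With requests, a caller might check `if not r.ok: raise parse_error(r.json(), r.headers)`, then sleep for `retry_after` seconds and retry when that attribute is set. Codes such as INVALID_API_KEY or INSUFFICIENT_CREDITS leave `retry_after` as None, since repeating the same request would not help.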