Tutorial

Screenshot to Action: A Deep Dive Into the /v1/predict Endpoint

Emily Watson | June 29, 2026 | 6 min

Many automation tools rely on brittle selectors or on APIs that you must know in advance. The /v1/predict endpoint flips that model by letting your agent see the current state of the screen and decide where to click or type. You send a base64 screenshot and a short instruction. The API returns a list of actions such as mouse_move, left_click, and key_press. You repeat a capture, predict, act cycle until the returned status is done. This pattern powers a computer-use agent that behaves like a human user on real desktops and browsers.

How it works

Each prediction is a single POST request to https://coasty.ai/v1/predict, authenticated with an X-API-Key header. The JSON body must include a base64-encoded screenshot, a natural language instruction, and the cua_version. The server returns a JSON object with an actions array and a status field. You execute the returned actions, then, while the status is not done, capture a new screenshot and call predict again. The loop stops when status becomes done.

bash

$ curl -X POST https://coasty.ai/v1/predict \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
    "instruction": "Click the orange button on the top right.",
    "cua_version": "v3"
  }'

# Example response
{
  "actions": [
    {"type": "mouse_move", "x": 450, "y": 80},
    {"type": "left_click"}
  ],
  "status": "done"
}
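
The same request can be assembled from a script. The sketch below is a minimal, standard-library-only illustration, not an official client: the helper names build_predict_request and send are hypothetical, and the header and field names are copied from the curl example above. It separates building the request from sending it so the payload can be inspected before any network call is made.

```python
import base64
import json
import urllib.request

PREDICT_URL = "https://coasty.ai/v1/predict"  # endpoint named in this tutorial


def build_predict_request(png_bytes, instruction, api_key, cua_version="v3"):
    """Build (but do not send) a POST request for /v1/predict.

    Hypothetical helper: field and header names mirror the curl example.
    """
    body = {
        # The screenshot travels as a base64 string, not as raw bytes.
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "instruction": instruction,
        "cua_version": cua_version,
    }
    return urllib.request.Request(
        PREDICT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling send(build_predict_request(png_bytes, "Click the orange button on the top right.", api_key)) would then return a dictionary shaped like the example response, assuming the key is valid.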

Request and response details

● screenshot: a base64 string of the current screen image.
● instruction: free-form natural language describing what to do.
● cua_version: a string like "v3" or "v4". v4 defines an autonomous agent with a pass/fail verifier.
● actions: an array of action objects such as mouse_move, left_click, right_click, double_click, mouse_down, mouse_up, key_press, text_input.
● status: a string that can be "done" or another value indicating more steps are needed.
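
One way to act on the actions array is a small dispatcher that maps each action type to a function you supply. The sketch below is an assumption-laden illustration: run_actions and the handlers mapping are hypothetical names, and only the x and y fields of mouse_move are confirmed by the example response. Fields for other action types are not documented here, so every field other than type is simply forwarded to the handler as a keyword argument.

```python
# Action types listed in this tutorial.
KNOWN_ACTIONS = {
    "mouse_move", "left_click", "right_click", "double_click",
    "mouse_down", "mouse_up", "key_press", "text_input",
}


def run_actions(actions, handlers):
    """Execute actions in order and return the types that were run.

    handlers is a hypothetical dict mapping an action type to a callable
    that performs it (for example with your own mouse/keyboard library).
    """
    executed = []
    for action in actions:
        kind = action.get("type")
        if kind not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action type: {kind!r}")
        if kind not in handlers:
            raise KeyError(f"no handler registered for {kind!r}")
        # Forward every field except "type" as keyword arguments.
        params = {k: v for k, v in action.items() if k != "type"}
        handlers[kind](**params)
        executed.append(kind)
    return executed
```

Rejecting unknown types up front is a deliberate choice in this sketch: a silent skip could leave the screen in a state the next prediction does not expect.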

In short: capture a screenshot, call /v1/predict, and act on the returned actions, repeating until status is done.
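
The loop itself can be sketched independently of how you capture the screen or perform input. In the example below, run_until_done, capture, predict, and execute are all hypothetical names: you would pass in your own screenshot function, a function that calls /v1/predict, and a function that performs the returned actions. The max_steps limit is an added safety assumption, not something the API is documented to require.

```python
def run_until_done(capture, predict, execute, instruction, max_steps=25):
    """Capture, predict, act; repeat until the response status is "done".

    capture()                        -> current screenshot (hypothetical)
    predict(screenshot, instruction) -> dict with "actions" and "status"
    execute(actions)                 -> performs the actions on the machine
    Returns the number of predict calls that were needed.
    """
    for step in range(1, max_steps + 1):
        response = predict(capture(), instruction)
        # Run the actions first: a "done" response can still carry actions,
        # as in the example response shown earlier.
        execute(response.get("actions", []))
        if response.get("status") == "done":
            return step
    raise RuntimeError(f"status was not 'done' after {max_steps} steps")
```

Passing the three steps in as callables keeps the loop easy to test with canned responses before pointing it at a real desktop.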

Where this beats brittle automation

Traditional tools require you to know the XPath, CSS selector, or API endpoint ahead of time. If the UI changes, those selectors break. With the /v1/predict endpoint you only need a descriptive instruction. The model understands the visual layout and chooses the correct element. This makes your agent robust to small layout shifts, dynamic classes, and changes in element order. You can drive real desktops, browsers, and terminals without maintaining an ever-growing list of selectors.

The /v1/predict endpoint gives you a simple but powerful way to build a computer-use agent. Start by capturing screen images, calling predict, and executing the returned actions. Continue the loop until the status is done. Once you are comfortable, explore stateful sessions, task runs, and workflows for longer tasks. Get your API key at https://coasty.ai/developers and begin building agents that see and act like humans.
