Screenshot to Action: Deep Dive Into the /v1/predict Endpoint
Most automation relies on brittle selectors or fixed APIs. When you need an agent that truly sees the screen and acts like a human, the /v1/predict endpoint is your core primitive. It takes a base64 screenshot and an instruction and returns a list of mouse/keyboard actions, billed at $0.05 per call.
How it works
Send a POST to https://coasty.ai/v1/predict with a base64-encoded screenshot, a text instruction, and a CUA version. The endpoint returns a JSON payload with a list of actions (click, type, scroll, etc.), a status (pending, done), and a session_id if you want stateful trajectory memory. Loop: capture screen → predict → execute actions until status is done. This is the foundation of any computer use agent.
curl -X POST https://coasty.ai/v1/predict \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg==",
    "instruction": "Click the blue button labeled Submit",
    "cua_version": "v3"
  }'
Request fields
- image (string, base64): the screenshot to analyze.
- instruction (string): natural language describing the action.
- cua_version (string): model version, such as "v3" or "v4".
- session_id (string, optional): continues a stateful trajectory created via /v1/sessions.
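As a minimal sketch of assembling and sending these fields from Python, the helpers below base64-encode raw screenshot bytes, build the JSON body, and POST it with the same headers as the curl example. The function names `build_predict_payload` and `post_predict` are hypothetical, and leaving `session_id` out of the body when it is unset is an assumption about how the optional field is handled.

```python
import base64
import json
import urllib.request
from typing import Optional

PREDICT_URL = "https://coasty.ai/v1/predict"


def build_predict_payload(
    image_bytes: bytes,
    instruction: str,
    cua_version: str = "v3",
    session_id: Optional[str] = None,
) -> dict:
    """Assemble the JSON body for POST /v1/predict (hypothetical helper)."""
    payload = {
        # The endpoint expects the screenshot as a base64 string.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "instruction": instruction,
        "cua_version": cua_version,
    }
    # Assumption: the optional field is simply omitted when unused.
    if session_id is not None:
        payload["session_id"] = session_id
    return payload


def post_predict(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the payload using the headers shown in the curl example."""
    request = urllib.request.Request(
        PREDICT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `post_predict(build_predict_payload(png_bytes, "Click the blue button labeled Submit"), os.environ["COASTY_API_KEY"])`, where `png_bytes` holds the raw bytes of a captured screenshot.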
Response fields
- actions (array): list of mouse/keyboard steps: click, type, scroll, etc.
- status (string): "pending" while computing, "done" when complete.
- session_id (string): ID for stateful trajectory memory if you use session-based flows.
Each call is billed at $0.05, so a task that takes 10 calls costs $0.50. Keep looping until status is "done" to complete a task.
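The capture, predict, execute loop and its per-call cost can be sketched as follows. This is an illustration under stated assumptions, not the official client: `run_task` and its `capture`, `predict`, and `execute` callables are hypothetical placeholders you would supply, the `max_steps` cap is an added safeguard, and executing the actions that arrive alongside a "done" status is an assumption about the response semantics.

```python
from decimal import Decimal
from typing import Callable, Tuple

COST_PER_CALL = Decimal("0.05")  # per-call price stated in this article


def run_task(
    instruction: str,
    capture: Callable[[], str],
    predict: Callable[[dict], dict],
    execute: Callable[[dict], None],
    cua_version: str = "v3",
    max_steps: int = 20,
) -> Tuple[int, Decimal]:
    """Loop capture -> predict -> execute until status is "done".

    Returns (number of predict calls, estimated cost in USD).
    """
    for calls in range(1, max_steps + 1):
        response = predict({
            "image": capture(),  # base64-encoded screenshot of the current screen
            "instruction": instruction,
            "cua_version": cua_version,
        })
        # Assumption: actions in every response are executed, including
        # any returned together with the final "done" status.
        for action in response.get("actions", []):
            execute(action)
        if response.get("status") == "done":
            return calls, calls * COST_PER_CALL
    raise RuntimeError(f"task not done after {max_steps} predict calls")
```

Passing the three steps in as callables keeps the loop independent of any particular screenshot or input library, and the `max_steps` cap bounds spend: with the default of 20, a single task costs at most $1.00.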
Where this beats brittle automation
Standard automation tools break when the UI changes or when elements lack stable IDs. The computer use API works from the visual context of the screenshot, so it can click a button whose label changes or interact with dynamic dashboards. With the /v1/predict endpoint, you build an agent that sees the screen and acts on it like a human, rather than a script that follows brittle selectors.
Start building your computer use agent with the /v1/predict endpoint. If you want stateful trajectory memory, create a session first with POST /v1/sessions. Get your API key at https://coasty.ai/developers and start turning screenshots into actions.