Automating Form Filling and Checkout Flows Over the Computer Use API
Web forms and shopping carts change layout frequently. CSS selectors break. API endpoints for checkout are often gated. To fill forms and complete checkout reliably, you need a solution that sees the page, understands context, and acts like a human. The Coasty Computer Use API lets you automate browsers and desktops by sending screenshots and instructions. It drives the real UI, not just APIs.
How it works
The Computer Use API works in a loop. You capture a screenshot, send it to the model with an instruction, receive actions, apply them, and repeat until the task completes. Two core endpoints power this loop. POST /v1/predict takes a base64 screenshot, an instruction, and a cua_version, and returns actions and a status. If the status is neither "done" nor "failed", you capture again and predict again. Because every request carries a fresh screenshot, the model always sees the full interface. State can be managed with sessions or by driving a real machine. When you need precise element targeting, use POST /v1/ground to map a screenshot plus element description to x,y coordinates.
#!/bin/bash
# Requires curl, jq, and GNU base64 (on macOS, use `base64 -i FILE` instead of `base64 -w0 FILE`)
set -euo pipefail

# Fail early if the API key is not set in the environment
: "${COASTY_API_KEY:?Set COASTY_API_KEY before running}"

SCREENSHOT=screenshot.png   # adjust file path
MAX_STEPS=20                # safety cap so a bad response cannot loop forever

# Send the current screenshot plus an instruction to /v1/predict.
# The body is built by jq and streamed on stdin, so a large base64 image
# never hits the shell's argument-length limit and quoting stays safe.
predict() {
  base64 -w0 "$SCREENSHOT" |
    jq -Rs --arg instruction "$1" \
      '{screenshot: ., instruction: $instruction, cua_version: "v3"}' |
    curl -s -X POST https://coasty.ai/v1/predict \
      -H "Content-Type: application/json" \
      -H "X-API-Key: ${COASTY_API_KEY}" \
      --data-binary @-
}

# Step 1 - Make a prediction request (user@example.com is a placeholder address)
PREDICT_RESPONSE=$(predict "Find the email input field, enter user@example.com, then find the password field and enter a password.")

# Extract status and actions
STATUS=$(echo "$PREDICT_RESPONSE" | jq -r '.status')
ACTIONS=$(echo "$PREDICT_RESPONSE" | jq -c '.actions')

# Repeat until done, failed, or the step cap is reached
STEP=0
while [ "$STATUS" != "done" ] && [ "$STATUS" != "failed" ] && [ "$STEP" -lt "$MAX_STEPS" ]; do
  STEP=$((STEP + 1))
  # In a real agent, you would apply $ACTIONS (click, type, etc.) to the OS
  # or browser here, then overwrite $SCREENSHOT with a fresh capture.
  # This example skips both steps and only shows the request loop.
  sleep 1
  PREDICT_RESPONSE=$(predict "Continue filling the form and proceed to checkout. Do not proceed if required fields are missing.")
  STATUS=$(echo "$PREDICT_RESPONSE" | jq -r '.status')
  ACTIONS=$(echo "$PREDICT_RESPONSE" | jq -c '.actions')
  echo "Status: $STATUS, Actions: $ACTIONS"
done
echo "Task status: $STATUS"
Key concepts and pricing
- POST /v1/predict costs $0.05 per request and returns actions plus a status.
- POST /v1/ground costs $0.03 and maps a screenshot and element description to x,y coordinates.
- You call POST /v1/predict in a loop until status is "done" or "failed".
- POST /v1/sessions ($0.10 per session) and /v1/sessions/{id}/predict ($0.04 per prediction) provide stateful trajectory memory.
- POST /v1/runs provisions a task run with machine_id, task, cua_version, instructions, system_prompt, max_steps, deadline_seconds, and an optional webhook_url.
- Task runs are billed $0.05 per agent step.
- POST /v1/machines provisions a cloud VM you can start, stop, and snapshot.
- Billing uses a prepaid USD wallet where 1 credit is $0.01.
- Webhooks are HMAC signed with header Coasty-Signature: t=unix,v1=hex.
- An Idempotency-Key header makes writes safe to retry.
- API keys are gated by scopes.
- An MCP server lets you drive Coasty from Cursor, Claude Desktop, or other MCP clients.
- Error codes: 401 means invalid key, 402 means insufficient credits, 403 means insufficient scope, and 429 means rate limit.
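The prices listed above make it easy to compare stateless predictions with a session before committing to either. The sketch below does the arithmetic in credits (1 credit = $0.01) so everything stays in integers. The function names are hypothetical helpers, and the comparison assumes one session per flow with no charges beyond those listed.

```shell
#!/bin/bash
# Prices from the list above, expressed in credits (1 credit = $0.01).
PREDICT_CREDITS=5           # POST /v1/predict, per request
SESSION_CREDITS=10          # POST /v1/sessions, per session
SESSION_PREDICT_CREDITS=4   # /v1/sessions/{id}/predict, per prediction

# Cost in credits of N stateless predictions (hypothetical helper).
stateless_cost() { echo $(( $1 * PREDICT_CREDITS )); }

# Cost in credits of one session plus N predictions inside it (hypothetical helper).
session_cost() { echo $(( SESSION_CREDITS + $1 * SESSION_PREDICT_CREDITS )); }

# Format a credit amount as dollars, for example 90 becomes $0.90.
credits_to_usd() { printf '$%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 )); }
```

Under these assumptions, a 20-step checkout costs 100 credits ($1.00) with stateless predictions and 90 credits ($0.90) inside a session. The two options break even at 10 predictions, so sessions only pay off for longer flows.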
The core loop is always the same: capture, predict, act, and repeat until status is "done" or "failed".
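A loop that runs for many steps has to react to the error codes listed above rather than retry blindly. The following is a minimal sketch, not a reference client: `classify_status` and `post_with_retry` are hypothetical helper names, the Idempotency-Key format is made up for illustration, and treating 5xx responses as retryable is an assumption the source does not state.

```shell
#!/bin/bash
# Map an HTTP status to a next step: "ok", "retry", or "abort:<reason>".
# Only 401, 402, 403 and 429 come from the list above; treating 5xx as
# retryable is an assumption of this sketch.
classify_status() {
  case "$1" in
    2??)     echo "ok" ;;
    429|5??) echo "retry" ;;
    401)     echo "abort:invalid key" ;;
    402)     echo "abort:insufficient credits" ;;
    403)     echo "abort:insufficient scope" ;;
    *)       echo "abort:unexpected status $1" ;;
  esac
}

# POST a JSON body with exponential backoff. The same Idempotency-Key is sent
# on every attempt so a retried write is not applied twice.
# Usage: post_with_retry URL JSON_BODY [MAX_ATTEMPTS]
post_with_retry() {
  local url=$1 body=$2 max=${3:-4}
  local attempt=1 delay=1 key tmp code verdict
  key="$(date +%s)-$$-$RANDOM"   # illustrative key format, assumed acceptable
  tmp=$(mktemp)
  while [ "$attempt" -le "$max" ]; do
    # The body is streamed on stdin so a large screenshot fits.
    code=$(printf '%s' "$body" | curl -s -o "$tmp" -w '%{http_code}' -X POST "$url" \
      -H "Content-Type: application/json" \
      -H "X-API-Key: ${COASTY_API_KEY}" \
      -H "Idempotency-Key: ${key}" \
      --data-binary @-)
    verdict=$(classify_status "$code")
    case "$verdict" in
      ok)    cat "$tmp"; rm -f "$tmp"; return 0 ;;
      retry) sleep "$delay"; delay=$(( delay * 2 )); attempt=$(( attempt + 1 )) ;;
      *)     echo "$verdict" >&2; rm -f "$tmp"; return 1 ;;
    esac
  done
  rm -f "$tmp"
  echo "abort:gave up after $max attempts" >&2
  return 1
}
```

Separating the classification from the HTTP call keeps the policy easy to check on its own: 401, 402 and 403 stop the loop immediately because retrying cannot fix a bad key, an empty wallet, or a missing scope, while 429 backs off and tries again.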
Where this beats brittle automation
Standard web automation often relies on CSS selectors or XPath. If a site adds a new class, changes layout, or uses dynamic rendering, your scripts break. The Computer Use API replaces brittle selectors with a model that sees the entire screenshot and generates actions. It handles layout shifts, new element names, and missing ARIA labels. It can also interact with desktop apps, terminals, and windows beyond the browser. This approach is especially valuable for checkout flows where you must validate terms, payment forms, and final confirmations before submission. The API drives real UI, not a limited set of documented endpoints.
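To make the contrast with selectors concrete, here is a sketch of locating a button by describing it in plain language through POST /v1/ground. The endpoint and the `screenshot` field appear earlier in this article; the `element` field name, the helper names, and the shape of the response are assumptions, so check them against the API reference before relying on them.

```shell
#!/bin/bash
# Requires curl, jq, and GNU base64 (on macOS, use `base64 -i FILE`).

# Build a /v1/ground request body. Reads the base64 screenshot from stdin so
# large images never hit the shell's argument-length limit.
# "element" is an assumed field name for the element description.
ground_payload() {
  jq -Rs --arg element "$1" \
    '{screenshot: (. | rtrimstr("\n")), element: $element}'
}

# Ask where an element is, described in words rather than by CSS selector.
# Prints the raw JSON response.
# Usage: ground_element IMAGE_FILE "description of the element"
ground_element() {
  local image=$1 description=$2
  base64 -w0 "$image" | ground_payload "$description" |
    curl -s -X POST https://coasty.ai/v1/ground \
      -H "Content-Type: application/json" \
      -H "X-API-Key: ${COASTY_API_KEY}" \
      --data-binary @-
}
```

A call such as `ground_element checkout.png "the Place order button"` depends only on how the button looks on screen, so a renamed class or a restructured DOM does not affect it the way it would break a selector.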
You can now build agents that see the screen, understand forms, and complete checkout flows. Start by capturing screenshots and sending predictions to POST /v1/predict. Add grounding calls for precise element targeting, and use sessions or task runs for stateful workflows. Explore the MCP server to integrate Coasty into your existing IDE tools. Get your key at https://coasty.ai/developers and start building resilient automation.