Automating Form Filling and Checkout Flows Over the Computer Use API
Checkout flows break brittle automation: CSS selectors change, page layouts shift, and dynamic elements appear. A computer use agent works from what is rendered on screen, the way a person does, so it is far less sensitive to those changes. The Coasty Computer Use API gives you that vision. You send a screenshot, an instruction, and the agent version (cua_version), and you get back actions such as click, type, and scroll. You repeat the capture, predict, act loop until the status is "done". This post shows how to build that loop for form filling and checkout flows.
How it works
The core flow uses POST /v1/predict. Each request requires a base64-encoded screenshot, an instruction, and a cua_version, and each response returns actions and a status. You repeat the loop until the status is "done".
For stateful sessions, call POST /v1/sessions to create a session ID, then call POST /v1/sessions/{id}/predict with the same required fields. The session stores trajectory memory, so the model keeps context across steps.
You can also call POST /v1/runs to have the server drive an agent to completion. It takes a machine_id, a task, and a cua_version, plus optional instructions, system_prompt, max_steps, deadline_seconds, on_awaiting_human, and webhook_url. Task runs are billed at $0.05 per agent step.
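```bash
#!/bin/bash
set -euo pipefail

# Read the key from the environment and fail fast if it is missing.
: "${COASTY_API_KEY:?Set COASTY_API_KEY}"

# Base64-encode the screenshot on a single line (GNU base64 wraps at 76
# columns, which would corrupt the JSON string), let jq build and escape the
# body, and pipe it to curl. Piping also avoids "argument list too long"
# errors that large screenshots cause when passed as a command-line argument.
base64 < screenshot.png | tr -d '\n' |
  jq -Rsc \
    --arg instruction "Fill the email input, fill the password input, click the sign-in button, and confirm the order summary." \
    --arg cua_version "v3" \
    '{screenshot: ., instruction: $instruction, cua_version: $cua_version}' |
  curl -s https://coasty.ai/v1/predict \
    -H "X-API-Key: $COASTY_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- | jq
```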
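The request above covers a single step. The full capture, predict, act loop can be sketched as follows. This is a minimal sketch, not Coasty's reference client: it assumes the response exposes top-level status and actions fields, as described above, and treats any status other than "running" or "done" as a failure. capture_screenshot, execute_actions, COASTY_URL, and MAX_ITERATIONS are hypothetical names you would replace or tune. Because the action format is not specified here, the actions array is handed to your executor unchanged.

```shell
#!/bin/bash
# Minimal sketch of the capture -> predict -> act loop against POST /v1/predict.
# Assumes the response has top-level "status" and "actions" fields.
# capture_screenshot and execute_actions are hypothetical placeholders.
set -euo pipefail

COASTY_URL="${COASTY_URL:-https://coasty.ai}"   # hypothetical override
MAX_ITERATIONS="${MAX_ITERATIONS:-25}"          # hypothetical client-side cap

# Hypothetical: write the current screen as PNG bytes to stdout.
capture_screenshot() {
  echo "capture_screenshot: wire up Playwright, Puppeteer, or pyautogui" >&2
  return 1
}

# Hypothetical: read the JSON actions array on stdin and perform each action.
execute_actions() {
  echo "execute_actions: wire up your click/type/scroll executor" >&2
  return 1
}

# Base64-encode stdin on a single line (no 76-column wrapping).
encode_png() {
  base64 | tr -d '\n'
}

# Build the /v1/predict body from base64 text on stdin.
# Args: instruction cua_version
build_predict_body() {
  jq -Rsc --arg instruction "$1" --arg cua_version "$2" \
    '{screenshot: ., instruction: $instruction, cua_version: $cua_version}'
}

# Print the "status" field of a response body ("null" if absent).
response_status() {
  jq -r '.status' <<< "$1"
}

run_loop() {
  : "${COASTY_API_KEY:?Set COASTY_API_KEY}"
  local instruction="$1" shot_b64 response status i
  for ((i = 1; i <= MAX_ITERATIONS; i++)); do
    shot_b64="$(capture_screenshot | encode_png)"
    response="$(printf '%s' "$shot_b64" |
      build_predict_body "$instruction" v3 |
      curl -sf "$COASTY_URL/v1/predict" \
        -H "X-API-Key: $COASTY_API_KEY" \
        -H "Content-Type: application/json" \
        --data-binary @-)"
    # Execute returned actions first, in case the final response carries some.
    jq -c '.actions // []' <<< "$response" | execute_actions
    status="$(response_status "$response")"
    case "$status" in
      done) return 0 ;;
      running) ;;  # capture again and send the next step
      *) echo "unexpected status: $status" >&2; return 1 ;;
    esac
  done
  echo "no \"done\" status after $MAX_ITERATIONS iterations" >&2
  return 1
}

# Usage: ./checkout_loop.sh "Fill the email input, ..."
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -ge 1 ]]; then
  run_loop "$1"
fi
```

The screenshot is held as base64 text rather than raw bytes because shell variables cannot safely store binary data. The iteration cap is a client-side guard so a stuck flow does not loop forever.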
Stateful sessions for multi-step flows
Use POST /v1/sessions to create a session. The response contains an id, and you send every subsequent step to POST /v1/sessions/{id}/predict with that id. Because the model remembers the trajectory across steps, sessions help with long forms and conditional logic. You can also use POST /v1/runs to have the server drive the agent for you, which is useful when you want a pass/fail verifier and automatic retries.
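A single session-based step might look like the sketch below. It assumes that POST /v1/sessions accepts an empty JSON object and returns the id in a top-level id field. Neither the creation body nor the exact response shape is documented here, so treat both as assumptions. The script name and the COASTY_URL variable are hypothetical.

```shell
#!/bin/bash
# Sketch: create a session once, then send each step to
# POST /v1/sessions/{id}/predict so the server keeps trajectory memory.
# Assumes POST /v1/sessions accepts "{}" and returns a top-level "id".
set -euo pipefail

COASTY_URL="${COASTY_URL:-https://coasty.ai}"   # hypothetical override

# Extract the session id; jq -e fails if "id" is missing or null.
session_id_from() {
  jq -er '.id' <<< "$1"
}

# Build the per-session predict URL for a session id.
session_predict_url() {
  printf '%s/v1/sessions/%s/predict\n' "$COASTY_URL" "$1"
}

# Create a session and print the raw response body.
create_session() {
  curl -sf "$COASTY_URL/v1/sessions" \
    -H "X-API-Key: $COASTY_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{}'
}

# Send one step: PNG bytes on stdin, session id and instruction as arguments.
session_predict() {
  local id="$1" instruction="$2"
  base64 | tr -d '\n' |
    jq -Rsc --arg instruction "$instruction" --arg cua_version v3 \
      '{screenshot: ., instruction: $instruction, cua_version: $cua_version}' |
    curl -sf "$(session_predict_url "$id")" \
      -H "X-API-Key: $COASTY_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @-
}

# Usage: ./session_step.sh screenshot.png "Fill the shipping address form."
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -ge 2 ]]; then
  : "${COASTY_API_KEY:?Set COASTY_API_KEY}"
  created="$(create_session)"
  id="$(session_id_from "$created")"
  # Reuse "$id" for every later step of the same checkout.
  session_predict "$id" "$2" < "$1" | jq
fi
```

In a real flow you would create the session once, then call session_predict with the same id for each new screenshot.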
- Read COASTY_API_KEY from the environment. Never hardcode it.
- Capture the screen with your tool of choice (Puppeteer, Playwright, pyautogui) and encode the image as base64.
- Send the screenshot and instruction to POST /v1/predict or POST /v1/sessions/{id}/predict.
- Parse the actions from the response and execute them (click, type, scroll).
- Check the status field. When it is "done", you are finished. When it is "running", capture the screen again and send the next step.
- For long flows, call POST /v1/runs with a machine_id and a task describing the checkout, and let the server drive the agent.
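A server-driven run can be started with a request like the one sketched below. It sends the required machine_id, task, and cua_version, plus max_steps, deadline_seconds, and an optional webhook_url. The other optional fields (instructions, system_prompt, on_awaiting_human) can be added the same way. The MACHINE_ID, COASTY_URL, MAX_STEPS, DEADLINE_SECONDS, and WEBHOOK_URL variables and their defaults are hypothetical, and so is the integer type for max_steps and deadline_seconds. The response is printed as-is because its shape is not described here. Because task runs are billed at $0.05 per agent step, max_steps also caps cost: 40 steps cost at most $2.00.

```shell
#!/bin/bash
# Sketch: start a server-driven checkout run with POST /v1/runs.
# MACHINE_ID, COASTY_URL, MAX_STEPS, DEADLINE_SECONDS, and WEBHOOK_URL are
# hypothetical inputs; max_steps and deadline_seconds are assumed integers.
set -euo pipefail

COASTY_URL="${COASTY_URL:-https://coasty.ai}"
MAX_STEPS="${MAX_STEPS:-40}"
DEADLINE_SECONDS="${DEADLINE_SECONDS:-600}"
PRICE_PER_STEP="0.05"   # USD per agent step for task runs

# Build the /v1/runs body.
# Args: machine_id task max_steps deadline_seconds [webhook_url]
build_run_body() {
  jq -nc \
    --arg machine_id "$1" --arg task "$2" --arg cua_version v3 \
    --argjson max_steps "$3" --argjson deadline_seconds "$4" \
    --arg webhook_url "${5:-}" \
    '{machine_id: $machine_id, task: $task, cua_version: $cua_version,
      max_steps: $max_steps, deadline_seconds: $deadline_seconds}
     + (if $webhook_url == "" then {} else {webhook_url: $webhook_url} end)'
}

# Worst-case cost in USD if the run uses every allowed step.
max_run_cost() {
  awk -v steps="$1" -v price="$PRICE_PER_STEP" \
    'BEGIN { printf "%.2f\n", steps * price }'
}

# Usage: MACHINE_ID=... ./start_run.sh "Add the item to the cart, ..."
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -ge 1 ]]; then
  : "${COASTY_API_KEY:?Set COASTY_API_KEY}"
  : "${MACHINE_ID:?Set MACHINE_ID}"
  echo "Worst-case cost: \$$(max_run_cost "$MAX_STEPS")" >&2
  build_run_body "$MACHINE_ID" "$1" "$MAX_STEPS" "$DEADLINE_SECONDS" \
      "${WEBHOOK_URL:-}" |
    curl -sf "$COASTY_URL/v1/runs" \
      -H "X-API-Key: $COASTY_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- | jq
fi
```

Omitting webhook_url when it is empty keeps the body limited to fields you actually chose to set.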
Loop capture, predict, and act until the status is "done", and use stateful sessions for multi-step flows.
Where this beats brittle automation
CSS selectors break when a page is redesigned or a new field appears, and API-only tools depend on stable endpoints that a checkout does not always expose. A computer use agent works from the screen instead. It can click the email input even if the element's ID changes, scroll to reveal hidden elements, and adapt to layout shifts and missing fields. For server-driven runs you are billed per agent step rather than for time spent waiting. Working from the rendered page makes the automation more resilient to front-end changes.
You can now build agents that fill forms and complete checkout flows with the Coasty Computer Use API. Use POST /v1/predict or POST /v1/runs to start. For stateful memory, use POST /v1/sessions. For versioned workflows, explore POST /v1/workflows. Get your API key at https://coasty.ai/developers and start building robust checkout automation.