Automating Form Filling and Checkout Flows Over the API
Checkout flows are fragile. A small layout change or a new field is enough to break a pure-API scraper. What you want is an agent that sees the screen, clicks inputs, types text, and hits Pay the way a human does. The Coasty computer use API gives you that. You send a screenshot and a natural-language instruction, and the model returns concrete actions. You repeat the capture, predict, act loop until the status is done.
How the computer use API works for forms
The predict endpoint drives a visual agent. It takes a base64 screenshot, an instruction, and a cua_version. It returns an array of actions and a status. You keep looping: capture screen → POST /v1/predict → send actions to the OS → repeat until status is done. Each predict call costs $0.05. The agent can click, type, hover, and scroll. It handles dynamic content and layout changes because it reads the actual UI.
import os
import base64
import requests

API_KEY = os.getenv("COASTY_API_KEY")
BASE_URL = "https://coasty.ai/v1"
MAX_STEPS = 20  # safety cap (our own choice) so the loop cannot run and bill forever

# Capture a screenshot (replace with your own logic)
with open("checkout.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")

instruction = "Fill the checkout form. Email [email protected], password MySecurePass123, proceed to payment."


def predict(screenshot, instruction):
    """POST one screenshot and instruction to the predict endpoint and return the parsed JSON."""
    resp = requests.post(
        f"{BASE_URL}/predict",
        headers={"X-API-Key": API_KEY},
        json={
            "screenshot": screenshot,
            "instruction": instruction,
            "cua_version": "v3",
        },
        timeout=60,  # do not hang indefinitely on a stalled request
    )
    resp.raise_for_status()
    return resp.json()


actions = []
status = None
steps = 0
while status != "done" and steps < MAX_STEPS:
    result = predict(screenshot_b64, instruction)
    actions.extend(result.get("actions", []))
    status = result.get("status")
    steps += 1
    # In a real loop you would send the actions to the OS here and capture a
    # new screen; as written, every iteration re-sends the same screenshot.

print("Status:", status)
print("Actions:", actions)

Billed per agent step
- Each predict call is billed at $0.05.
- You can limit steps with max_steps in a Task Run if you want a full workflow.
- The agent runs on a cloud VM you provision, so it can interact with real browsers and desktops.
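Because billing is per predict call, the cost of a run is the number of steps multiplied by $0.05. The sketch below works through that arithmetic. The function names and the idea of a dollar budget are illustrative assumptions rather than part of the Coasty API; only the $0.05 price comes from the text above.

```python
from decimal import Decimal

# Price per predict call, as stated in this article.
PRICE_PER_STEP = Decimal("0.05")


def estimate_cost(steps: int) -> Decimal:
    """Return the dollar cost of a run that makes `steps` predict calls."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return PRICE_PER_STEP * steps


def max_steps_for_budget(budget: str) -> int:
    """Return how many predict calls fit in a dollar budget (hypothetical helper)."""
    return int(Decimal(budget) // PRICE_PER_STEP)


if __name__ == "__main__":
    print(estimate_cost(12))          # cost of a 12-step checkout run
    print(max_steps_for_budget("1"))  # steps affordable with one dollar
```

Decimal is used instead of float so that repeated five-cent charges add up exactly. A number like the one `max_steps_for_budget` returns is a natural value to pass as a step cap.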
Loop capture → predict → act until status is done.
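One way to structure that loop is to pass the three stages in as functions, so the control flow can be tested without a real screen or network. This is a minimal sketch under assumptions: `run_agent`, its parameters, and the default step cap are hypothetical names chosen for illustration, and the only response fields used are the `actions` and `status` keys described earlier.

```python
from typing import Any, Callable, Dict, Optional


def run_agent(
    capture: Callable[[], str],
    predict: Callable[[str, str], Dict[str, Any]],
    act: Callable[[Dict[str, Any]], None],
    instruction: str,
    max_steps: int = 20,
) -> Optional[str]:
    """Run capture -> predict -> act until status is "done" or max_steps is reached.

    Returns the last status seen, or None if max_steps is 0.
    """
    status = None
    for _ in range(max_steps):
        screenshot = capture()  # fresh base64 screenshot for this step
        result = predict(screenshot, instruction)
        for action in result.get("actions", []):
            act(action)  # apply every action before the next capture
        status = result.get("status")
        if status == "done":
            break
    return status
```

In production, `capture` would take a real screenshot, `predict` would call the endpoint, and `act` would drive the mouse and keyboard. In tests, all three can be simple stubs. The step cap matters because each iteration is a billed predict call.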
Why computer use beats brittle selectors
Pure-API tools rely on stable CSS selectors, XPath, or API bindings. A new button or a renamed label breaks them. A computer use agent reads the live UI instead: it finds the email input by its position and label, types the value, and clicks the button. It adapts to layout shifts, dynamic IDs, and missing bindings, so you get resilience without maintaining brittle selectors. Each predict call is billed at $0.05, so the cost of a flow scales with the number of agent steps.
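To illustrate the contrast, here is a sketch of code that executes agent actions without knowing anything about the page structure. The action shape is an assumption: this article does not show the response format, so the `type`, `x`, `y`, and `text` keys below are hypothetical and should be checked against the real API output. `RecordingBackend` is a stand-in that only logs what it was asked to do.

```python
from typing import Any, Callable, Dict, List

# Hypothetical action shape, e.g. {"type": "click", "x": 120, "y": 340}
# or {"type": "type", "text": "hello"}. Not confirmed by the article.
Action = Dict[str, Any]


class RecordingBackend:
    """Stand-in for a real input driver; it records requests instead of acting."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, action: Action) -> None:
        self.log.append(f"click at ({action['x']}, {action['y']})")

    def type_text(self, action: Action) -> None:
        self.log.append(f"type {action['text']!r}")


def build_handlers(backend: RecordingBackend) -> Dict[str, Callable[[Action], None]]:
    """Map each assumed action type to the backend method that performs it."""
    return {"click": backend.click, "type": backend.type_text}


def dispatch(action: Action, handlers: Dict[str, Callable[[Action], None]]) -> None:
    """Run one action, failing loudly on types we do not know how to perform."""
    handler = handlers.get(action.get("type"))
    if handler is None:
        raise ValueError(f"unsupported action type: {action.get('type')!r}")
    handler(action)
```

Nothing here references a selector or element ID, which is the point: when the layout changes, the next screenshot yields different actions and this code stays the same. Hover and scroll handlers would slot into the same mapping.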
Start building a computer use agent that reads forms and checks out for you. Use the predict endpoint to turn screenshots into actions, then deploy a VM-driven agent with Task Runs for multi-step workflows. Get your API key at https://coasty.ai/developers.