Automating Form Filling and Checkout Flows Over the API
Form filling and checkout flows are a classic test case for automation. Every site uses its own class names, IDs, and layout, so selector-based scripts often break when a single field changes. The Coasty computer use API works from screenshots of a real desktop instead: it sees the screen and clicks and types as a human would. It costs $0.10 per session and $0.05 per agent step.
How it works
You start a session with POST /v1/sessions, and the server returns a session_id. Then you repeatedly POST /v1/sessions/{id}/predict with a base64-encoded screenshot, an instruction, and cua_version. Each response includes a list of actions and a status. Your client carries out the actions (clicking, typing, submitting the form), captures a fresh screenshot, and calls predict again, looping until the status is terminal. Billing is per agent step, not per API call.
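The request and response shapes described above can be pinned down in two small helpers. This is a sketch under assumptions: the field names (screenshot, instruction, cua_version, actions, status) come from this article, while the function names and the choice to treat a missing or null actions field as an empty list are hypothetical.

```python
from __future__ import annotations

import base64
from typing import Any


def build_predict_payload(png_bytes: bytes, instruction: str, cua_version: str = "v3") -> dict[str, str]:
    """Build the JSON body for POST /v1/sessions/{id}/predict (hypothetical helper)."""
    return {
        # The screenshot travels as a base64 string, not raw bytes.
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "instruction": instruction,
        "cua_version": cua_version,
    }


def parse_predict_response(body: dict[str, Any]) -> tuple[list[Any], str | None]:
    """Pull the action list and status out of a predict response (hypothetical helper)."""
    # Assumption: a missing or null "actions" field means "nothing to do this step".
    actions = body.get("actions") or []
    return list(actions), body.get("status")


if __name__ == "__main__":
    payload = build_predict_payload(b"\x89PNG fake bytes", "Click Place Order")
    print(sorted(payload))
    print(parse_predict_response({"actions": [{"type": "click"}], "status": "running"}))
```

Keeping payload construction separate from the HTTP call makes it easy to test the request shape without a network connection or an API key.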
import base64
import io
import os
import time

import requests
from PIL import Image

API_KEY = os.environ["COASTY_API_KEY"]  # fail fast if the key is not set
BASE_URL = "https://coasty.ai/v1"
HEADERS = {"X-API-Key": API_KEY}

# Statuses that end this loop: "done" is kept from the original example, the
# rest come from the documented status list. awaiting_human also stops the
# loop so a person can step in.
STOP_STATUSES = {"done", "succeeded", "failed", "cancelled", "timed_out", "awaiting_human"}
MAX_STEPS = 25  # hypothetical safety cap; every agent step is billed at $0.05

def img_to_base64(pil_image):
    """Encode a Pillow image as a base64 PNG string."""
    buffered = io.BytesIO()
    pil_image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

def capture_screen():
    # Stand-in: reads a saved screenshot. A real agent would grab the live
    # desktop here (for example with PIL.ImageGrab.grab()) after each action.
    return Image.open("checkout_form.png")

# Start a session
resp = requests.post(
    f"{BASE_URL}/sessions",
    headers=HEADERS,
    json={"cua_version": "v3"},
    timeout=30,
)
resp.raise_for_status()
session_id = resp.json()["session_id"]

instruction = (
    "Fill the first name field with 'Jane', the last name with 'Doe', "
    "the email with 'jane.doe@example.com', the password with 'S3cur3!', "
    "and click the Place Order button."
)

# Main loop: capture, predict, act
for step in range(MAX_STEPS):
    resp = requests.post(
        f"{BASE_URL}/sessions/{session_id}/predict",
        headers=HEADERS,
        json={
            "screenshot": img_to_base64(capture_screen()),
            "instruction": instruction,
            "cua_version": "v3",
        },
        timeout=60,
    )
    resp.raise_for_status()
    body = resp.json()
    status = body.get("status")

    # In a real agent you would send each action to the desktop driver.
    # For this example we just print them.
    for act in body.get("actions", []):
        print("Action:", act)

    if status in STOP_STATUSES:
        print(f"Session stopped with status: {status}")
        break

    time.sleep(1)  # let the UI settle before the next capture
else:
    print(f"Gave up after {MAX_STEPS} steps without a terminal status.")
Key fields and pricing
- POST /v1/sessions takes cua_version (default v3). The response includes session_id.
- POST /v1/sessions/{id}/predict takes screenshot (base64 string), instruction (string), and cua_version (string). The response includes actions (list) and status (string).
- Status values: queued, running, awaiting_human, succeeded, failed, cancelled, timed_out. Keep looping until the status is terminal (succeeded, failed, cancelled, or timed_out).
- Billing: $0.10 per session start and $0.05 per agent step, with no per-API-call charge.
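The billing bullets reduce to a simple formula: cost = $0.10 × sessions + $0.05 × steps. A minimal sketch, assuming the prices quoted in this article (check current pricing before budgeting with it); the function name is hypothetical, and Decimal is used so cent amounts don't pick up floating-point error.

```python
from decimal import Decimal

SESSION_PRICE = Decimal("0.10")  # per session start, as quoted in this article
STEP_PRICE = Decimal("0.05")     # per agent step, as quoted in this article


def estimate_cost(steps: int, sessions: int = 1) -> Decimal:
    """Estimate spend from session starts plus agent steps (API calls themselves are free)."""
    if steps < 0 or sessions < 0:
        raise ValueError("steps and sessions must be non-negative")
    return SESSION_PRICE * sessions + STEP_PRICE * steps


if __name__ == "__main__":
    # A checkout that takes 12 agent steps in one session.
    print(estimate_cost(12))  # → 0.70
```

Because steps dominate the bill, every two extra steps cost as much as starting a new session.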
In short: capture, predict, act, and repeat until the status is terminal.
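One way to make the stop condition explicit is to map each status to what the client loop should do next. This is a sketch under assumptions: the status names come from the list above, but the grouping (which values count as success or failure, and treating awaiting_human as a pause for a person) is our reading, not documented behavior. "done" is included because the article's example loop checks for it.

```python
from enum import Enum


class Next(Enum):
    CONTINUE = "continue"              # keep capturing and predicting
    WAIT_FOR_HUMAN = "wait_for_human"  # pause until a person steps in
    STOP_SUCCESS = "stop_success"      # the task finished
    STOP_FAILURE = "stop_failure"      # the task ended without finishing


# Assumed grouping of the status values.
_IN_PROGRESS = {"queued", "running"}
_SUCCESS = {"succeeded", "done"}
_FAILURE = {"failed", "cancelled", "timed_out"}


def next_step(status: str) -> Next:
    """Map a session status to the client loop's next move."""
    if status in _IN_PROGRESS:
        return Next.CONTINUE
    if status == "awaiting_human":
        return Next.WAIT_FOR_HUMAN
    if status in _SUCCESS:
        return Next.STOP_SUCCESS
    if status in _FAILURE:
        return Next.STOP_FAILURE
    # Fail loudly on anything unexpected instead of looping (and billing) forever.
    raise ValueError(f"unknown status: {status!r}")
```

Raising on unknown values is a deliberate choice: if the API adds a status, the loop stops with a clear error rather than silently spending agent steps.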
Where this beats brittle automation
Traditional tools rely on CSS selectors, XPath, or fixed IDs, so a single layout change or class rename breaks the script. The Coasty computer use API works from what is on screen and the surrounding context, so it can adapt to dynamic IDs, missing labels, and changed buttons. Because it drives a real desktop, it also works inside iframes, popups, and UIs that aren't fully accessible.
What to build next
Try this pattern on a multi-step checkout, a registration form with validation, or a login flow with MFA prompts. Add retry logic, checkpoints, or a workflow that runs multiple tasks in sequence. Get your API key at https://coasty.ai/developers to start automating forms and checkout flows with the Coasty computer use API.
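For the retry logic mentioned above, a common pattern is exponential backoff around each HTTP call. A minimal sketch, assuming transient failures surface as exceptions; the helper name, attempt count, and delays are illustrative choices, not part of the Coasty API.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the listed exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow 0.5s, 1s, 2s, ... with up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

To use it, wrap the predict request in a lambda, for example with_retries(lambda: requests.post(...), retry_on=(requests.ConnectionError, requests.Timeout)), so network hiccups are retried while genuine client errors still fail fast. The injectable sleep parameter keeps the helper testable without real waiting.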