Tutorial

Automate Form Filling and Checkout Flows With The Computer Use API

Sophia Martinez | 8 min

Forms and checkout pages are among the hardest parts of e-commerce automation. They change layouts, present CAPTCHAs, and rely on dynamic IDs, so selector-based scripts often break after a single release. The Coasty computer use API addresses this by letting an agent see the screen and act like a human: it opens a real browser, clicks buttons, types text, waits for dynamic elements, and handles errors. This guide shows you how to automate a form-filling and checkout flow using the endpoints and pricing from the API docs.

How it works

The computer use API runs a stateful session on a real desktop or browser. You start a session, then loop through capture, predict, and act until the session reaches a terminal status. The /v1/sessions endpoint creates a session from a machine_id and a cua_version and returns a session_id. The /v1/sessions/{id}/predict endpoint takes a screenshot, an instruction, and a cua_version, and returns the actions to perform. Each predict call costs $0.04. When the agent finishes, the session status becomes succeeded. If it encounters an error, the status becomes failed or awaiting_human, depending on your configuration.

python
import base64
import os
import requests
import time

def encode_image(path):
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')

def predict(session_id, screenshot_b64, instruction, cua_version):
    """Send one screenshot and instruction to the session; return the parsed JSON reply."""
    url = f'https://coasty.ai/v1/sessions/{session_id}/predict'
    key = os.getenv('COASTY_API_KEY')
    resp = requests.post(
        url,
        headers={'X-API-Key': key},
        json={
            'screenshot': screenshot_b64,
            'instruction': instruction,
            'cua_version': cua_version
        },
        timeout=60,  # avoid hanging forever on a stalled connection
    )
    resp.raise_for_status()
    return resp.json()

def main():
    # 1️⃣ Start a session on a cloud machine
    url = 'https://coasty.ai/v1/sessions'
    key = os.getenv('COASTY_API_KEY')
    resp = requests.post(
        url,
        headers={'X-API-Key': key},
        json={
            'machine_id': 'machine-123',
            'cua_version': 'v3'
        },
        timeout=60,  # avoid hanging forever on a stalled connection
    )
    resp.raise_for_status()
    session = resp.json()
    session_id = session['session_id']
    print('Session created', session_id)

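    # 2️⃣ Capture and predict loop
    # NOTE: this example re-reads one static file. In a real run, execute the
    # returned actions and capture a fresh screenshot before each predict call.
    screenshot_path = 'checkout.png'
    cua_version = 'v3'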

    for step in range(10):
        screenshot_b64 = encode_image(screenshot_path)
        instruction = (
            # Placeholder email; substitute your own test account.
            'Fill the checkout form with email=user@example.com, '
            'password=SecurePass123, and address=123 Main St, City, Zip. '
            'Click the Pay button when ready.'
        )
        result = predict(session_id, screenshot_b64, instruction, cua_version)
        actions = result.get('actions', [])
        status = result.get('status')
        print(f'Step {step} status={status} actions={actions}')
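        # Stop on any terminal status from the list in this guide; 'done' is
        # kept in case the predict response reports it.
        if status in ('done', 'succeeded', 'failed', 'cancelled', 'timed_out'):
            break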
        time.sleep(1)

if __name__ == '__main__':
    main()
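
The loop above prints the returned actions but never performs them, so the "act" step of capture, predict, act is still missing. A minimal sketch of that step follows. The action shape used here (a dict with a `type` key plus fields such as `x`, `y`, and `text`) is an assumption for illustration, not the documented Coasty schema, and `apply_actions` is a hypothetical helper; check the API docs for the real fields. Handlers are plain callables, so you can plug in whichever input library drives your machine.

```python
from typing import Any, Callable, Dict, List

# Hypothetical action shape, e.g. {'type': 'click', 'x': 10, 'y': 20}.
# The real schema returned by /predict may differ.
Action = Dict[str, Any]
Handler = Callable[[Action], None]


def apply_actions(actions: List[Action], handlers: Dict[str, Handler]) -> int:
    """Run each action through the handler registered for its type.

    Returns the number of actions applied. Raises ValueError for an action
    type with no handler, so unknown actions are never skipped silently.
    """
    applied = 0
    for action in actions:
        kind = action.get('type')
        handler = handlers.get(kind)
        if handler is None:
            raise ValueError(f'No handler for action type: {kind!r}')
        handler(action)
        applied += 1
    return applied


if __name__ == '__main__':
    # Handlers here only record what they were asked to do.
    log = []
    handlers = {
        'click': lambda a: log.append(('click', a['x'], a['y'])),
        'type': lambda a: log.append(('type', a['text'])),
    }
    sample = [
        {'type': 'click', 'x': 120, 'y': 340},
        {'type': 'type', 'text': 'hello'},
    ]
    print(apply_actions(sample, handlers), log)
```

Raising on an unknown action type is a deliberate choice: in a checkout flow, silently dropping a step is usually worse than stopping.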

Key fields and pricing

  • POST /v1/sessions requires machine_id and cua_version, returns session_id.
  • POST /v1/sessions/{id}/predict takes screenshot (base64), instruction, and cua_version, returns actions and status.
  • Each predict call is $0.04.
  • Session state is tracked by the server; status values include queued, running, awaiting_human, succeeded, failed, cancelled, timed_out.
  • For Task Runs, cancel or resume a run with POST /v1/runs/{id}/cancel and POST /v1/runs/{id}/resume.
  • Workflow DSL (POST /v1/workflows) can orchestrate multiple steps including assert, if, loop, and parallel blocks.
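
The pricing and status fields above lend themselves to two small helpers: one to estimate spend from the number of predict calls, and one to decide when a loop should stop. This is a sketch under stated assumptions. The $0.04 price comes from this guide, the split of statuses into terminal and non-terminal is inferred from their names, and the function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_PREDICT = Decimal('0.04')  # per-call price quoted in this guide

# Assumption: these statuses end a session; queued, running, and
# awaiting_human are treated as still in progress.
TERMINAL_STATUSES = frozenset({'succeeded', 'failed', 'cancelled', 'timed_out'})


def estimate_cost(predict_calls: int) -> Decimal:
    """Return the cost in USD of a given number of predict calls."""
    if predict_calls < 0:
        raise ValueError('predict_calls must be non-negative')
    return PRICE_PER_PREDICT * predict_calls


def max_calls_for_budget(budget_usd: str) -> int:
    """Return how many predict calls fit within a USD budget."""
    return int(Decimal(budget_usd) // PRICE_PER_PREDICT)


def is_terminal(status: str) -> bool:
    """Return True if the status means the session will not progress further."""
    return status in TERMINAL_STATUSES


if __name__ == '__main__':
    print(estimate_cost(10))             # 10 calls at $0.04 each
    print(max_calls_for_budget('1.00'))  # calls that fit in one dollar
    print(is_terminal('awaiting_human'))
```

Using `Decimal` instead of floats keeps the arithmetic exact: ten calls come to exactly 0.40, and a one-dollar budget covers 25 calls.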

Create a session with POST /v1/sessions, then loop POST /v1/sessions/{id}/predict until the status is terminal (succeeded on success).

Where this beats brittle automation

Traditional tools rely on XPath, CSS selectors, and static IDs. When a retailer changes a class name or hides an element, the script fails. The computer use agent works from the rendered screen, as a human would. It reads visible labels and captions, so it does not depend on IDs that change between page loads. It can click buttons whose text changes per region or language, wait for a CAPTCHA to load and pause for human input, and recover from layout shifts by re-scanning the page. This makes checkout flows more robust across releases and regional variants.

What to build next

Take the pattern above and extend it into a full e-commerce bot. Add retry loops for failed states, handle CAPTCHA prompts with your own service, and wire the bot to a webhook that pushes orders to your backend. Use workflows to orchestrate multi-step flows such as account creation, product search, add to cart, checkout, and order confirmation. For autonomous runs, use cua_version 'v4' with a pass/fail verifier, and set on_awaiting_human to pause so a human can resolve CAPTCHAs.
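
One way to approach the retry loops mentioned above is a small wrapper with exponential backoff. This is a generic sketch, not part of the Coasty API: `run_with_retries` is a hypothetical helper, `attempt` stands for any function of yours that runs one flow and returns its final status, and treating failed and timed_out as retryable is an assumption.

```python
import time
from typing import Callable

# Assumption: only these statuses are worth retrying.
RETRYABLE = frozenset({'failed', 'timed_out'})


def run_with_retries(
    attempt: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call attempt() until it returns a non-retryable status.

    Waits base_delay * 2**n seconds after the n-th failed attempt and
    returns the last status seen if every attempt is retryable.
    """
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    status = ''
    for n in range(max_attempts):
        status = attempt()
        if status not in RETRYABLE:
            return status
        if n < max_attempts - 1:  # no wait after the final attempt
            sleep(base_delay * 2 ** n)
    return status


if __name__ == '__main__':
    # Simulated flow: fails twice, then succeeds.
    outcomes = iter(['failed', 'timed_out', 'succeeded'])
    print(run_with_retries(lambda: next(outcomes), sleep=lambda s: None))
```

Retry only steps that are safe to repeat. Re-running a payment step after an ambiguous failure can submit the order twice, so confirm the order state before retrying that part of the flow.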

You now have a working pattern for form-filling and checkout automation with the Coasty computer use API. The session-based predict loop lets you drive real browsers and desktops, handle dynamic pages, and pay only for steps you actually take. Start building at https://coasty.ai/developers to get your key and see the full endpoint documentation.
