Engineering

Stateful Sessions vs Stateless Predict in the Computer Use API

Rachel Kim | 6 min read

Most web automation tools rely on brittle selectors (IDs, classes, XPath) that break when a UI changes. The Coasty computer use API instead lets your agent see the screen and act like a human. It offers two core modes: stateless predict, where each screenshot is processed independently, and stateful sessions, where the server remembers the full trajectory. This guide shows how to use both and when to pick stateful sessions for complex workflows.

How it works

Stateless predict uses POST /v1/predict. Each request carries a base64-encoded screenshot, an instruction, and the CUA version (v3 or v4). You run a loop: capture a screenshot, call predict, execute the returned actions, and repeat until the status is done. Stateless predict costs $0.05 per request.

Stateful sessions start with POST /v1/sessions, which returns a session ID. Each subsequent POST /v1/sessions/{id}/predict sends the current screenshot and instruction, and the server stores every step in the session's history. This trajectory memory lets you give follow-up instructions without re-describing the initial context. Each session predict call costs $0.04. Both predict endpoints return actions (e.g., click, type) and a status field.

```bash
# Stateless predict: no session, every request stands alone.
# BASE64_SCREENSHOT is a placeholder for the base64-encoded screen capture.
curl --request POST https://coasty.ai/v1/predict \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "BASE64_SCREENSHOT",
    "instruction": "Click the login button",
    "cua_version": "v3"
  }'

# Stateful: create a session once and keep its ID (jq extracts "id").
# Example response: {"id": "sess_12345", "status": "created"}
SESSION_ID=$(curl --silent --request POST https://coasty.ai/v1/sessions \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "machine_id": "dev-vm-001",
    "cua_version": "v3"
  }' | jq -r '.id')

# Stateful predict: the server adds this step to the session's trajectory
curl --request POST "https://coasty.ai/v1/sessions/$SESSION_ID/predict" \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "BASE64_SCREENSHOT",
    "instruction": "Log in with my credentials",
    "cua_version": "v3"
  }'
```
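The capture, predict, act loop described above can be sketched as a small Bash script. This is a minimal sketch under several assumptions: `jq` is installed; `capture_screenshot` and `perform_action` are hypothetical placeholders you would implement for your own machine; the response is JSON with a top-level `actions` array and a `status` string; and the `max_steps` cap is an added safety limit, not an API feature.

```bash
#!/usr/bin/env bash
# Sketch of the stateless loop: capture -> POST /v1/predict -> act, until "done".
# Assumes jq is installed and the response JSON has "actions" and "status".
set -euo pipefail

API="https://coasty.ai/v1"

# HYPOTHETICAL: print the current screen as unwrapped base64 PNG data.
capture_screenshot() { echo "capture_screenshot: not implemented" >&2; return 1; }

# HYPOTHETICAL: execute one action object (e.g. a click or type) locally.
perform_action() { echo "perform_action: not implemented" >&2; return 1; }

# One stateless request. The screenshot arrives on stdin, which avoids
# per-argument size limits when handing a large base64 string to jq.
stateless_predict() {
  jq -Rs --arg i "$1" \
    '{screenshot: rtrimstr("\n"), instruction: $i, cua_version: "v3"}' |
    curl --silent --show-error --fail --request POST "$API/predict" \
      -H "X-API-Key: ${COASTY_API_KEY:?COASTY_API_KEY is not set}" \
      -H "Content-Type: application/json" \
      --data-binary @-
}

# No server-side memory: every iteration sends a fresh screenshot plus the
# same full instruction.
run_stateless() {
  local instruction="$1" max_steps="${2:-20}" step response action
  for ((step = 1; step <= max_steps; step++)); do
    response=$(capture_screenshot | stateless_predict "$instruction")
    # Run each returned action in order (process substitution keeps the
    # loop in the current shell).
    while read -r action; do
      perform_action "$action"
    done < <(jq -c '.actions[]?' <<<"$response")
    if [[ "$(jq -r '.status' <<<"$response")" == "done" ]]; then
      return 0
    fi
  done
  echo "run_stateless: no \"done\" status after $max_steps steps" >&2
  return 1
}
```

With the two placeholders implemented, `run_stateless "Click the login button"` drives the loop. The step cap means a task that never reports done fails instead of looping forever.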

Key differences

  • Cost: stateless predict is $0.05 per request. Stateful predict is $0.04 per call (saves $0.01 per step).
  • Memory: stateless predict has no memory of previous steps. Stateful sessions store the full trajectory, so later instructions can reference earlier context.
  • Complexity: stateless fits simple, single‑step tasks. Stateful sessions handle multi‑step flows like account creation or multi‑page forms.
  • API surface: stateless uses POST /v1/predict. Stateful uses POST /v1/sessions, then POST /v1/sessions/{id}/predict for each step.
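The per-call prices in the list above make the cost gap easy to estimate. A small sketch, assuming both modes need the same number of predict calls and that creating a session carries no separate charge (the article does not quote one):

```bash
#!/usr/bin/env bash
# Estimate predict-call cost for N steps at the per-call prices listed above.
# Assumes session creation (POST /v1/sessions) adds no extra charge.
estimate_cost() {
  awk -v n="$1" 'BEGIN {
    printf "stateless: $%.2f\n", n * 0.05
    printf "stateful:  $%.2f\n", n * 0.04
    printf "savings:   $%.2f\n", n * 0.01
  }'
}

estimate_cost 30
# stateless: $1.50
# stateful:  $1.20
# savings:   $0.30
```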

Use stateful sessions for any workflow with more than one step. Because the server keeps the trajectory, each new instruction can stay short and build on what the agent has already seen and done.
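A multi-step flow inside one session might look like the sketch below. It assumes `jq` is installed; `capture_screenshot` and `perform_action` are hypothetical placeholders; the create response carries the session ID in `id` (matching the example response earlier) and predict responses carry `actions` and `status`; the three instructions and the `run_signup_flow` name are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch of a multi-step flow inside one stateful session.
# Assumes jq is installed; the create response has "id", and predict
# responses have "actions" and "status".
set -euo pipefail

API="https://coasty.ai/v1"

# HYPOTHETICAL placeholders: implement for your machine.
capture_screenshot() { echo "capture_screenshot: not implemented" >&2; return 1; }
perform_action()     { echo "perform_action: not implemented" >&2; return 1; }

# POST a JSON body (read from stdin) to an API path and print the response.
coasty_post() {
  curl --silent --show-error --fail --request POST "$API$1" \
    -H "X-API-Key: ${COASTY_API_KEY:?COASTY_API_KEY is not set}" \
    -H "Content-Type: application/json" \
    --data-binary @-
}

# POST /v1/sessions, then print only the new session ID.
create_session() {
  jq -n --arg m "$1" '{machine_id: $m, cua_version: "v3"}' |
    coasty_post /sessions | jq -r '.id'
}

# Drive one instruction within the session until the server reports "done".
# Every call is recorded in the session's trajectory on the server.
session_step() {
  local session_id="$1" instruction="$2" max_steps="${3:-20}" step response action
  for ((step = 1; step <= max_steps; step++)); do
    response=$(capture_screenshot |
      jq -Rs --arg i "$instruction" \
        '{screenshot: rtrimstr("\n"), instruction: $i, cua_version: "v3"}' |
      coasty_post "/sessions/$session_id/predict")
    while read -r action; do
      perform_action "$action"
    done < <(jq -c '.actions[]?' <<<"$response")
    if [[ "$(jq -r '.status' <<<"$response")" == "done" ]]; then
      return 0
    fi
  done
  echo "session_step: no \"done\" status after $max_steps steps" >&2
  return 1
}

# HYPOTHETICAL flow: one session, several short instructions that lean on
# the shared history instead of restating earlier context.
run_signup_flow() {
  local session_id instruction
  session_id=$(create_session "dev-vm-001")
  for instruction in "Open the sign-up form" "Fill the email field" "Submit the form"; do
    session_step "$session_id" "$instruction"
  done
}
```

Calling `run_signup_flow` creates one session and runs the instructions in order; the later, shorter instructions rely on the trajectory the server keeps for that session.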

Where this beats brittle automation

Selector-based tools depend on stable IDs and classes. When a designer changes a button's class, your script breaks. The computer use API lets your agent look at the rendered screen and decide where to click. With stateful sessions you don't have to re-explain the whole state on each step: after the agent has seen a login form, you can simply say "fill the email field". This is especially useful for web apps that change often and for desktop tools with no stable API. Vision-based control plus trajectory memory gives you a more robust automation layer than brittle selectors alone.

Start with stateless predict for quick, single‑step tasks. Migrate to stateful sessions for multi‑step workflows where you need context across steps. Build a computer use agent that sees, reasons, and acts like a human. Get your API key at https://coasty.ai/developers.
