Stateful Sessions vs Stateless Predict in the Computer Use API
Traditional automation relies on brittle selectors that break when the UI changes. The Coasty Computer Use API addresses this by letting your agent see the screen and act like a human. You have two core patterns: a stateless predict loop and a stateful session. The stateless approach sends a screenshot and an instruction to /v1/predict, gets actions back, and repeats. The stateful approach creates a session with /v1/sessions, stores trajectory memory, and calls /v1/sessions/{id}/predict repeatedly to keep context. This post explains the differences between the two, lists the pricing, and walks through a working code example for stateful sessions.
How it works
Stateless predict is a single HTTP request: POST /v1/predict with a base64-encoded screenshot, an instruction, and cua_version. The response includes actions and a status. You loop through capture, predict, and act until the status is done. Stateful sessions have three steps. First, POST /v1/sessions creates a session and returns an id plus an initial trajectory. Second, for each step, POST /v1/sessions/{id}/predict sends the screenshot, instruction, cua_version, and the full trajectory from previous steps. The server returns actions and an updated trajectory. Third, you act on the screen and send the new screenshot back. The trajectory grows with each step, giving the model memory of earlier steps. Both patterns use the same cua_version field but differ in how context travels across requests.
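import base64
import os
from io import BytesIO

import requests
from PIL import ImageGrab  # Pillow; screen capture is one possible screenshot source

API_KEY = os.getenv("COASTY_API_KEY")
BASE_URL = "https://coasty.ai/v1"
CUA_VERSION = "v3"
HEADERS = {"X-API-Key": API_KEY}


def load_screenshot():
    """One possible implementation: capture the screen as a base64-encoded PNG."""
    buffer = BytesIO()
    ImageGrab.grab().save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def act_on_actions(actions):
    """Execute the returned actions on the desktop (your implementation)."""
    raise NotImplementedError("plug in your own input driver here")


# 1. Create a stateful session
session_resp = requests.post(
    f"{BASE_URL}/sessions",
    headers=HEADERS,
    json={"cua_version": CUA_VERSION},
    timeout=30,
)
session_resp.raise_for_status()
session = session_resp.json()
session_id = session["id"]
# Start from the initial trajectory returned at creation, if the response includes one
trajectory = session.get("trajectory")

# 2. Loop until done
while True:
    # Capture a fresh screenshot each step so the model sees the effect of the last actions
    screenshot = load_screenshot()
    predict_resp = requests.post(
        f"{BASE_URL}/sessions/{session_id}/predict",
        headers=HEADERS,
        json={
            "screenshot": screenshot,
            "instruction": "Click the first button and type hello",
            "cua_version": CUA_VERSION,
            "trajectory": trajectory,
        },
        timeout=30,
    )
    predict_resp.raise_for_status()
    result = predict_resp.json()
    actions = result["actions"]
    trajectory = result["trajectory"]  # carry the updated trajectory into the next call
    status = result["status"]
    act_on_actions(actions)
    if status == "done":
        break

print(f"Session {session_id} completed.")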
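For comparison, the stateless loop described earlier might look roughly like the sketch below. It is a sketch under assumptions, not the official client: the helper names (post_json, run_stateless), the max_steps guard, and the use of the standard library's urllib instead of requests are choices made here for illustration. The request fields and the "done" status come from the description above, and the base URL and X-API-Key header are reused from the session example.

```python
import json
import os
import urllib.request

BASE_URL = "https://coasty.ai/v1"  # same base URL as the session example


def post_json(url, payload, api_key):
    """POST a JSON body and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_stateless(instruction, capture, act, send=post_json, max_steps=25):
    """Capture, predict, act until the API reports status "done".

    capture() must return a base64 PNG string and act(actions) must perform
    the returned actions. No trajectory is sent, so every request stands alone.
    Returns the number of predict calls made.
    """
    api_key = os.getenv("COASTY_API_KEY", "")
    for step in range(1, max_steps + 1):
        result = send(
            f"{BASE_URL}/predict",
            {
                "screenshot": capture(),
                "instruction": instruction,
                "cua_version": "v3",
            },
            api_key,
        )
        act(result["actions"])
        if result["status"] == "done":
            return step
    # Assumed safety guard so a task that never finishes cannot loop forever
    raise RuntimeError(f"not done after {max_steps} steps")
```

Passing capture, act, and send as arguments keeps the loop testable without a network connection: a fake send function can replay canned responses. The only structural difference from the session loop is that nothing is carried between iterations.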
Cost comparison per step
- POST /v1/predict: $0.05 per request.
- POST /v1/sessions: $0.10 one-time per session creation.
- POST /v1/sessions/{id}/predict: $0.04 per predict call.
- Stateless predict therefore costs $0.05 per step.
- Stateful sessions cost $0.04 per step after creation, plus the one-time $0.10 for the session resource. An n-step session costs $0.10 + $0.04 × n, which matches stateless pricing at 10 steps and is cheaper beyond that.
- Both patterns bill $0.05 per agent step when using Task Runs.
- Trajectory memory in sessions is included in the predict cost, with no extra charge per token.
Use POST /v1/sessions and POST /v1/sessions/{id}/predict for long-running tasks that need context across steps.
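The per-step prices listed above imply a break-even point between the two patterns, which the short calculation below works out. It assumes exactly one predict call per step and ignores Task Runs. The function names are illustrative, and the amounts are kept in whole cents to avoid floating-point rounding.

```python
# Prices in whole cents, taken from the list above.
STATELESS_PREDICT_CENTS = 5   # POST /v1/predict
SESSION_CREATE_CENTS = 10     # POST /v1/sessions (one-time)
SESSION_PREDICT_CENTS = 4     # POST /v1/sessions/{id}/predict


def stateless_cost_cents(steps: int) -> int:
    """Total cost of a stateless loop with one predict call per step."""
    return STATELESS_PREDICT_CENTS * steps


def stateful_cost_cents(steps: int) -> int:
    """Total cost of one session: creation fee plus one predict call per step."""
    return SESSION_CREATE_CENTS + SESSION_PREDICT_CENTS * steps


def break_even_steps() -> int:
    """Smallest step count at which a session costs no more than stateless predict."""
    steps = 1
    while stateful_cost_cents(steps) > stateless_cost_cents(steps):
        steps += 1
    return steps


if __name__ == "__main__":
    for steps in (1, 5, 10, 20):
        print(steps, stateless_cost_cents(steps), stateful_cost_cents(steps))
    print("break-even at", break_even_steps(), "steps")
```

At 10 steps both patterns cost 50 cents. A 20-step task costs 100 cents stateless and 90 cents in a session, so on price alone sessions only pay off for tasks longer than 10 steps.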
When each pattern makes sense
- Choose stateless predict for short, independent tasks like one-shot screenshot interpretation.
- Choose stateful sessions for multi-step workflows such as onboarding flows or complex desktop interactions.
- Stateless predict is simpler to implement but lacks memory across requests.
- Stateful sessions need more code to manage trajectory JSON but enable the agent to refer to earlier steps.
- If you use Task Runs with the server driving the agent, the same $0.05 per step cost applies regardless of session type.
Where this beats brittle automation
Computer use agents see the visual state of the desktop instead of relying on fragile CSS selectors. If a UI element changes location or ID, the model still maps the instruction to the correct action. Stateful sessions store the trajectory, so the model can reference previous clicks, inputs, and page states. This reduces the chance of context loss and makes your automation more robust across releases and test environments. Vision-based automation also handles edge cases like pop-ups, overlays, and dynamic content that break traditional frameworks.
Stateless predict and stateful sessions each have their place. For multi-step desktop automation, use POST /v1/sessions together with POST /v1/sessions/{id}/predict to keep trajectory memory. With the pricing and field names above, you can estimate cost before you deploy. Get your API key at https://coasty.ai/developers and start building reliable computer use agents today.