From Prototype to Production with the Coasty Computer Use API
GUI automation scripts that break on the next UI change are a common problem: you spend more time patching selectors than building new features. The Coasty computer use API addresses this by letting your AI agent see the screen and act on it the way a human would. It drives real desktops, browsers, and terminals rather than being limited to API calls. You can go from a quick prototype to a production-grade agent in a few days.
How the Computer Use API works
Coasty operates in two modes: stateless predict and stateful sessions. Both use a vision model that receives a screenshot plus a natural language instruction and returns actions. You loop through capture, predict, and act until the response reports "status": "done". The cua_version field in the request selects which model version handles it. Stateful sessions maintain trajectory memory across steps, which makes long workflows more reliable.
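The capture, predict, act loop described above can be sketched in Python as follows. This is a minimal sketch, not an official client: the endpoint, headers, and request fields are taken from the curl example in this article, while `capture`, `act`, and the `max_steps` safety cap are hypothetical names introduced here for illustration. The response shape (`actions`, `status`) is assumed to match the excerpt shown with the curl call.

```python
import json
import os
import urllib.request
from typing import Callable

PREDICT_URL = "https://coasty.ai/v1/predict"  # endpoint from the curl example


def http_predict(payload: dict) -> dict:
    """POST one predict request, mirroring the headers of the curl example."""
    request = urllib.request.Request(
        PREDICT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['COASTY_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def run_until_done(
    instruction: str,
    capture: Callable[[], str],    # hypothetical: returns a base64-encoded screenshot
    act: Callable[[dict], None],   # hypothetical: performs one returned action
    predict: Callable[[dict], dict] = http_predict,
    cua_version: str = "v3",
    max_steps: int = 25,           # assumed safety cap, not part of the API
) -> int:
    """Loop capture -> predict -> act until the response status is "done".

    Returns the number of predict calls that were made.
    """
    for step in range(1, max_steps + 1):
        payload = {
            "screenshot": capture(),
            "instruction": instruction,
            "cua_version": cua_version,
        }
        result = predict(payload)
        # Perform the returned actions before checking the status, because the
        # final response can carry both an action and "done".
        for action in result.get("actions", []):
            act(action)
        if result.get("status") == "done":
            return step
    raise RuntimeError(f"no 'done' status after {max_steps} steps")
```

Passing `predict` as a parameter keeps the loop testable without network access: swap in a stub that returns canned responses, and keep `http_predict` for real runs.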
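# Encode the screenshot first: $(...) is not expanded inside single quotes.
# -w 0 (GNU coreutils) disables line wrapping in the base64 output.
SCREENSHOT_B64=$(base64 -w 0 screenshot.png)
curl -X POST https://coasty.ai/v1/predict \
  -H "Authorization: Bearer $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "'"$SCREENSHOT_B64"'",
    "instruction": "Click the Save button",
    "cua_version": "v3"
  }'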
# Expected response (excerpt):
# {
# "actions": [{"type": "click", "x": 512, "y": 768}],
# "status": "done"
# }
Stateful trajectory memory
For multi-step workflows, start a session to keep a trajectory of past actions. Creating a session costs $0.10, and each predict call against that session ID costs $0.04. The server stores the history, so the model can reference previous steps, which reduces retries. Use this pattern for checkout flows, data entry, or any sequence of UI actions that must stay coherent.
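To make the session pricing concrete, here is a small sketch that estimates the cost of one stateful session from the two prices quoted above. The function name is hypothetical, and the estimate assumes no charges beyond session creation and per-step predict calls.

```python
from decimal import Decimal

# Prices quoted in this article, in USD.
SESSION_CREATE_USD = Decimal("0.10")
SESSION_STEP_USD = Decimal("0.04")


def estimate_session_cost(steps: int) -> Decimal:
    """Return the estimated cost of one session with `steps` predict calls."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return SESSION_CREATE_USD + SESSION_STEP_USD * steps


# A 12-step checkout flow: 0.10 + 12 * 0.04
print(estimate_session_cost(12))  # prints 0.58
```

`Decimal` is used instead of `float` so that cent amounts add up exactly.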
Task Runs for end-to-end automation
When you want the server to drive an agent to completion, use Task Runs. POST /v1/runs with a task prompt, a cloud machine_id, optional instructions, and cua_version (v4 is autonomous with a pass/fail verifier). Each agent step costs $0.05. The server manages state transitions and retries. A run can be in one of these states: queued, running, awaiting_human, succeeded, failed, cancelled, timed_out. You can poll the run or stream events to track progress.
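The polling side of a Task Run can be sketched as below. This is an illustration under stated assumptions: the JSON field name `task` for the task prompt is a guess, the split of states into terminal and non-terminal is inferred from their names, and `fetch_state` is a hypothetical callable that you would implement against whichever run-status endpoint the API provides (this article does not name it).

```python
import time
from typing import Callable, Optional

# State names come from the article; the terminal/active split is an assumption.
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "timed_out"}
ACTIVE_STATES = {"queued", "running", "awaiting_human"}


def build_run_request(
    task: str,
    machine_id: str,
    cua_version: str = "v4",
    instructions: Optional[str] = None,
) -> dict:
    """Build a JSON body for POST /v1/runs (the "task" key is an assumed name)."""
    body = {"task": task, "machine_id": machine_id, "cua_version": cua_version}
    if instructions is not None:
        body["instructions"] = instructions  # optional, so only sent when set
    return body


def wait_for_run(
    fetch_state: Callable[[], str],  # hypothetical: returns the run's current state
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if state not in ACTIVE_STATES:
            raise ValueError(f"unexpected run state: {state!r}")
        sleep(interval_s)
    raise TimeoutError(f"run still active after {max_polls} polls")
```

A run in `awaiting_human` keeps being polled in this sketch. In a real deployment you would probably notify an operator at that point instead of waiting silently.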
The key to production is stateful sessions plus Task Runs. Use sessions for long, coherent workflows and Task Runs when you want the server to drive the agent autonomously.
Where this beats brittle automation
Traditional UI automation relies on stable selectors like IDs or XPath. When a designer changes a class name or a layout shifts, your script breaks. Computer use agents see the screen, understand visual context, and pick the right element by description. They handle UI changes, dynamic content, and multi-step flows without rewriting selectors. You can also use the free /v1/parse endpoint to translate raw actions into structured sequences or generate test scripts.
You can now prototype a computer use agent in a single script and scale it to production workflows. The Coasty computer use API handles the screen-level interaction so you can focus on business logic. Get your API key at https://coasty.ai/developers and start building agents that see and act like humans.