v3 vs v4: choosing a computer use model on the Coasty API
You want an agent that can see the screen and act like a human. The Coasty Computer Use API lets you drive desktops, browsers, and terminals using only screenshots and natural language. The cua_version parameter decides how much control you keep and how much the server handles. v3 is a fully supervised mode where you control every step. v4 is autonomous, with a built-in pass/fail verifier that can cancel a run when the task fails. Pick the version that matches your cost model and reliability needs.
How it works
You send a POST /v1/runs request with machine_id, task, and cua_version. The body can also include optional fields such as instructions, system_prompt, max_steps, deadline_seconds, on_awaiting_human, and webhook_url. v3 (the default) expects you to stream events and act on each prediction. v4 runs autonomously, checks its pass/fail verifier, and ends with a status of succeeded or failed. You pay $0.05 per agent step in both modes. The server returns the events and the final status, so you know whether the task succeeded.
curl -X POST https://coasty.ai/v1/runs \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "machine_id": "vm-12345",
    "task": "Open Chrome, navigate to coasty.ai, and verify the title",
    "cua_version": "v4",
    "max_steps": 50,
    "deadline_seconds": 120,
    "on_awaiting_human": "pause"
  }'
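The same request can be assembled from a script. The following is a minimal Python sketch, not an official client: the helper name build_run_request is hypothetical, and the URL, header names, and field names are copied from the curl example above. It only builds the request object, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://coasty.ai/v1/runs"


def build_run_request(api_key, machine_id, task, cua_version="v3", **optional):
    """Build (but do not send) a POST /v1/runs request.

    Hypothetical helper for illustration. `optional` carries fields such as
    max_steps, deadline_seconds, on_awaiting_human, or webhook_url.
    """
    body = {"machine_id": machine_id, "task": task, "cua_version": cua_version}
    # Leave out optional fields that were not set, so the server defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request(
        api_key="YOUR_API_KEY",  # placeholder, not a real key
        machine_id="vm-12345",
        task="Open Chrome, navigate to coasty.ai, and verify the title",
        cua_version="v4",
        max_steps=50,
        deadline_seconds=120,
        on_awaiting_human="pause",
    )
    # Print the JSON body that would be sent.
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before a billable run starts.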
v3 vs v4: key differences
- cua_version defaults to v3, which is a fully supervised mode.
- v3 gives you full control over every step through the /v1/runs/{id}/events stream.
- v4 is autonomous with a built-in pass/fail verifier that can cancel a run if the task fails.
- Both modes charge $0.05 per agent step for task runs.
- v4 can automatically succeed or fail a run based on the verifier, reducing the need for custom post-processing.
- v3 gives you more flexibility for complex workflows where you need to handle edge cases manually.
- Choose v3 for maximum control and debugging. Choose v4 for faster shipping and lower operational overhead.
cua_version = "v4" gives you autonomous verification. Use v3 if you need full supervision.
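Because both versions bill per agent step, a step cap gives a rough budget ceiling. The sketch below works through that arithmetic using the $0.05 price quoted above. It assumes that a run never bills more steps than max_steps, which the text does not state explicitly, and the function name is hypothetical.

```python
from decimal import Decimal

# USD per agent step, as quoted in the text for both v3 and v4.
PRICE_PER_STEP = Decimal("0.05")


def step_cost(steps):
    """Cost in USD of a given number of agent steps.

    Passing max_steps gives an upper bound on a run's cost, under the
    assumption that billing stops once max_steps is reached.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return PRICE_PER_STEP * steps


if __name__ == "__main__":
    # Ceiling for the example request above (max_steps = 50).
    print(step_cost(50))  # prints 2.50
```

Decimal is used instead of float so that repeated five-cent increments do not pick up rounding error.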
Where this beats brittle automation
Traditional automation relies on brittle selectors and API mocks. When the UI changes, your scripts break. The Coasty Computer Use API instead takes screenshots and uses a vision model to understand the screen. The agent clicks, types, and scrolls like a human, so the same approach works across apps, browsers, and terminals without hard-coded selectors. You only need to describe the task in natural language. The server handles the details, so you can focus on the business logic.
Start with v3 for full control, then move to v4 when you want autonomous verification. Build agents that see and act like humans. Get your API key at https://coasty.ai/developers and start building today.