Build a Self-Running QA Testing Bot with the Computer Use API

Priya Patel | 8 min read

QA teams often rely on brittle selectors and fragile APIs to test real user flows. A self-running QA bot that drives a real browser, clicks elements, fills forms, and validates outcomes removes that dependency. With the Coasty Computer Use API, you send a screenshot and an instruction, receive a list of actions, and loop until the task finishes. You can also run stateful sessions, ground elements to coordinates, and orchestrate multi-step workflows. This tutorial walks through the building blocks of a simple QA bot that logs in, navigates to a product page, and asserts the title using the Computer Use API.

How it works

The Computer Use API lets you drive a desktop or browser session by sending screenshots and natural language instructions. For stateful tasks, you create a session first, then loop through predict calls that maintain a trajectory of previous actions. Each predict call returns a status, a list of actions, and optionally an error. The server bills $0.04 per predict call in a session and $0.05 per agent step in a task run. The API also provides a free endpoint to turn PyAutoGUI code into structured actions, which you can use to pre-process test scripts before sending them to the agent.

python
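import base64
import os

import requests

# Read the key from the environment and fail early if it is missing,
# instead of sending an unauthenticated request later.
API_KEY = os.getenv('COASTY_API_KEY')
if not API_KEY:
    raise RuntimeError('Set the COASTY_API_KEY environment variable first')

BASE_URL = 'https://coasty.ai/v1'
HEADERS = {'X-API-Key': API_KEY}

def encode_image(path):
    """Return the file at `path` as a base64-encoded string."""
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')

def predict_action(image_b64, instruction, cua_version='v3'):
    """Send one screenshot and instruction; return (actions, status)."""
    payload = {
        'image': image_b64,
        'instruction': instruction,
        'cua_version': cua_version
    }
    # A timeout keeps a stalled connection from hanging the test run.
    resp = requests.post(f'{BASE_URL}/predict', json=payload,
                         headers=HEADERS, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    return data.get('actions', []), data.get('status')

# Example usage
if __name__ == '__main__':
    screen_b64 = encode_image('login_page.png')
    actions, status = predict_action(
        screen_b64, 'Click the login button and type my email', 'v3')
    print('Actions:', actions)
    print('Status:', status)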

Set up a stateful session

  • POST /v1/sessions creates a new trajectory and returns a session ID.
  • The session ID persists between predict calls so the agent remembers previous actions.
  • You can pass a max_steps limit to avoid infinite loops in production.
  • Sessions cost $0.10 each and each predict inside costs $0.04.
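To budget a test suite from the prices listed above, a small estimator can help. This is a minimal sketch: the function names are hypothetical, the figures are only the ones quoted in this tutorial, and it assumes no other fees apply. Amounts are held as integer cents to avoid floating-point rounding.

```python
# Prices quoted in this tutorial, in integer cents.
SESSION_FEE_CENTS = 10      # $0.10 per session
SESSION_PREDICT_CENTS = 4   # $0.04 per predict call inside a session
RUN_STEP_CENTS = 5          # $0.05 per agent step in a task run


def session_cost_cents(predict_calls: int) -> int:
    """Cost of one session that makes `predict_calls` predict calls."""
    if predict_calls < 0:
        raise ValueError("predict_calls must be non-negative")
    return SESSION_FEE_CENTS + SESSION_PREDICT_CENTS * predict_calls


def run_cost_cents(agent_steps: int) -> int:
    """Cost of one task run that takes `agent_steps` agent steps."""
    if agent_steps < 0:
        raise ValueError("agent_steps must be non-negative")
    return RUN_STEP_CENTS * agent_steps


def as_dollars(cents: int) -> str:
    """Format a non-negative cent amount as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    print(as_dollars(session_cost_cents(12)))  # one session, 12 predict calls
    print(as_dollars(run_cost_cents(20)))      # one task run, 20 agent steps
```

Under these assumptions, a session with 12 predict calls comes to $0.58, and a 20-step task run comes to $1.00.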

Use sessions for multi-step tests so the agent remembers context across clicks and navigations.
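The session flow described above might be driven by a loop like the one below. Treat it as a sketch under stated assumptions, not the documented API: this tutorial does not show the session response shape or the in-session predict path, so the `session_id` key, the `/sessions/{id}/predict` path, passing `max_steps` in the create request, and the `"done"` status are all placeholders to check against the API reference. The helper names (`run_session`, `post`, `capture`, `perform`) are hypothetical too. HTTP is injected as a callable so the loop can be exercised without a network.

```python
from typing import Any, Callable, Dict, List, Tuple

# post(path, payload) -> parsed JSON response
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_session(
    post: Post,
    capture: Callable[[], str],
    perform: Callable[[List[Any]], None],
    instruction: str,
    max_steps: int = 20,
    done_statuses: Tuple[str, ...] = ("done",),  # ASSUMPTION: completion status
) -> str:
    """Drive one stateful session until the agent reports completion.

    Returns the final status, or "max_steps_reached" if the client-side
    cap is hit first.
    """
    # ASSUMPTION: max_steps goes in the create request, and the response
    # carries the ID under "session_id".
    session = post("/sessions", {"max_steps": max_steps})
    session_id = session["session_id"]

    for _ in range(max_steps):
        # ASSUMPTION: hypothetical path for a predict call inside a session.
        result = post(
            f"/sessions/{session_id}/predict",
            {"image": capture(), "instruction": instruction},
        )
        if result.get("error"):
            raise RuntimeError(f"predict failed: {result['error']}")

        actions = result.get("actions", [])
        if actions:
            perform(actions)  # execute the actions against the real browser

        status = result.get("status")
        if status in done_statuses:
            return status
    return "max_steps_reached"
```

In practice, `post` could wrap `requests.post` with the `BASE_URL` and `HEADERS` defined earlier, `capture` could return a fresh base64 screenshot, and `perform` could replay the actions in the browser under test. The client-side loop cap is a second guard against runaway loops, on top of any limit the server enforces.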

Run a full QA bot with task runs

For longer workflows, use task runs. POST /v1/runs accepts a machine_id, task, cua_version, optional instructions, system_prompt, max_steps, deadline_seconds, on_awaiting_human, and webhook_url. The server bills $0.05 per agent step. You can monitor status with GET /v1/runs and GET /v1/runs/{id}, cancel with POST /v1/runs/{id}/cancel, and stream events with GET /v1/runs/{id}/events using Last-Event-ID for reconnection. States include queued, running, awaiting_human, succeeded, failed, cancelled, and timed_out.

bash
curl -X POST https://coasty.ai/v1/runs \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "machine_id": "vm-12345",
    "task": "Navigate to https://example.com, click the pricing link, and verify the title contains Pricing",
    "cua_version": "v3",
    "max_steps": 20,
    "deadline_seconds": 300,
    "on_awaiting_human": "pause"
  }'
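Once a run is created, a QA pipeline usually needs to wait for it to finish and assert on the outcome. The sketch below polls a run until it reaches one of the terminal states listed above. It assumes the run's state is exposed under a `status` key in the GET /v1/runs/{id} response, which this tutorial does not confirm. `wait_for_run` and `get_run` are hypothetical names, and the fetch function is injected so the logic can be tested offline.

```python
import time
from typing import Any, Callable, Dict

# Terminal states, taken from the state list in this tutorial.
TERMINAL_STATES = {"succeeded", "failed", "cancelled", "timed_out"}


def wait_for_run(
    get_run: Callable[[str], Dict[str, Any]],
    run_id: str,
    poll_seconds: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a run until it finishes or needs a human; return its state.

    Raises TimeoutError if the run is still in progress after `max_polls`.
    """
    for _ in range(max_polls):
        # ASSUMPTION: the run's state lives under the "status" key.
        state = get_run(run_id).get("status")
        if state in TERMINAL_STATES:
            return state
        if state == "awaiting_human":
            # Design choice: hand control back to the caller instead of
            # waiting indefinitely for a person to respond.
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A test harness could then fail the build unless the result is `succeeded`, for example `assert wait_for_run(fetch, run_id) == "succeeded"`, where `fetch` is your own function that calls GET /v1/runs/{id} and returns the parsed JSON.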

Where this beats brittle automation

Traditional QA tools rely on CSS selectors, XPath, or hardcoded API endpoints that break when a class name changes or a field order shifts. The Computer Use API drives a real browser or desktop, so the bot interacts with elements much as a human user would. Because you send screenshots and natural language instructions, you can handle dynamic layouts, complex interactions, and multi-step user journeys without rewriting selectors. The API also supports workflows with conditionals, loops, retries, and human approval, so you can orchestrate test suites that adapt to a changing UI.

You now have the building blocks to create a self-running QA testing bot with the Coasty Computer Use API. Use sessions for multi-step tests and task runs for complete workflows. Start by provisioning a machine, then send screenshots and instructions to drive a real browser. The API gives you human-like actions without brittle selectors. Get your key at https://coasty.ai/developers and start building your QA automation pipeline.
