Build a Self-Running QA Testing Bot with the Computer Use API
Manual QA is slow and error-prone. Traditional automation relies on brittle selectors and fixed URLs. You want a bot that sees the screen, reads the text, and clicks the right button. The computer use API gives you that: vision to read the UI, and actions to click, type, and scroll. You send a screenshot and an instruction, the API returns the next action, and you repeat until the task is done.
How it works
Start by capturing a screenshot of the screen you want to test. Send it as a base64 string to POST /v1/predict, along with the instruction you want the agent to follow and the cua_version. The API returns a list of actions and a status, which is either running or done. While the status is running, execute the returned actions, capture a new screenshot, and call /v1/predict again. Stop when the status is done. Each call costs $0.05.
#!/usr/bin/env bash
# Build a QA bot that clicks a "Submit" button in a web app
# This example runs on macOS. Adapt for Linux or Windows.
set -euo pipefail
API_KEY="${COASTY_API_KEY:?Set COASTY_API_KEY env var}"
BASE_URL="https://coasty.ai/v1"
# Capture a screenshot of the full screen
SCREENSHOT="/tmp/screenshot.png"
# macOS example; on Linux use scrot, on Windows use PowerShell
if command -v screencapture &>/dev/null; then
    screencapture -x "$SCREENSHOT"
else
    echo "screencapture not found; adjust the capture command for your OS" >&2
    exit 1
fi
# Read screenshot as base64
SCREENSHOT_BASE64=$(base64 -i "$SCREENSHOT" | tr -d '\n')
# Prompt the agent to click the submit button
task="Click the Submit button and wait for the page to finish loading."
# Loop until the task is done
while true; do
    # Build the JSON body with jq (1.6+) so quotes in the task are escaped, and
    # pipe it to curl on stdin: a base64 screenshot can exceed argument-length limits
    response=$(jq -cn --rawfile shot <(printf '%s' "$SCREENSHOT_BASE64") --arg task "$task" \
        '{screenshot: $shot, instruction: $task, cua_version: "v3"}' |
        curl -s -X POST "$BASE_URL/predict" \
            -H "X-API-Key: $API_KEY" \
            -H "Content-Type: application/json" \
            --data-binary @-)
    echo "$response" | jq '.'
    status=$(echo "$response" | jq -r '.status')
    # Stop when the task is finished; abort on anything other than "running"
    # so an error response cannot keep the loop (and the billing) going
    if [[ "$status" == "done" ]]; then
        break
    elif [[ "$status" != "running" ]]; then
        echo "Unexpected status: $status" >&2
        exit 1
    fi
    # Extract the first action and run it (only click and type are handled here)
    action_type=$(echo "$response" | jq -r '.actions[0].type')
    x=$(echo "$response" | jq -r '.actions[0].x // empty')
    y=$(echo "$response" | jq -r '.actions[0].y // empty')
    text=$(echo "$response" | jq -r '.actions[0].text // empty')
    if [[ "$action_type" == "click" ]]; then
        # macOS example; adjust for your OS
        osascript -e "tell application \"System Events\" to click at {$x,$y}"
    elif [[ "$action_type" == "type" ]]; then
        osascript -e "tell application \"System Events\" to keystroke \"$text\""
    fi
    # Capture a fresh screenshot for the next loop iteration
    screencapture -x "$SCREENSHOT"
    SCREENSHOT_BASE64=$(base64 -i "$SCREENSHOT" | tr -d '\n')
done
echo "Task completed. Status: $status"
Using sessions for long-running tests
- A session stores trajectory memory across calls. Start a session with POST /v1/sessions. You get a session ID.
- Each iteration calls POST /v1/sessions/{id}/predict. Pass the screenshot, instruction, and cua_version. This API call costs $0.04.
- The server returns actions and an updated status. Keep looping until the status is done.
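The steps above can be sketched as a few bash functions. This is a minimal sketch under stated assumptions, not a reference client: the function names are hypothetical, the session ID is assumed to come back in an `.id` field, session creation is assumed to accept an empty JSON body, and the header and body fields simply mirror the stateless example. It needs curl and jq.

```bash
#!/usr/bin/env bash
# Sketch of the session flow. Function names, the empty creation body,
# and the ".id" response field are assumptions, not documented behavior.
set -euo pipefail

BASE_URL="${BASE_URL:-https://coasty.ai/v1}"

# Build the per-session predict URL from a session ID.
session_predict_url() {
    printf '%s/sessions/%s/predict' "$BASE_URL" "$1"
}

# Start a session and print its ID (assumed to be in ".id").
create_session() {
    curl -s -X POST "$BASE_URL/sessions" \
        -H "X-API-Key: $COASTY_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{}' | jq -r '.id'
}

# Send one screenshot file plus an instruction; prints the raw JSON response.
# Usage: session_predict SESSION_ID SCREENSHOT_PNG INSTRUCTION
session_predict() {
    local session_id=$1 screenshot=$2 instruction=$3
    # jq -Rs reads the whole base64 stream as one string, so the
    # screenshot never has to fit on a command line.
    base64 < "$screenshot" | tr -d '\n' |
        jq -Rcs --arg task "$instruction" \
            '{screenshot: ., instruction: $task, cua_version: "v3"}' |
        curl -s -X POST "$(session_predict_url "$session_id")" \
            -H "X-API-Key: $COASTY_API_KEY" \
            -H "Content-Type: application/json" \
            --data-binary @-
}

# Example (makes billed API calls, so it is left commented out):
#   sid=$(create_session)
#   session_predict "$sid" /tmp/screenshot.png "Click the Submit button" | jq -r '.status'
```

The loop around `session_predict` is the same as in the stateless script: execute the returned actions, take a new screenshot, and call again until the status is done.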
Each predict call in a session is billed at $0.04, which is 20 percent less than the $0.05 stateless /v1/predict call.
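To make the saving concrete, here is a small worked example that compares the two endpoints for a test of a given length. It uses the prices quoted in this article and works in whole cents, because bash arithmetic is integer-only. The function names are hypothetical, and the sketch ignores any charge for creating the session, which the text does not specify.

```bash
#!/usr/bin/env bash
# Cost arithmetic in whole cents; prices are the ones quoted in this article.
set -euo pipefail

STATELESS_CENTS=5   # $0.05 per /v1/predict call
SESSION_CENTS=4     # $0.04 per /v1/sessions/{id}/predict call

# Print a cent amount as dollars, e.g. 160 -> $1.60
format_cents() {
    printf '$%d.%02d' $(( $1 / 100 )) $(( $1 % 100 ))
}

# Print what a test with the given number of predict calls costs on each endpoint.
compare_costs() {
    local calls=$1
    local stateless=$(( calls * STATELESS_CENTS ))
    local session=$(( calls * SESSION_CENTS ))
    printf '%d calls: stateless %s, session %s, saved %s\n' \
        "$calls" "$(format_cents "$stateless")" "$(format_cents "$session")" \
        "$(format_cents $(( stateless - session )))"
}

compare_costs 40
# prints: 40 calls: stateless $2.00, session $1.60, saved $0.40
```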
Why this beats brittle automation
Traditional automation relies on CSS selectors or XPath expressions that break when the markup changes, so you end up updating them often. The computer use API reads the screen much as a human does: it sees text, buttons, and layout, and it clicks the element you describe in plain language. Your tests keep working through markup and layout changes as long as the element still matches the description. You maintain a description of what you want to test instead of a set of locators.
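As an illustration of what a selector-free test suite might look like, the sketch below stores each test as a name plus a plain-language instruction and hands the instruction to a runner command. Everything here is hypothetical: the suite entries are made up, and the runner is assumed to be a function or script (for example, the predict loop wrapped in a function) that takes the instruction as its only argument and exits 0 on success.

```bash
#!/usr/bin/env bash
# A test suite expressed as plain-language instructions instead of selectors.
set -euo pipefail

# Each entry is "name|instruction". The entries are illustrative only.
SUITE=(
    "login|Type the test username and password, then click the Log in button."
    "submit|Click the Submit button and wait for the page to finish loading."
)

# Run every case through the given runner command and print one
# PASS/FAIL line per case. Returns non-zero if any case failed.
run_suite() {
    local runner=$1 entry name instruction failures=0
    for entry in "${SUITE[@]}"; do
        name=${entry%%|*}          # text before the first "|"
        instruction=${entry#*|}    # text after the first "|"
        if "$runner" "$instruction"; then
            echo "PASS $name"
        else
            echo "FAIL $name"
            failures=$(( failures + 1 ))
        fi
    done
    (( failures == 0 ))
}
```

When a button moves or its markup changes, nothing in `SUITE` needs to change as long as the instruction still describes what is on screen.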
Scaling to autonomous QA runs
- Use POST /v1/runs to let the server drive an agent to completion on a cloud VM. You provide machine_id, task, cua_version, and optional instructions.
- The server runs until it reaches max_steps or deadline_seconds. Each agent step costs $0.05.
- You can cancel or resume a run with POST /v1/runs/{id}/cancel and POST /v1/runs/{id}/resume.
- Stream events with GET /v1/runs/{id}/events to see progress in real time.
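The endpoints above could be wrapped as follows. Treat this as a sketch under assumptions rather than the documented client: the function names are hypothetical, the run ID is assumed to come back in an `.id` field, `max_steps` is assumed to be accepted in the request body, and the `X-API-Key` header is carried over from the predict example. The worst-case cost helper uses the $0.05 per-step price quoted above.

```bash
#!/usr/bin/env bash
# Sketch of the /v1/runs flow. The ".id" response field and sending
# max_steps in the request body are assumptions.
set -euo pipefail

BASE_URL="${BASE_URL:-https://coasty.ai/v1}"

# Worst-case cost in cents for a run capped at max_steps ($0.05 per agent step).
max_run_cost_cents() {
    echo $(( $1 * 5 ))
}

# Build the URL for a run sub-resource: cancel, resume, or events.
run_url() {
    printf '%s/runs/%s/%s' "$BASE_URL" "$1" "$2"
}

# Start a run and print its ID. Usage: start_run MACHINE_ID TASK MAX_STEPS
start_run() {
    jq -cn --arg machine "$1" --arg task "$2" --argjson steps "$3" \
        '{machine_id: $machine, task: $task, cua_version: "v3", max_steps: $steps}' |
        curl -s -X POST "$BASE_URL/runs" \
            -H "X-API-Key: $COASTY_API_KEY" \
            -H "Content-Type: application/json" \
            --data-binary @- | jq -r '.id'
}

# Stream progress events without buffering until the server closes the connection.
stream_events() {
    curl -sN -H "X-API-Key: $COASTY_API_KEY" "$(run_url "$1" events)"
}

# Cancel a run, for example when the event stream shows it going off track.
cancel_run() {
    curl -s -X POST -H "X-API-Key: $COASTY_API_KEY" "$(run_url "$1" cancel)"
}
```

Checking `max_run_cost_cents` before starting a run gives an upper bound on spend: a run capped at 30 steps can cost at most 150 cents.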
You now have a blueprint for a self-running QA bot that uses vision and actions. Start with the simple loop over /v1/predict or /v1/sessions/{id}/predict. Gradually move to full autonomous runs with /v1/runs. Get your API key at https://coasty.ai/developers and start building.