Ground UI Elements to Coordinates with /v1/ground
Manual automation breaks the moment a button moves or a layout shifts. You can chase selectors forever, but a computer use agent sees the screen and acts. The /v1/ground endpoint turns a description of an element into click-ready X,Y coordinates. It costs $0.03 per call. You can plug those coordinates into any automation stack or combine them with the /v1/predict loop to let the agent ground its own actions on the current UI.
How /v1/ground works
POST https://coasty.ai/v1/ground. Set the X-API-Key header from the COASTY_API_KEY environment variable. The body expects a base64 screenshot, an element description, and optional bounding box hints. The endpoint returns coordinates as an object with x and y fields. You can use these directly for mouse clicks or pass them to a downstream automation tool. The price is $0.03 per successful request.
#!/usr/bin/env bash
API_KEY="${COASTY_API_KEY}"
URL="https://coasty.ai/v1/ground"
# Base64 encode a sample screenshot
SCREENSHOT=$(base64 -i screenshot.png)
curl -s "$URL" \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"screenshot": "'$SCREENSHOT'",
"description": "the blue submit button at the bottom center of the screen"
}' | jq .Request and response fields
- ●screenshot: base64-encoded image data for the current screen
- ●description: natural language description of the element you want to click
- ●bounding_box (optional): {x, y, width, height} to narrow the search
- ●x: integer pixel coordinate for the element center
- ●y: integer pixel coordinate for the element center
- ●price: $0.03 per call
POST /v1/ground with a screenshot and description, then use the returned x and y to click with pyautogui or another automation library.
Where /v1/ground beats brittle selectors
Traditional automation relies on CSS selectors, XPath, or IDs. When a front-end engineer reorders a DOM, adds a class, or uses a dynamic ID, your scripts break. A computer use agent sees the visual layout. /v1/ground lets you describe elements in plain language. The endpoint returns coordinates that are valid regardless of the underlying markup. You can ground a button, an input field, or any UI component without rewriting selectors. This works across desktop apps, web dashboards, and custom interfaces you control.
Combine /v1/ground with /v1/predict to let the agent ground its own actions on the fly. Build workflows with /v1/workflows that ground, click, and verify results. Get a key at https://coasty.ai/developers and start grounding UI elements with the computer use API.