Ground UI Elements to Coordinates with /v1/ground
Selectors break when a button's class changes or a layout shifts. A computer-use agent should instead look at the screen and point to the right pixel. The /v1/ground endpoint takes a screenshot and a natural-language description and returns an (x, y) coordinate pair, turning a vague instruction like 'click the save button' into a precise pyautogui click.
How it works
Send a base64-encoded screenshot and an element description to POST /v1/ground. The service returns an object with the x and y coordinates of the first matching element. Each request costs $0.03. Pass the coordinates directly to pyautogui, or feed them into your own capture-predict-act loop for a full computer-use agent.
# Encode the screenshot first: $(...) does not expand inside single quotes.
SCREENSHOT_B64=$(base64 < screenshot.png | tr -d '\n')
curl -X POST https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "screenshot": "$SCREENSHOT_B64",
  "description": "the Save button in the top right corner"
}
EOF
Request fields
- screenshot (string, base64): The full desktop screenshot.
- description (string): A natural-language description of the element you want to click.
- Optional fields: none documented.
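The same request can be made from Python, which is convenient when the result goes to pyautogui. The following is a minimal sketch using only the standard library, not an official client. The helper names (build_ground_request, parse_ground_response, ground) are hypothetical, and the parser assumes the response body is a JSON object with top-level numeric x and y keys, which the description above suggests but does not spell out.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # endpoint from the curl example


def build_ground_request(png_bytes: bytes, description: str, api_key: str) -> urllib.request.Request:
    """Build the POST request carrying the two documented fields."""
    payload = {
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_ground_response(body: bytes) -> tuple[int, int]:
    """Extract (x, y). Assumption: top-level numeric "x" and "y" keys."""
    data = json.loads(body)
    return round(float(data["x"])), round(float(data["y"]))


def ground(png_bytes: bytes, description: str, api_key: str, timeout: float = 30.0) -> tuple[int, int]:
    """Send one grounding request and return the predicted pixel."""
    request = build_ground_request(png_bytes, description, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ground_response(response.read())
```

With a PNG on disk, a call would look like ground(open("screenshot.png", "rb").read(), "the Save button in the top right corner", api_key), and the returned pair can be passed to pyautogui.click(x, y). Keeping request building and response parsing separate from the network call makes both easy to check without spending a request.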
Ground a description to coordinates with POST /v1/ground for $0.03 per request.
Where this beats brittle automation
Selector-based tools rely on hooks like class="btn-primary" or XPath expressions. When the UI changes, the automation breaks. The ground endpoint lets your agent work from visual context instead: it sees the button and points to it by pixel. This makes your computer-use agent resilient to layout shifts, class renames, and even different themes or localized text. Combine it with a capture-predict-act loop to build agents that operate real applications through their actual interface.
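One way to picture the capture-predict-act loop is the sketch below. It is an illustration under assumptions, not a reference implementation: run_steps and its three callables are hypothetical names, and locate stands in for whatever function wraps the POST /v1/ground call and returns an (x, y) pair. Passing the capture, grounding, and click steps in as functions keeps the loop independent of any particular screen or HTTP library.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[int, int]


def run_steps(
    descriptions: Iterable[str],
    capture: Callable[[], bytes],
    locate: Callable[[bytes, str], Point],
    click: Callable[[int, int], None],
) -> List[Point]:
    """For each description: capture the screen, ground it, click the result."""
    clicked: List[Point] = []
    for description in descriptions:
        # Capture: take a fresh screenshot every step, since the previous
        # click may have changed what is on screen.
        screenshot = capture()
        # Predict: ask the grounding function where the element is.
        x, y = locate(screenshot, description)
        # Act: click the predicted pixel.
        click(x, y)
        clicked.append((x, y))
    return clicked


def capture_png() -> bytes:
    """Grab the screen as PNG bytes with pyautogui (third-party, needs a display)."""
    import io
    import pyautogui

    buffer = io.BytesIO()
    pyautogui.screenshot().save(buffer, format="PNG")
    return buffer.getvalue()
```

On a desktop session this could be wired up as run_steps(["the File menu", "the Save button"], capture_png, my_ground, pyautogui.click), where my_ground is your own wrapper around the endpoint. Each step costs one $0.03 request. On scaled (HiDPI) displays the screenshot resolution may differ from the coordinate space pyautogui clicks in, so the returned coordinates may need rescaling.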
Start grounding UI elements to coordinates with the /v1/ground endpoint. Build robust agents that see and click like a human. Get an API key at https://coasty.ai/developers.