Ground UI Elements to Coordinates with /v1/ground
Most automation tools break when the DOM changes or a UI library updates. A computer use agent avoids this by looking at the screen and interpreting what it sees. The /v1/ground endpoint maps a screenshot and a plain-language element description to screen coordinates, so your agent can click, type into, or hover over that element.
How it works
You send a base64-encoded screenshot and a text description of the element you want to interact with. The endpoint returns an x and y coordinate pair that you can pass to pyautogui or another automation library. Because the agent works from pixels rather than selectors, there are no selectors to go stale, and the same approach works across operating systems and browsers.
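# Inside single quotes, $(base64 ...) is sent literally instead of being expanded,
# so build the JSON with jq and pipe it to curl. Piping also avoids shell
# argument-length limits on large screenshots.
base64 < screenshot.png | tr -d '\n' \
  | jq -Rs '{screenshot: ., description: "The blue primary button in the top right"}' \
  | curl -X POST https://coasty.ai/v1/ground \
      -H "X-API-Key: $COASTY_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @-
Endpoint details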
- Method: POST
- URL: https://coasty.ai/v1/ground
- Required headers: X-API-Key (your API key, read here from the COASTY_API_KEY environment variable) and Content-Type: application/json
- Request body fields: screenshot (base64-encoded image string) and description (string)
- Response fields: x (integer) and y (integer) screen coordinates
- Cost: 3 credits ($0.03) per request, one element per request
- Rate limits: requests over the limit return HTTP 429 (Too Many Requests)
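The same call can be made from Python with only the standard library. This is a minimal sketch that assumes the request and response shapes listed above: a JSON body with `screenshot` (raw base64, no `data:` prefix, matching the curl example) and `description`, and a JSON response with integer `x` and `y`. The helper names, the 30-second timeout, and the exponential-backoff retry on HTTP 429 are illustrative choices, not documented behavior of the API.

```python
import base64
import json
import time
import urllib.error
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"


def build_ground_payload(png_bytes: bytes, description: str) -> bytes:
    """Encode a screenshot and an element description as the JSON request body."""
    return json.dumps({
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
    }).encode("utf-8")


def parse_ground_response(body: bytes) -> tuple[int, int]:
    """Extract the integer x and y fields from a /v1/ground response body."""
    data = json.loads(body)
    return int(data["x"]), int(data["y"])


def ground(png_bytes: bytes, description: str, api_key: str,
           max_retries: int = 3, backoff_s: float = 1.0) -> tuple[int, int]:
    """POST to /v1/ground, retrying with exponential backoff on HTTP 429.

    Hypothetical helper: the retry policy is an assumption, since the docs
    only state that over-limit requests return 429.
    """
    payload = build_ground_payload(png_bytes, description)
    for attempt in range(max_retries + 1):
        req = urllib.request.Request(
            GROUND_URL,
            data=payload,
            headers={"X-API-Key": api_key, "Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return parse_ground_response(resp.read())
        except urllib.error.HTTPError as err:
            # Retry only rate-limit errors; re-raise anything else, or the last 429.
            if err.code != 429 or attempt == max_retries:
                raise
            time.sleep(backoff_s * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise RuntimeError("unreachable")
```

For example, `ground(open("screenshot.png", "rb").read(), "The blue primary button in the top right", os.environ["COASTY_API_KEY"])` would return the `(x, y)` pair. Keeping payload building and response parsing in separate pure functions makes them easy to test without calling the API.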
Why this beats brittle selectors
Traditional automation relies on CSS selectors, XPath expressions, or unique IDs, which often change between releases. When your product's markup changes, those scripts have to be rewritten. A computer use agent with grounding reads the screen and follows natural-language descriptions, so it can adapt to layout changes without code modifications.
Build a UI agent
Use /v1/ground to locate elements, then pass the coordinates to pyautogui or your own automation layer. You can combine grounding with /v1/predict for vision-based actions, or use /v1/runs for full task runs that drive the agent to completion. Get your API key at coasty.ai/developers to start building.
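A sketch of the screenshot, ground, click loop with pyautogui. Two assumptions are worth flagging: the returned x and y are assumed to be in the pixel space of the submitted screenshot, and on HiDPI displays (such as macOS Retina) `pyautogui.screenshot()` can return an image larger than the logical coordinate space that `pyautogui.click` uses. The hypothetical `scale_to_screen` helper rescales the point for that case; when the screenshot already matches `pyautogui.size()`, the factor is 1 and the point passes through unchanged. `click_element` is likewise a hypothetical name, and the function needs pyautogui (with Pillow) and a `COASTY_API_KEY` environment variable.

```python
import base64
import io
import json
import os
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"


def scale_to_screen(x: int, y: int, shot_size: tuple[int, int],
                    screen_size: tuple[int, int]) -> tuple[int, int]:
    """Map a point from screenshot pixels to pyautogui's screen coordinates.

    Assumes grounding returns coordinates relative to the submitted image.
    """
    sx = screen_size[0] / shot_size[0]
    sy = screen_size[1] / shot_size[1]
    return round(x * sx), round(y * sy)


def click_element(description: str) -> tuple[int, int]:
    """Screenshot the screen, ground the description, and click the result."""
    import pyautogui  # lazy import so scale_to_screen works without a display

    shot = pyautogui.screenshot()  # PIL Image of the current screen
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    body = json.dumps({
        "screenshot": base64.b64encode(buf.getvalue()).decode("ascii"),
        "description": description,
    }).encode("utf-8")
    req = urllib.request.Request(
        GROUND_URL, data=body, method="POST",
        headers={"X-API-Key": os.environ["COASTY_API_KEY"],
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.loads(resp.read())
    # Convert from screenshot pixels to the coordinates pyautogui clicks in.
    x, y = scale_to_screen(data["x"], data["y"], shot.size, tuple(pyautogui.size()))
    pyautogui.click(x, y)
    return x, y
```

Calling `click_element("The blue primary button in the top right")` grounds one element per call, so each click costs one grounding request.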