
Ground UI Elements to Coordinates with /v1/ground

Daniel Kim | 7 min

Classic automation relies on CSS selectors and XPath, which break when the layout changes or the app uses dynamic IDs. The /v1/ground endpoint lets you describe what you want to click in plain language and get back pixel coordinates. You send a screenshot with a description, and the API returns an x,y pair you can use with any automation tool.

How it works

POST /v1/ground takes a base64-encoded screenshot and an element description. It returns a JSON object with an x and a y coordinate. The endpoint costs $0.03 per call. On each iteration of your agent loop, you capture a fresh screenshot and ground the next action against it. The result is a bridge between visual understanding and precise clicks.

bash
curl https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "base64_encoded_image",
    "description": "the blue submit button near the top right corner"
  }'
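The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_ground_request and ground are hypothetical, and it assumes the screenshot is sent as plain base64 with no data: URL prefix. The URL, header names, and body keys come from the curl example above.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # taken from the curl example


def build_ground_request(png_bytes: bytes, description: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/ground."""
    body = json.dumps({
        # Assumption: plain base64 text, no "data:image/png;base64," prefix.
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
    }).encode("utf-8")
    return urllib.request.Request(
        GROUND_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ground(png_bytes: bytes, description: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_ground_request(png_bytes, description, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or unit test without spending a billed call.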

Request and response fields

  • Request body requires screenshot (base64 string) and description (string).
  • The API ignores extra top-level keys.
  • Response is a JSON object with x (integer, pixels) and y (integer, pixels).
  • If the element is not found, x and y may both be null.
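Because x and y may come back as null, it helps to check the response before clicking. Here is a minimal sketch of that check; parse_point is a hypothetical helper name, and treating a missing key the same as null is an assumption for safety, not documented behavior.

```python
from typing import Optional, Tuple


def parse_point(payload: dict) -> Optional[Tuple[int, int]]:
    """Return (x, y) from a /v1/ground response, or None if no element was found."""
    x = payload.get("x")
    y = payload.get("y")
    # Assumption: a missing key is handled like an explicit null.
    if x is None or y is None:
        return None
    return int(x), int(y)
```

Returning None instead of raising lets the caller decide whether to retry with a fresh screenshot, rephrase the description, or give up.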

POST /v1/ground maps a screenshot and natural language description to x,y coordinates, billed at $0.03 per call.
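At a flat per-call price, budgeting is simple arithmetic. The sketch below assumes one grounding call per action and that every call is billed, including any retries; the function name estimate_grounding_cost is hypothetical.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.03")  # USD, the per-call price quoted above


def estimate_grounding_cost(actions_per_run: int, runs: int = 1) -> Decimal:
    """Estimate spend in USD, assuming one billed /v1/ground call per action."""
    if actions_per_run < 0 or runs < 0:
        raise ValueError("actions_per_run and runs must be non-negative")
    return PRICE_PER_CALL * actions_per_run * runs
```

Under those assumptions, a single 40-action run comes to $1.20, and 25 such runs come to $30.00.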

Where this beats brittle automation

Selectors depend on stable class names, IDs, and CSS paths. Modern apps inject dynamic values and restructure their DOM trees, so a test suite built on selectors can break whenever the front end changes. With /v1/ground you describe what you see: the button that says 'Save' next to the avatar. The API returns the pixel location regardless of DOM structure. You can then feed those coordinates to pyautogui.click, Playwright's page.mouse.click, or any automation driver that accepts x,y. This lets you build computer use agents that reason about the visual state of an app and act on it directly.
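One way to wire this into an agent step is to keep the driver pluggable. The sketch below is illustrative only: ground_and_click and its three callables are hypothetical names, and it assumes ground_fn wraps a /v1/ground call and returns either an (x, y) tuple or None.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def ground_and_click(
    take_screenshot: Callable[[], bytes],
    ground_fn: Callable[[bytes, str], Optional[Point]],
    click_fn: Callable[[int, int], None],
    description: str,
    attempts: int = 2,
) -> Optional[Point]:
    """Screenshot, ground the description, and click; return the point or None."""
    for _ in range(attempts):
        shot = take_screenshot()  # fresh screenshot on every attempt
        point = ground_fn(shot, description)
        if point is not None:
            click_fn(*point)
            return point
    return None  # element was not found in any attempt
```

With Playwright's sync API, take_screenshot could be page.screenshot and click_fn could be page.mouse.click; with pyautogui, click_fn could be pyautogui.click. This assumes the screenshot's pixel grid matches the coordinate space the driver clicks in, which may not hold on scaled or high-DPI displays.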

Use /v1/ground to turn natural language into click targets. Build agents that see and act like humans. Get a key at https://coasty.ai/developers and start grounding your UI automation today.
