Ground UI Elements to Coordinates with /v1/ground
Many computer use agents rely on brittle selectors such as CSS classes, IDs, or XPath, so when a UI changes, the automation breaks. The /v1/ground endpoint addresses this by taking a base64-encoded screenshot and a human-readable element description and returning x,y coordinates that you can pass to pyautogui or your own action engine. This turns a natural-language description into a position your agent can act on.
How /v1/ground works
The endpoint requires three fields: a base64-encoded screenshot, an element description, and the cua_version. You send a POST request to https://coasty.ai/v1/ground with an X-API-Key header and a JSON body. The response includes the x and y coordinates of the described element within the screenshot. Each call costs $0.03.
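As a rough Python counterpart to the curl example below, the request described above could be assembled with only the standard library. This is a minimal sketch, not official client code: the helper names `build_ground_request` and `ground` are hypothetical, and the assumption that the response JSON exposes top-level `x` and `y` keys is ours, based only on the description above.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"


def build_ground_request(png_bytes: bytes, description: str, api_key: str,
                         cua_version: str = "v3") -> urllib.request.Request:
    """Assemble the POST request without sending it (hypothetical helper)."""
    body = {
        # The endpoint expects the screenshot as a base64 string.
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
        "cua_version": cua_version,
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ground(png_bytes: bytes, description: str, api_key: str):
    """Send the request and return (x, y).

    Assumption: the response is JSON with top-level "x" and "y" keys.
    Adjust the lookup if the real response is shaped differently.
    """
    request = build_ground_request(png_bytes, description, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    return result["x"], result["y"]
```

Splitting request construction from sending keeps the payload easy to inspect or log before any billable call is made.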
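# Encode the screenshot first; command substitution does not expand inside single quotes.
# GNU coreutils syntax shown; on macOS use: base64 -i screenshot.png
SCREENSHOT_B64=$(base64 -w 0 screenshot.png)

# Send the JSON body on stdin so a large screenshot does not hit argument-length limits.
curl https://coasty.ai/v1/ground \
-H "X-API-Key: $COASTY_API_KEY" \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"screenshot": "$SCREENSHOT_B64",
"description": "the blue submit button in the top right",
"cua_version": "v3"
}
EOF
Why grounding improves reliability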
- You describe what you see, not how the DOM is structured.
- The API runs a vision model that understands visual context and layout.
- Coordinates are returned relative to the top-left of the screenshot, matching pyautogui expectations.
- Each ground call costs $0.03, making it cheap to try multiple descriptions until you hit the right one.
Grounding is a matching layer between natural language and pixel-perfect actions.
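Trying several descriptions until one lands, as the last bullet suggests, might be sketched like this. The `ground` argument is a hypothetical callable that wraps a /v1/ground call and returns an `(x, y)` tuple, or `None` when no usable result comes back. How the real API signals a miss is not covered here, so both that convention and the bounds check are assumptions.

```python
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[int, int]

COST_PER_CALL_CENTS = 3  # $0.03 per ground call, as stated above


def ground_with_fallbacks(
    ground: Callable[[str], Optional[Point]],
    descriptions: Iterable[str],
    screen_size: Tuple[int, int],
) -> Tuple[Optional[Point], int]:
    """Try each description in order; return (point, total cost in cents).

    A point is accepted only if it falls inside the screenshot bounds,
    measured from the top-left corner.
    """
    width, height = screen_size
    calls = 0
    for description in descriptions:
        calls += 1
        point = ground(description)  # hypothetical wrapper around /v1/ground
        if point is not None and 0 <= point[0] < width and 0 <= point[1] < height:
            return point, calls * COST_PER_CALL_CENTS
    return None, calls * COST_PER_CALL_CENTS
```

When a point comes back, it can be passed straight to `pyautogui.click(x, y)`, since both use a top-left origin. Tracking the cost in integer cents avoids floating-point rounding when the calls add up.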
Where this beats brittle automation
Traditional automation often breaks when a UI library renames a class or swaps a container. Even well-crafted selectors can fail with dynamic content or inconsistent IDs. By grounding to coordinates derived from a vision model, your agent ignores structure and focuses on what it sees. This makes your computer use agent resilient to layout changes, theming, and minor framework updates. Each call still needs a screenshot, but you can crop it to a small region around the element to improve accuracy and speed. Coordinates are then relative to the cropped image, so add the crop's offset before acting on them.
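If you ground against a cropped region, the returned coordinates are relative to the top-left of that crop, so they need to be shifted back into full-screen space before clicking. The sketch below shows only that arithmetic. `Region` and `to_screen_coords` are hypothetical names, and the crop itself could come from something like `pyautogui.screenshot(region=(left, top, width, height))`.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """A crop rectangle in full-screen pixel coordinates (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int


def to_screen_coords(region: Region, local_xy: Tuple[int, int]) -> Tuple[int, int]:
    """Translate coordinates grounded on a crop back to full-screen coordinates."""
    x, y = local_xy
    # A point outside the crop suggests the wrong image or region was used.
    if not (0 <= x < region.width and 0 <= y < region.height):
        raise ValueError(f"point {local_xy} lies outside the cropped region")
    return region.left + x, region.top + y
```

For example, a point at (280, 42) inside a crop whose top-left corner sits at (900, 0) maps to (1180, 42) on the full screen.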
Combine /v1/ground with /v1/predict or task runs to build agents that understand UI semantics, not just selectors. Get your API key at https://coasty.ai/developers and start grounding your automation to coordinates.