Tutorial

Ground UI Elements to Coordinates with /v1/ground

Daniel Kim | 4 min

Most computer use agents rely on brittle selectors like CSS classes, IDs, or XPath. When a UI changes, the automation breaks. The /v1/ground endpoint solves this: it takes a base64-encoded screenshot and a human-readable element description, and returns x,y coordinates you can pass to pyautogui or your own action engine. This turns a natural-language description into a position you can act on.

How /v1/ground works

The endpoint requires three fields: a base64-encoded screenshot, an element description, and the cua_version. You send a POST request to https://coasty.ai/v1/ground with an X-API-Key header and a JSON body. The response includes x and y coordinates locating the described element within the screenshot. Each call costs $0.03.

bash
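# The single quotes are closed around $(...) so the shell actually expands it.
# -w 0 disables line wrapping (GNU base64); on macOS use: base64 -i screenshot.png
curl https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "'"$(base64 -w 0 screenshot.png)"'",
    "description": "the blue submit button in the top right",
    "cua_version": "v3"
  }'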
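
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_ground_request and ground are hypothetical, and it assumes the response body is a JSON object with top-level "x" and "y" keys, which the description above suggests but does not spell out.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # endpoint from the tutorial


def build_ground_request(png_bytes, description, api_key, cua_version="v3"):
    """Build (but do not send) the POST request for /v1/ground."""
    payload = {
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
        "cua_version": cua_version,
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ground(png_bytes, description, api_key, timeout=30):
    """Send the request and return (x, y).

    Assumption: the response is a JSON object with top-level "x" and "y"
    keys; adjust the lookup if the real schema nests them.
    """
    request = build_ground_request(png_bytes, description, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["x"], body["y"]
```

With a real key and screenshot, the returned pair could be handed to pyautogui.click(x, y) or to your own action engine.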

Why grounding improves reliability

  • You describe what you see, not how the DOM is structured.
  • The API runs a vision model that understands visual context and layout.
  • Coordinates are returned relative to the top-left of the screenshot, matching pyautogui expectations.
  • Each ground call costs $0.03, making it cheap to try multiple descriptions until you hit the right one.
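
Trying several descriptions in turn, as the last point suggests, might look like the sketch below. It is illustrative only: ground_fn stands in for whatever client wrapper you use, and accept is a caller-defined check (for example, "the point falls inside the toolbar"), since the tutorial does not say how the API signals a poor match. Cost is tracked in whole cents to avoid floating-point drift.

```python
COST_CENTS_PER_CALL = 3  # $0.03 per ground call, per the tutorial


def ground_with_fallbacks(ground_fn, screenshot, descriptions, accept):
    """Try each description in order until `accept` approves a result.

    ground_fn(screenshot, description) -> (x, y)  # hypothetical client wrapper
    accept(x, y) -> bool                          # caller-defined sanity check
    Returns ((x, y) or None, total cost in cents).
    """
    spent = 0
    for description in descriptions:
        x, y = ground_fn(screenshot, description)
        spent += COST_CENTS_PER_CALL
        if accept(x, y):
            return (x, y), spent
    return None, spent
```

Ordering the descriptions from most to least specific keeps the expected cost low, because the loop stops at the first accepted result.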

Grounding is a matching layer between natural language and pixel-perfect actions.

Where this beats brittle automation

Traditional automation often breaks when a UI library updates a class name or swaps a container. Even well-crafted selectors can fail with dynamic content or inconsistent IDs. By grounding to coordinates derived from a vision model, your agent ignores structure and focuses on what it sees. This makes your computer use agent resilient to layout changes, theming, and minor framework updates. You still need a screenshot, but you can crop it to a small region around the element to improve accuracy and speed. Because coordinates are relative to the top-left of the image you send, add the crop's offset back before acting on the full screen.
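
If you do send a cropped region, the returned coordinates are relative to the crop rather than the full screen, so they need to be shifted by the crop's top-left corner before clicking. A small sketch of that arithmetic follows; both helper names are hypothetical, and the crop box uses the (left, top, right, bottom) convention common to imaging libraries.

```python
def crop_box_around(center_x, center_y, half_size, screen_width, screen_height):
    """Return a (left, top, right, bottom) box clamped to the screen bounds."""
    left = max(0, center_x - half_size)
    top = max(0, center_y - half_size)
    right = min(screen_width, center_x + half_size)
    bottom = min(screen_height, center_y + half_size)
    return left, top, right, bottom


def to_screen_coords(x, y, crop_left, crop_top):
    """Shift crop-relative coordinates back into full-screen coordinates."""
    return x + crop_left, y + crop_top
```

For example, a point returned at (40, 25) inside a crop whose top-left corner sits at (800, 600) corresponds to (840, 625) on the full screen.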

Combine /v1/ground with /v1/predict or task runs to build agents that understand UI semantics, not just selectors. Get your API key at https://coasty.ai/developers and start grounding your automation to coordinates.
