Ground UI Elements to Coordinates with /v1/ground

David Park | 4 min

Hand-scripted automation breaks the moment a button moves or a layout shifts. You can chase selectors forever, or you can let a computer use agent look at the screen and act on what it sees. The /v1/ground endpoint turns a plain-language description of an element into click-ready x, y coordinates for $0.03 per call. You can feed those coordinates into any automation stack, or combine the endpoint with the /v1/predict loop so the agent grounds its own actions against the current UI.

How /v1/ground works

Send a POST request to https://coasty.ai/v1/ground. Set the X-API-Key header to your API key, read here from the COASTY_API_KEY environment variable. The JSON body takes a base64-encoded screenshot, a description of the element, and an optional bounding box hint. The endpoint returns the coordinates as an object with x and y fields, which you can use directly for mouse clicks or pass to a downstream automation tool. The price is $0.03 per successful request.

bash
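#!/usr/bin/env bash
set -euo pipefail

# Read the key from the environment; abort with a message if it is unset
API_KEY="${COASTY_API_KEY:?Set COASTY_API_KEY first}"
URL="https://coasty.ai/v1/ground"

# Base64-encode the screenshot onto a single line (GNU base64 wraps output),
# let jq build the JSON body from stdin so the image is never passed as a
# command-line argument, then POST the body and pretty-print the response
base64 < screenshot.png | tr -d '\n' |
  jq -Rs '{
    screenshot: .,
    description: "the blue submit button at the bottom center of the screen"
  }' |
  curl -s "$URL" \
    -H "X-API-Key: $API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- | jq .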

Request and response fields

Request body:

  • screenshot: base64-encoded image of the current screen
  • description: natural-language description of the element you want to click
  • bounding_box (optional): {x, y, width, height} region that narrows the search

Response:

  • x: integer pixel x-coordinate of the element's center
  • y: integer pixel y-coordinate of the element's center

Price: $0.03 per call.
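
The fields above can be combined into a request body roughly as follows. This is a minimal sketch, not the documented schema: it assumes bounding_box is sent as a nested JSON object with numeric values, which the field list suggests but does not spell out. build_ground_payload is a hypothetical helper name, and jq does the JSON encoding as in the earlier example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a /v1/ground request body to stdout.
# Reads the base64-encoded screenshot from stdin; the bounding box is optional.
# Usage: build_ground_payload DESCRIPTION [X Y WIDTH HEIGHT]
build_ground_payload() {
  local description="$1"
  if [ "$#" -ge 5 ]; then
    # Assumption: bounding_box is a nested object of numbers
    jq -Rs --arg d "$description" \
      --argjson x "$2" --argjson y "$3" --argjson w "$4" --argjson h "$5" \
      '{screenshot: ., description: $d,
        bounding_box: {x: $x, y: $y, width: $w, height: $h}}'
  else
    jq -Rs --arg d "$description" '{screenshot: ., description: $d}'
  fi
}

# Example (not run here): search only the top 800x200 region of the screen
#   base64 < screenshot.png | tr -d '\n' |
#     build_ground_payload "the search field" 0 0 800 200 |
#     curl -s https://coasty.ai/v1/ground \
#       -H "X-API-Key: $COASTY_API_KEY" \
#       -H "Content-Type: application/json" \
#       --data-binary @-
```

Passing the description through --arg lets jq escape any quotes in it, so the body stays valid JSON whatever the description contains.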

POST /v1/ground with a screenshot and description, then use the returned x and y to click with pyautogui or another automation library.
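
The click step might look like the sketch below. pyautogui is a Python library, so to stay in the same shell as the example above this sketch swaps in xdotool, which works on X11 desktops only. click_from_response and the DRY_RUN variable are hypothetical names. The sketch assumes the response is a JSON object with top-level numeric x and y fields, and that the screenshot was captured at the same resolution as the pointer coordinate space.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a /v1/ground response on stdin and click at (x, y).
# Set DRY_RUN=1 to print the xdotool command instead of running it.
click_from_response() {
  local coords x y
  # jq -e exits non-zero when select() produces no output, i.e. when
  # x or y is missing or is not a number
  coords=$(jq -er 'select((.x | type) == "number" and (.y | type) == "number")
                   | "\(.x) \(.y)"') || {
    echo "no usable coordinates in response" >&2
    return 1
  }
  x=${coords% *}   # text before the space
  y=${coords#* }   # text after the space
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "xdotool mousemove $x $y click 1"
  else
    # Move the pointer, then press and release the left button
    xdotool mousemove "$x" "$y" click 1
  fi
}

# Example (not run here): pipe the curl output from the request straight in
#   curl -s ... --data-binary @- | click_from_response
```

Checking the field types before clicking keeps an error response from turning into a click at a meaningless position.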

Where /v1/ground beats brittle selectors

Traditional automation relies on CSS selectors, XPath, or element IDs. When a front-end engineer reorders the DOM, renames a class, or switches to dynamically generated IDs, your scripts break. A computer use agent works from the visual layout instead. With /v1/ground you describe the element in plain language, and the endpoint returns coordinates that do not depend on the underlying markup. You can ground a button, an input field, or any other UI component without rewriting selectors. This works across desktop apps, web dashboards, and custom interfaces you control.

Combine /v1/ground with /v1/predict to let the agent ground its own actions on the fly. Build workflows with /v1/workflows that ground, click, and verify results. Get a key at https://coasty.ai/developers and start grounding UI elements with the computer use API.
