Tutorial

Ground UI Elements to Coordinates Using /v1/ground

Emily Watson | 5 min

You want your agent to click a button, fill a form, or interact with a specific element, but you only have a screenshot and a short description. You do not want to maintain fragile XPath or CSS selectors. The /v1/ground endpoint maps a screenshot and element description to precise x,y coordinates, letting you act on the screen like a human.

How it works

You send a POST request to https://coasty.ai/v1/ground with a base64-encoded screenshot, a description of the element, and the CUA version you are using. The server returns an action object containing x,y coordinates, plus a status. You then pass those coordinates to your click or other input action. Each call costs $0.03.

bash
curl https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAYAAADED76LAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAAANSURBVBisYXFBEAgDMiC8T9gFehgQM0hH5ANAAAAABJRU5ErkJggg==",
    "description": "a primary button labeled Submit",
    "cua_version": "v3"
  }'
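
The same request can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: the helper names `build_ground_request` and `ground` are made up for illustration, and the only details taken from the tutorial are the URL, the X-API-Key header, and the three body fields.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # endpoint URL from the curl example


def build_ground_request(png_bytes: bytes, description: str, api_key: str,
                         cua_version: str = "v3") -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/ground.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        # The screenshot travels as a base64 string inside the JSON body.
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
        "cua_version": cua_version,
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ground(png_bytes: bytes, description: str, api_key: str,
           timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_ground_request(png_bytes, description, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To try it, read a PNG file into bytes and call `ground(png_bytes, "a primary button labeled Submit", api_key)` with the key from your COASTY_API_KEY environment variable. Keeping request construction separate from sending makes the payload easy to inspect without spending a call.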

Request and response details

  • Endpoint: POST /v1/ground
  • Base URL: https://coasty.ai/v1 (the full request URL is https://coasty.ai/v1/ground)
  • Headers: X-API-Key: <your key> (or Authorization: Bearer <your key>) and Content-Type: application/json
  • Body fields: screenshot (base64 string), description (string), cua_version (string, default v3)
  • Price: $0.03 per call
  • Response: JSON object with an action field (x, y coordinates) and a status field
  • Errors: 401 for an invalid key, 429 when you are rate limited
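
The response and error details above could be handled along these lines. Treat it as a sketch under stated assumptions: the tutorial only says the response has an action field with x, y coordinates and a status field, so the exact nesting (`response["action"]["x"]`) is a guess, and `extract_point` and `call_with_retry` are hypothetical helper names. The retry policy (exponential backoff on 429, fail fast on 401) is one reasonable choice, not something the API prescribes.

```python
import time
import urllib.error
from typing import Callable, Tuple


def extract_point(response: dict) -> Tuple[float, float]:
    """Pull (x, y) out of a /v1/ground response.

    ASSUMPTION: coordinates are nested as response["action"]["x"] and ["y"].
    """
    action = response.get("action")
    if not isinstance(action, dict) or "x" not in action or "y" not in action:
        status = response.get("status")
        raise ValueError(f"no coordinates in response (status={status!r})")
    return action["x"], action["y"]


def call_with_retry(send: Callable[[], dict], max_attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call send(), retrying only on HTTP 429 with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code == 401:
                # An invalid key will not fix itself, so do not retry.
                raise PermissionError("API key rejected (HTTP 401)") from err
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Rate limited: wait base_delay, then 2x, 4x, ... before retrying.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Here `send` is any zero-argument callable that performs the HTTP call and raises `urllib.error.HTTPError` on failure, as `urllib.request.urlopen` does. Since each call is billed, capping `max_attempts` keeps a retry loop from running up cost.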

Take a screenshot, call /v1/ground with a description, and use the returned x,y coordinates to click the element.
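
That loop can be sketched as a single function with every side effect passed in. The name `click_element` and its three callables are hypothetical. The sketch assumes the coordinates arrive as `response["action"]["x"]` and `["y"]` and are expressed in the pixel space of the screenshot you sent.

```python
from typing import Callable, Tuple


def click_element(description: str,
                  take_screenshot: Callable[[], bytes],
                  ground: Callable[[bytes, str], dict],
                  click: Callable[[int, int], None]) -> Tuple[int, int]:
    """Screenshot -> /v1/ground -> click, with all I/O injected by the caller."""
    png_bytes = take_screenshot()
    response = ground(png_bytes, description)
    # ASSUMPTION: coordinates are nested under "action" as "x" and "y".
    action = response["action"]
    # Round in case the service returns floats; click APIs usually want integers.
    x, y = round(action["x"]), round(action["y"])
    click(x, y)
    return x, y
```

In practice `take_screenshot` and `click` would wrap whatever automation library you already use, and `ground` would wrap the HTTP call to /v1/ground. Injecting them keeps the loop testable with fakes, so you can exercise it without paying for calls.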

Where this beats brittle automation

Traditional automation relies on XPath or CSS selectors, which break when the UI changes. The /v1/ground endpoint uses vision to locate elements by how they look and how you describe them, not by where they sit in the page structure. Your agent can therefore work across browsers, desktop apps, and terminals, including places where selectors do not exist, and it can handle dynamic content and UI updates without anyone rewriting selectors.

Use /v1/ground to ground your computer use agent in the actual visual interface. Build agents that click, type, and navigate with confidence. Get your API key at https://coasty.ai/developers.
