Tutorial

Grounding UI Elements to Coordinates with /v1/ground

Emily Watson | 6 min

Clicking a button by selector works until the UI changes. Building a computer use agent means seeing the screen and acting on what you see. The /v1/ground endpoint turns a screenshot and an element description into x, y pixel coordinates. You send a base64 screenshot, describe the element you want, and get back a point you can pass to pyautogui or your own click logic. This is the foundation for reliable computer use automation.

How it works

Send a POST request to https://coasty.ai/v1/ground with a base64 screenshot, the element description, and the CUA version (cua_version). The endpoint returns a JSON object with the x and y coordinates, along with any status or metadata. You then use those coordinates to perform the action. Each grounding call costs $0.03, so you can ground several elements per task without straining the budget.

```bash
curl -X POST https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "BASE64_IMAGE_HERE",
    "description": "The blue submit button in the center of the screen",
    "cua_version": "v3"
  }'
```
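The same request can be sketched in Python with only the standard library. The helper names `build_ground_payload` and `ground` are illustrative, not part of any Coasty SDK, and the sketch assumes the endpoint accepts the JSON body shown in the curl example and answers with JSON. Error handling and retries are left out.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"


def build_ground_payload(png_bytes: bytes, description: str, cua_version: str = "v3") -> dict:
    """Build the JSON body for /v1/ground from raw PNG bytes."""
    return {
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
        "cua_version": cua_version,
    }


def ground(png_bytes: bytes, description: str, api_key: str) -> dict:
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    body = json.dumps(build_ground_payload(png_bytes, description)).encode("utf-8")
    request = urllib.request.Request(
        GROUND_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes it possible to check the request body without spending a grounding call.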

Request and response fields

  • screenshot (request): base64-encoded PNG image of the current view
  • description (request): natural language description of the target element
  • cua_version (request): string, defaults to v3, must match your agent version
  • x (response): integer, pixel coordinate measured from the left edge
  • y (response): integer, pixel coordinate measured from the top edge
  • price: $0.03 per call

Use the x and y from /v1/ground as the click coordinates for pyautogui.click(x, y).
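A minimal sketch of that hand-off follows. It assumes the response is a dict with top-level `x` and `y` keys, as the field list describes, and that those pixels share a coordinate space with the screen pyautogui controls. If the screenshot resolution differs from the screen's logical resolution, the point would need scaling first. The function names are illustrative.

```python
def extract_point(response: dict) -> tuple[int, int]:
    """Pull the click point out of a /v1/ground response."""
    try:
        return int(response["x"]), int(response["y"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"no usable coordinates in response: {response!r}") from exc


def click_point(response: dict) -> None:
    """Click at the grounded point using pyautogui."""
    # Imported lazily so extract_point can be used without a display attached.
    import pyautogui

    x, y = extract_point(response)
    pyautogui.click(x, y)
```

Validating the response before clicking means a missing or malformed point raises an error instead of sending the cursor somewhere unintended.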

Where this beats brittle automation

Static selectors break when classes change, IDs are auto-generated, or the layout shifts. A computer use agent that sees and grounds elements on every frame adapts automatically. You describe what you want in plain English, not CSS or XPath. The model locates the element, returns coordinates, and you act. This approach works across browsers, desktop apps, and terminals without maintaining brittle selector maps.
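The see, ground, act cycle described above can be sketched as a small loop. The three callables are placeholders you would supply yourself: `capture` returns the current screenshot bytes, `ground` wraps a /v1/ground call and returns a point, and `click` performs the action. None of these names come from the Coasty API.

```python
from typing import Callable, Iterable


def run_steps(
    descriptions: Iterable[str],
    capture: Callable[[], bytes],
    ground: Callable[[bytes, str], tuple[int, int]],
    click: Callable[[int, int], None],
) -> list[tuple[int, int]]:
    """Ground each described element on a fresh screenshot, then click it."""
    clicked = []
    for description in descriptions:
        # Re-capture before every step so layout changes are picked up.
        screenshot = capture()
        x, y = ground(screenshot, description)
        click(x, y)
        clicked.append((x, y))
    return clicked
```

Because the screenshot is taken again before each step, a button that moved after the previous click is located where it is now, not where it was.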

Combine grounding with /v1/runs

  • POST /v1/runs to let the server drive the agent and handle state
  • Use /v1/ground inside your own loop to ground critical steps
  • Each grounding call costs $0.03; each task step costs $0.05
  • Total cost stays predictable and low for complex workflows
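As a rough illustration of the arithmetic, the sketch below adds up the two per-unit prices quoted above. It assumes the charges are simply additive with no other fees, which the text does not confirm, and it works in whole cents to avoid floating-point rounding.

```python
GROUND_CENTS = 3  # $0.03 per grounding call, from the pricing above
STEP_CENTS = 5    # $0.05 per task step, from the pricing above


def estimate_cost_cents(grounding_calls: int, task_steps: int) -> int:
    """Estimate the total cost in cents, assuming the two charges just add up."""
    return grounding_calls * GROUND_CENTS + task_steps * STEP_CENTS


# A workflow with 10 task steps and 4 extra grounding calls:
# 4 * 3 + 10 * 5 = 62 cents, i.e. $0.62
print(f"${estimate_cost_cents(4, 10) / 100:.2f}")
```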

Grounding UI elements to coordinates with /v1/ground gives you precise, human-like click targets without brittle selectors. Build a computer use agent that sees the screen, grounds what it needs, and acts reliably. Get a key at https://coasty.ai/developers and start grounding today.
