Tutorial

Ground UI Elements to Coordinates with /v1/ground

James Liu|July 10, 2026|7 min

Building stable automation around web and desktop UIs is hard. CSS selectors break when apps change classes. XPath queries get fragile with nested dynamic markup. You end up rewriting selectors every release. The /v1/ground endpoint solves this by turning a screenshot plus a natural language description into precise x,y coordinates. The agent sees the screen like a human sees it and returns the exact position of the element it needs to interact with.

How /v1/ground works

You send a screenshot as base64 and describe the UI element you want to locate. The endpoint returns the x and y pixel coordinates of that element's top-left corner in the current viewport. This is useful for tasks like clicking a button, filling a field, or hovering over a menu item. Each request costs $0.03. You can combine it with the /v1/predict and /v1/sessions pipelines to build a full computer use agent that reasons about the state of the screen and then acts on grounded coordinates.

bash

# Double quotes around the body let the shell expand $(base64 ...); -w 0 (GNU coreutils) disables line wrapping.
curl -X POST https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
  \"screenshot\": \"$(base64 -w 0 screenshot.png)\",
  \"description\": \"The blue login button in the top right corner\"
}"

Request and response fields

●screenshot: base64-encoded image of the current viewport
●description: natural language description of the target element
●x: integer, pixel X coordinate of the element's top-left corner
●y: integer, pixel Y coordinate of the element's top-left corner
●price: $0.03 per call to /v1/ground
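
The same request can be sketched in Python using only the standard library. The endpoint URL, the X-API-Key header, and the screenshot and description fields come from the curl example above. The assumption that x and y arrive as top-level keys of a JSON response body, and the helper names build_ground_payload, parse_ground_response, and ground, are illustrative and not confirmed by the API reference.

```python
import base64
import json
import os
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # taken from the curl example


def build_ground_payload(image_bytes: bytes, description: str) -> dict:
    """Build the JSON body: base64 screenshot plus element description."""
    return {
        "screenshot": base64.b64encode(image_bytes).decode("ascii"),
        "description": description,
    }


def parse_ground_response(body: str) -> tuple[int, int]:
    """Extract (x, y). Assumes both are top-level keys in the JSON response."""
    data = json.loads(body)
    return int(data["x"]), int(data["y"])


def ground(image_path: str, description: str, timeout: float = 30.0) -> tuple[int, int]:
    """Send one grounding request and return the element's top-left corner."""
    with open(image_path, "rb") as f:
        payload = build_ground_payload(f.read(), description)
    request = urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": os.environ["COASTY_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ground_response(response.read().decode("utf-8"))
```

A call would then look like ground("screenshot.png", "The blue login button in the top right corner"), with COASTY_API_KEY set in the environment.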

POST /v1/ground with a base64 screenshot and element description returns the pixel-perfect x,y coordinates that pyautogui.click(x, y) can use.
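
A minimal sketch of handing grounded coordinates to pyautogui follows. One practical caveat that the article does not cover: on high-DPI displays a screenshot can have more pixels than the coordinate space pyautogui clicks in, so the sketch rescales first. It assumes the screenshot covers the whole screen. The helper names to_screen_point and click_grounded are hypothetical; pyautogui.size() and pyautogui.click() are real pyautogui calls.

```python
def to_screen_point(x, y, screenshot_size, screen_size):
    """Scale screenshot pixel coordinates to screen coordinates."""
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    return round(x * screen_w / shot_w), round(y * screen_h / shot_h)


def click_grounded(x, y, screenshot_size):
    """Click a grounded point after rescaling it to the screen's coordinate space."""
    import pyautogui  # third-party (pip install pyautogui); imported lazily

    screen_x, screen_y = to_screen_point(x, y, screenshot_size, pyautogui.size())
    pyautogui.click(screen_x, screen_y)
```

When the screenshot and the screen have the same dimensions, the scaling is a no-op and the grounded x and y pass straight through to pyautogui.click.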

Where this beats brittle automation

Traditional automation relies on selectors that assume a fixed DOM structure. When a framework changes class names or reorders elements, your scripts break. With /v1/ground the agent looks at the current visual state of the UI. It does not depend on class names, IDs, or selectors. It only needs to recognize the element by how it looks and where it sits on the screen. This makes your computer use agent resilient to UI changes and lets it work across different browsers and applications.
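
One way to picture the difference is a step that stores only a description and re-grounds it against a fresh screenshot every time it runs, so nothing about the DOM is cached between releases. This is a sketch under stated assumptions: click_by_description and its capture, locate, and click parameters are hypothetical callables you would supply (for example a screenshot function, a wrapper around /v1/ground, and pyautogui.click).

```python
from typing import Callable, Tuple

Point = Tuple[int, int]


def click_by_description(
    description: str,
    capture: Callable[[], bytes],
    locate: Callable[[bytes, str], Point],
    click: Callable[[int, int], None],
) -> Point:
    """Take a fresh screenshot, ground the description, then click the result."""
    screenshot = capture()  # always the current screen, never a cached one
    x, y = locate(screenshot, description)
    click(x, y)
    return x, y
```

Because the three dependencies are passed in, the step can be exercised with fakes before any paid grounding call is made.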

Ground UI elements with /v1/ground to create reliable computer use agents that click, type, and navigate any interface. Build workflows and task runs with the Coasty Computer Use API. Get your API key at https://coasty.ai/developers and start grounding your automation today.
