Ground UI Elements to Coordinates with /v1/ground
Most browser and desktop automation relies on brittle selectors such as CSS classes, IDs, or XPath expressions. These break when the UI changes or when a test runs in a different environment. The /v1/ground endpoint addresses this by mapping a natural language description of a UI element to x,y pixel coordinates: you describe the element you want to click or interact with, and the endpoint returns its position. This makes computer use agents more robust and easier to maintain.
How /v1/ground works
The /v1/ground endpoint accepts a base64-encoded screenshot and a natural language description of an element, and returns the element's x and y coordinates. Each request costs $0.03. The endpoint does not require a session ID, which makes it well suited to ad-hoc element localization before you start a full task run.

The request body contains three fields. The screenshot field holds the base64-encoded image. The description field is a plain-language string that describes the target UI element. The cua_version field specifies the computer use agent version; the default is v3. The response contains the x and y coordinates as integer pixel values.
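```python
import os
import base64

import requests

API_KEY = os.getenv("COASTY_API_KEY")
BASE_URL = "https://coasty.ai/v1"

# Fail early with a clear message instead of sending an empty API key
if not API_KEY:
    raise RuntimeError("Set the COASTY_API_KEY environment variable")

# Base64-encode a screenshot (replace with your actual image path)
with open("dashboard.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")


def ground_element(description: str, screenshot: str = screenshot_b64, cua_version: str = "v3"):
    """Return the (x, y) pixel coordinates of the described UI element."""
    url = f"{BASE_URL}/ground"
    headers = {
        "X-API-Key": API_KEY,
        "Content-Type": "application/json",
    }
    payload = {
        "screenshot": screenshot,
        "description": description,
        "cua_version": cua_version,
    }
    # The 30-second timeout is a client-side choice so a stalled request cannot hang forever
    resp = requests.post(url, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()  # raise on 4xx/5xx responses instead of parsing an error body
    data = resp.json()
    # Index directly so a missing field raises KeyError rather than silently returning None
    return data["x"], data["y"]


# Example: locate the 'Sign In' button on a dashboard
x, y = ground_element("The blue 'Sign In' button on the top right of the dashboard")
print(f"Grounded coordinates: x={x}, y={y}")
```

Key fields and pricing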
- Endpoint: POST /v1/ground
- Cost: $0.03 per request
- Request fields: screenshot (base64), description (string), cua_version (string). Default is v3.
- Response fields: x (integer), y (integer) in pixels.
- No session ID is required to call this endpoint.
Call POST /v1/ground with a screenshot and a description to get x,y coordinates for $0.03 per request.
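The field list can be turned into a small client-side check. The sketch below uses hypothetical helpers (build_ground_payload, parse_ground_response, estimate_cost_usd) that are not part of any Coasty SDK: it assembles the three documented request fields from raw image bytes, validates that a response carries integer x and y values, and estimates spend at the documented $0.03 per request. It assumes the response is a flat JSON object with top-level x and y keys, as the list describes.

```python
import base64
from dataclasses import dataclass
from typing import Any, Mapping

# Documented flat price for POST /v1/ground
COST_PER_REQUEST_USD = 0.03


@dataclass(frozen=True)
class GroundResult:
    x: int
    y: int


def build_ground_payload(image_bytes: bytes, description: str, cua_version: str = "v3") -> dict:
    """Assemble the three documented request fields (hypothetical helper)."""
    if not description.strip():
        raise ValueError("description must be a non-empty string")
    return {
        "screenshot": base64.b64encode(image_bytes).decode("ascii"),
        "description": description,
        "cua_version": cua_version,
    }


def parse_ground_response(data: Mapping[str, Any]) -> GroundResult:
    """Check that x and y are present and are integers (assumes a flat JSON object)."""
    x, y = data.get("x"), data.get("y")
    for name, value in (("x", x), ("y", y)):
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"expected integer {name!r}, got {value!r}")
    return GroundResult(x=x, y=y)


def estimate_cost_usd(num_requests: int) -> float:
    """Spend at a flat per-request price, rounded to cents to hide float noise."""
    return round(num_requests * COST_PER_REQUEST_USD, 2)
```

For example, grounding 500 elements costs 500 × $0.03 = $15.00, and estimate_cost_usd(500) returns 15.0. Rounding to cents matters because 0.03 has no exact binary representation, so the raw product can carry a tiny floating-point error.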
Where this beats brittle automation
Static selectors break when a class name changes or the layout shifts. /v1/ground instead locates elements visually, based on their appearance and context, so you can automate browsers and desktop applications without maintaining a library of selectors. When you switch environments, you can re-ground elements on the fly, which reduces test flakiness. Because the endpoint works directly from a screenshot, it fits into any workflow that already has a screenshot capture step.
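Re-grounding on the fly amounts to capturing a fresh screenshot and calling the endpoint again, rather than reusing coordinates from an earlier layout. The sketch below assumes two hypothetical callables you supply: capture() returns image bytes from whatever screenshot tool you already use, and ground(image_bytes, description) wraps POST /v1/ground (for example, a variant of ground_element from the example above that takes the image as an argument) and returns an (x, y) tuple. The optional verify hook and the retry count are illustrative choices, not features of the endpoint.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[int, int]


def reground(
    description: str,
    capture: Callable[[], bytes],
    ground: Callable[[bytes, str], Coords],
    verify: Optional[Callable[[Coords], bool]] = None,
    max_attempts: int = 3,
) -> Coords:
    """Capture a fresh screenshot and ground it, retrying until verify accepts.

    capture, ground, and verify are hypothetical hooks supplied by the caller.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        image = capture()  # always take a new screenshot; never reuse stale coordinates
        try:
            coords = ground(image, description)
        except Exception as exc:  # e.g. a network or HTTP error raised by your wrapper
            last_error = exc
            continue
        if verify is None or verify(coords):
            return coords
    raise RuntimeError(
        f"could not ground {description!r} after {max_attempts} attempts"
    ) from last_error
```

Injecting capture and ground keeps the retry logic independent of any particular screenshot library or HTTP client, and makes it easy to test with fakes.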
Start grounding UI elements with /v1/ground to build more reliable computer use agents. Get an API key at https://coasty.ai/developers and try the example above with your own screenshot.