Tutorial

Ground UI Elements to Coordinates with /v1/ground

Emily Watson||6 min
Alt+Tab

Most browser and desktop automation relies on brittle selectors like CSS classes, IDs, or XPath. These break when a UI updates or a test runs on a different environment. The /v1/ground endpoint solves this by mapping a natural language description of a UI element to exact screen coordinates. You describe what you want to click or interact with and get a reliable x,y pair. This makes computer use agents more robust and easier to maintain.

How /v1/ground works

The /v1/ground endpoint accepts a base64 screenshot and a natural language description of an element and returns x and y coordinates. The endpoint costs $0.03 per request. It is free to call and does not require a session ID, making it ideal for ad-hoc element localization before running a full task run. The request body contains three fields. The screenshot field holds the base64-encoded image. The description field is a string that describes the target UI element in plain language. The cua_version field specifies the computer use agent version, with v3 as the default. The response includes the x and y coordinates and their pixel precision.

python
import os
import base64
import requests

API_KEY = os.getenv("COASTY_API_KEY")
BASE_URL = "https://coasty.ai/v1"

# Base64 encode a screenshot (replace with your actual image path)
with open("dashboard.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")

def ground_element(description: str):
    url = f"{BASE_URL}/ground"
    headers = {
        "X-API-Key": API_KEY,
        "Content-Type": "application/json"
    }
    payload = {
        "screenshot": screenshot_b64,
        "description": description,
        "cua_version": "v3"
    }
    resp = requests.post(url, json=payload, headers=headers)
    resp.raise_for_status()
    data = resp.json()
    return data.get("x"), data.get("y")

# Example: locate the 'Sign In' button on a dashboard
x, y = ground_element("The blue 'Sign In' button on the top right of the dashboard")
print(f"Grounded coordinates: x={x}, y={y}")

Key fields and pricing

  • Endpoint: POST /v1/ground
  • Cost: $0.03 per request
  • Request fields: screenshot (base64), description (string), cua_version (string). Default is v3.
  • Response fields: x (integer), y (integer) in pixels.
  • No session ID is required to call this endpoint.

Call POST /v1/ground with a screenshot and description to get x,y coordinates in $0.03 per request.

Where this beats brittle automation

Static selectors break when a class name changes or when UI layout shifts. /v1/ground uses visual understanding to locate elements based on their appearance and context. This lets you automate on any browser or desktop without maintaining a library of selectors. You can also switch environments and re-ground elements on the fly, reducing test flakiness. The endpoint works directly from a screenshot, so you can integrate it with any screenshot capture step in your workflow.

Start grounding UI elements with /v1/ground to build more reliable computer use agents. Get an API key at https://coasty.ai/developers and try the example above with your own screenshot.

Want to see this in action?

View Case Studies
Try Coasty Free