Tutorial

Grounding UI Elements to Coordinates with /v1/ground

Daniel Kim | 5 min

Many automation scripts break when the UI changes or the layout shifts. You cannot rely on CSS selectors or XPath when the app is a native desktop window or a headless browser. The /v1/ground endpoint lets you send a screenshot and a description of any UI element. It returns X and Y coordinates that you can pass to pyautogui or another control tool. This is the core of reliable computer use automation.

How /v1/ground works

The endpoint maps a base64 screenshot and a natural language description to screen coordinates. It costs $0.03 per request. You send a POST to https://coasty.ai/v1/ground with a JSON body that includes your image data and a text description. The response contains an array of matches, each with x, y, width, height, and confidence. Use these values to click, type, or hover on the correct element.

bash
# Double quotes around the API key header so the shell expands $COASTY_API_KEY.
curl https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "image": "base64_encoded_screenshot",
    "description": "the blue submit button near the top right corner",
    "cua_version": "v3"
  }'
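
The same request can be assembled in Python. The sketch below is a minimal illustration using only the standard library: it mirrors the URL, headers, and body fields of the curl call above, while the function name build_ground_request is hypothetical and not part of any Coasty SDK. It builds the request without sending it, so the encoding step is easy to inspect.

```python
import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # endpoint named in the text above


def build_ground_request(image_bytes: bytes, description: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/ground.

    Hypothetical helper for illustration; only the URL, header names,
    and body fields come from the tutorial.
    """
    payload = {
        # The screenshot travels as base64 text, not raw bytes.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "description": description,
        "cua_version": "v3",
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the JSON body of the reply. Reading the key from an environment variable such as COASTY_API_KEY keeps it out of the script.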

Request fields

  • image: base64-encoded bytes of the screenshot
  • description: free-form text describing the target UI element
  • cua_version: string, use v3 for this endpoint
  • no other fields are required

Response fields

  • matches: array of objects, each with the fields below
  • x: integer, horizontal pixel coordinate
  • y: integer, vertical pixel coordinate
  • width: integer, element width in pixels
  • height: integer, element height in pixels
  • confidence: float between 0 and 1 indicating match quality
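
Because matches is an array, a script has to decide which entry to act on. One reasonable approach, sketched below, is to take the highest-confidence match and ignore anything under a threshold. The helper name best_match, the 0.5 default threshold, and the sample values are all assumptions for illustration; the sample is hand-written to follow the field list above and is not real API output.

```python
from typing import Optional


def best_match(response: dict, min_confidence: float = 0.5) -> Optional[dict]:
    """Return the highest-confidence match, or None if none clears the threshold.

    Hypothetical helper; the 0.5 default is an arbitrary starting point.
    """
    matches = response.get("matches", [])
    candidates = [m for m in matches if m["confidence"] >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["confidence"])


# Hand-written sample shaped like the response fields listed above.
sample = {
    "matches": [
        {"x": 1180, "y": 64, "width": 96, "height": 32, "confidence": 0.91},
        {"x": 240, "y": 410, "width": 96, "height": 32, "confidence": 0.37},
    ]
}

match = best_match(sample)
if match is not None:
    print(match["x"], match["y"])  # prints: 1180 64
```

The selected x and y could then go to a control tool, for example pyautogui.click(match["x"], match["y"]). The field list does not say whether x and y mark the element's center or its top-left corner, so confirm that against a known element before relying on the click position.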

POST to /v1/ground with a base64 screenshot and description to get X,Y for any UI element.

Why grounding beats brittle selectors

Traditional automation relies on CSS classes, IDs, or XPath that may change with a single release. Layout shifts can break selectors entirely. The /v1/ground endpoint uses visual understanding, so it works on native windows, PDF viewers, and any app without a stable API. You can describe a button as 'the green save button on the left side of the dialog' and get the exact pixel coordinates. This makes your computer use agent robust across themes, localizations, and UI redesigns.

Grounding UI elements to coordinates is the foundation of reliable computer use automation. Start with a screenshot and description, then click with the coordinates you receive. Build workflows that handle layout changes and non-API tools. Get your API key at https://coasty.ai/developers and begin grounding your automation.
