Tutorial

Ground UI Elements to Coordinates with the Coasty /v1/ground Endpoint

Alex Thompson | 5 min

Most computer use agents rely on static selectors or brittle API wrappers. They break when a UI changes or when you need to act on something that only exists visually. The /v1/ground endpoint addresses this. You send a screenshot and a description of the element you want to act on, and the API returns that element's pixel coordinates, which you can pass to your automation layer. This lets your computer use agent click buttons, fill inputs, or verify UI state based on what is actually on screen.

How /v1/ground works

The endpoint is POST /v1/ground. It costs $0.03 per request. You send a JSON payload with a base64-encoded screenshot and a text description of the element. The server analyzes the screenshot and returns a bounding box in pixel coordinates. Because grounding happens before any action enters your agent loop, you can verify or adjust the coordinates manually before committing to a click or keypress.

bash
#!/bin/bash

# Read the key from the environment, never hardcode it; abort if it is unset.
: "${COASTY_API_KEY:?Set COASTY_API_KEY in your environment}"

# Base64-encode the screenshot (replace with your actual image).
# tr strips the line breaks GNU base64 adds, which would break the JSON string.
SCREENSHOT_BASE64=$(base64 < screenshot.png | tr -d '\n')

# Call /v1/ground (-d makes curl send a POST); jq pretty-prints the response.
curl -s https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "'"$SCREENSHOT_BASE64"'",
    "description": "Sign in button with text Sign in located in the top right corner"
  }' | jq .
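One way to verify coordinates before committing to a click is to check that the returned box lies inside the screenshot. The following is a minimal sketch, not part of the Coasty API: it assumes x and y are the box's top-left corner in integer screenshot pixels, that you know the screenshot's dimensions, and it uses a hypothetical helper named box_in_bounds with made-up values.

```shell
#!/bin/bash
# box_in_bounds X Y WIDTH HEIGHT IMG_W IMG_H   (hypothetical helper)
# Succeeds (exit 0) when the box is non-empty and fully inside the image.
# Assumption: x,y are the top-left corner, as integers, in screenshot pixels.
box_in_bounds() {
  local x=$1 y=$2 w=$3 h=$4 img_w=$5 img_h=$6
  [ "$w" -gt 0 ] && [ "$h" -gt 0 ] &&
  [ "$x" -ge 0 ] && [ "$y" -ge 0 ] &&
  [ $(( x + w )) -le "$img_w" ] && [ $(( y + h )) -le "$img_h" ]
}

# Made-up values for a 1280x800 screenshot
if box_in_bounds 1180 24 96 40 1280 800; then
  echo "box ok"
else
  echo "box outside screenshot, do not click" >&2
fi
```

A failed check is a signal to re-capture the screenshot or rephrase the description rather than to click anyway.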

Grounding before action

  • POST /v1/ground costs $0.03 per request
  • Request body includes screenshot (base64) and description (string)
  • Response contains x, y, width, and height of the element's bounding box
  • You can feed these coordinates into your automation layer or pass them to /v1/sessions/{id}/predict for a stateful trajectory
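Most automation layers want a single click point rather than a box. The following is a minimal sketch of reducing the bounding box to its center. It assumes x and y are the box's top-left corner and that all four values are integers; the list above names the fields but not their exact semantics. bbox_center is a hypothetical helper and the numbers are made up.

```shell
#!/bin/bash
# bbox_center X Y WIDTH HEIGHT -> prints "CX CY"   (hypothetical helper)
# Assumption: x,y are the top-left corner of the box, in integer pixels.
bbox_center() {
  local x=$1 y=$2 w=$3 h=$4
  echo "$(( x + w / 2 )) $(( y + h / 2 ))"
}

# Made-up values standing in for a real /v1/ground response
center=$(bbox_center 1180 24 96 40)
echo "click at ${center% *},${center#* }"   # prints: click at 1228,44
```

If the response really does expose top-level x, y, width, and height keys, something like jq -r '"\(.x) \(.y) \(.width) \(.height)"' would produce the four arguments, but check the actual response shape first.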

Ground once with /v1/ground, then reuse the coordinates across multiple steps and sessions for as long as the layout stays the same.
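Reusing coordinates also keeps the bill down. The following is a rough sketch of the arithmetic at the stated $0.03 per request, working in integer cents to avoid floating point in the shell. ground_cost is a hypothetical helper, and the 200-step run is a made-up example.

```shell
#!/bin/bash
# ground_cost REQUESTS -> prints the total in dollars   (hypothetical helper)
# Uses the $0.03 per-request price, expressed as 3 cents.
ground_cost() {
  local cents=$(( $1 * 3 ))
  printf '$%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

ground_cost 1     # grounding once and reusing the result: $0.03
ground_cost 200   # grounding on every step of a 200-step run: $6.00
```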

Where this beats brittle automation

Traditional automation assumes stable IDs or class names, but these often change from one release to the next. A change in layout or a new UI variant breaks your selectors. The computer use API with grounding works on what you actually see. You describe the element in human terms, for example "the Submit button in the center of the screen", and the model returns its pixel coordinates. This makes your agent more resilient to UI churn and lets it operate on systems that expose no programmatic APIs.

Start grounding your UI interactions and build agents that see and click like a human. Get an API key at https://coasty.ai/developers and try the endpoint today.
