Grounding UI Elements to Coordinates with /v1/ground
Many automation tools try to click by ID, class, or XPath. Those selectors break when the layout changes or styles update. The Coasty /v1/ground endpoint lets you describe a UI element visually and get back the exact x,y coordinates for a click or fill. This works on any real desktop, browser, or terminal, regardless of markup.
How /v1/ground works
You send a base64 screenshot, an element description, and a cua_version to POST /v1/ground. The service maps the description to screen coordinates and returns them in the response. The endpoint costs $0.03. Use it with /v1/predict for full computer use loops or with /v1/parse to turn pyautogui code into structured actions.
cat > ground_example.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Read API key from environment
COASTY_API_KEY="${COASTY_API_KEY:?COASTY_API_KEY must be set}"
# Base64 encode a screenshot (replace with your own base64 string)
SCREENSHOT_BASE64=$(base64 -i screenshot.png)
# Call /v1/ground with an element description
curl -s -X POST "https://coasty.ai/v1/ground" \
-H "Content-Type: application/json" \
-H "X-API-Key: $COASTY_API_KEY" \
-d '{
"screenshot": "'$SCREENSHOT_BASE64'",
"element_description": "the submit button with text Sign In",
"cua_version": "v3"
}' | jq '.'
EOF
chmod +x ground_example.sh
./ground_example.shKey fields and response
- ●screenshot: base64-encoded image from the desktop
- ●element_description: natural-language description of the UI element
- ●cua_version: required; use v3 for basic grounding, v4 for autonomous mode if paired with verifier
- ●Response includes x and y coordinates (screen space)
- ●Billed $0.03 per request
- ●Works with any UI, no markup or selector needed
POST /v1/ground maps a screenshot + element description to x,y coordinates for $0.03.
Where this beats brittle automation
Traditional tools rely on IDs, classes, or XPath that can shift on a rebrand or layout change. /v1/ground reads the actual visual state of the screen, so it clicks the right button even if the HTML changes. It also works on headless terminals and desktop apps that expose no DOM at all. Combine it with /v1/runs for full task automation or /v1/parse to translate pyautogui commands into Coasty actions.
You can now ground UI descriptions to coordinates and automate clicks anywhere a human can see. Use /v1/ground in your existing workflows, pair it with computer use agents, and build robust automation that survives UI changes. Get your API key at https://coasty.ai/developers and start grounding UI elements today.