Ground UI Elements to Coordinates with /v1/ground
Most UI automation tools rely on brittle selectors like XPath or CSS classes, which break when a design changes. The Coasty Computer Use API approaches this differently. The /v1/ground endpoint takes a screenshot and a plain-language description of an element and returns where that element appears on the screenshot, as a bounding box in pixel coordinates. You then pass those coordinates to pyautogui or your own cursor logic. The result is screen-aware automation that adapts to layout shifts. Each request costs $0.03.
How /v1/ground works
You send a base64-encoded screenshot and a text description of the element you want to target. The server locates the element visually and returns its bounding box on the image: the X and Y coordinates of the box's top-left corner, plus its width and height. From that box you can compute a point, typically the center, to click or move the mouse to. /v1/ground is a separate call from the main predict loop; use it when you need precise targeting based on visual context.
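Because the response describes a box rather than a single point, you have to choose where to click. A minimal sketch, assuming you want the center of the box and that your mouse library may use a different coordinate space than the screenshot (on HiDPI displays, for example, a screenshot can have twice as many pixels per axis as the logical screen). The `click_point` name and the scale parameters are illustrative, not part of the API.

```python
from typing import Mapping, Tuple


def click_point(
    box: Mapping[str, float],
    scale_x: float = 1.0,
    scale_y: float = 1.0,
) -> Tuple[int, int]:
    """Return the center of a /v1/ground bounding box as integer coordinates.

    box: a dict with "x" and "y" (top-left corner), "width", and "height",
    measured on the screenshot, as in the example response.
    scale_x, scale_y: factors that convert screenshot pixels into the
    coordinate space your mouse library expects (1.0 when they match).
    """
    center_x = box["x"] + box["width"] / 2
    center_y = box["y"] + box["height"] / 2
    return round(center_x * scale_x), round(center_y * scale_y)


# The example response from this page: top-left (1240, 45), size 120 x 36.
response = {"x": 1240, "y": 45, "width": 120, "height": 36, "request_id": "abc123"}
print(click_point(response))            # (1300, 63)
print(click_point(response, 0.5, 0.5))  # (650, 32), e.g. a 2x HiDPI screenshot
```

With pyautogui installed, the result can go straight to `pyautogui.click(*click_point(response))`. Compare the screenshot's pixel size with `pyautogui.size()` to decide whether you need a scale factor.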
curl -X POST https://coasty.ai/v1/ground \
  -H "X-API-Key: $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "<base64-encoded-screenshot>",
    "element_description": "the blue submit button in the top right corner"
  }'
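For comparison, the same request can be built with Python's standard library. A minimal sketch: it assumes you already have the screenshot as encoded image bytes (which image formats the endpoint accepts is not stated here), reads the key from the `COASTY_API_KEY` environment variable as the curl example does, and uses an arbitrary 30-second timeout. The function names are illustrative.

```python
import base64
import json
import os
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"


def build_ground_request(
    image_bytes: bytes, description: str, api_key: str
) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        # The endpoint takes the screenshot as a base64 string.
        "screenshot": base64.b64encode(image_bytes).decode("ascii"),
        "element_description": description,
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ground(image_bytes: bytes, description: str) -> dict:
    """Send the request and return the parsed JSON (x, y, width, height, ...)."""
    request = build_ground_request(
        image_bytes, description, os.environ["COASTY_API_KEY"]
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call might look like `ground(Path("screenshot.png").read_bytes(), "the blue submit button in the top right corner")`. Keeping `build_ground_request` separate lets you inspect the payload without spending a request, and `urlopen` raises `urllib.error.HTTPError` for non-2xx responses.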
Replace <base64-encoded-screenshot> with the base64 string of your screen capture. The response will look like this:
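{
  "x": 1240,
  "y": 45,
  "width": 120,
  "height": 36,
  "request_id": "abc123"
}
Here the box's top-left corner is at (1240, 45), so its center, a natural click target, is at (1240 + 120/2, 45 + 36/2) = (1300, 63).
When to call /v1/ground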
- You have a screenshot and need precise coordinates for a UI element.
- Your automation needs to adapt to layout changes without updating selectors.
- You are building a cursor-based automation layer on top of pyautogui.
- You want to verify that an element is visible and locate it before acting.
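For the last case, a simple guard is to check that the returned box has a positive size and lies inside the screenshot before clicking. A minimal sketch: the page does not say how /v1/ground reports an element it cannot find, so this helper (a hypothetical name) treats a missing coordinate field, a zero-size box, or a box extending past the image edges as "not located". That policy is an assumption you should adapt to the API's actual behavior.

```python
from typing import Mapping, Optional, Tuple


def locate_if_visible(
    box: Mapping[str, float], image_width: int, image_height: int
) -> Optional[Tuple[int, int]]:
    """Return the box center if it looks like an on-screen element, else None."""
    try:
        x, y, w, h = box["x"], box["y"], box["width"], box["height"]
    except KeyError:
        return None  # No coordinates in the response: treat as not located.
    if w <= 0 or h <= 0:
        return None  # Degenerate box.
    if x < 0 or y < 0 or x + w > image_width or y + h > image_height:
        return None  # Box extends past the screenshot edges.
    return round(x + w / 2), round(y + h / 2)


example = {"x": 1240, "y": 45, "width": 120, "height": 36}
print(locate_if_visible(example, 1920, 1080))  # (1300, 63)
print(locate_if_visible(example, 1280, 720))   # None, since 1240 + 120 > 1280
```

When the helper returns a point, pass it to `pyautogui.click(x, y)`; when it returns None, take a fresh screenshot or stop rather than clicking blindly.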
Each /v1/ground request costs $0.03 and returns the bounding box of the described element on the provided screenshot.
Where this beats brittle automation
Traditional automation tools bind actions to selectors like XPath expressions or class names. If a developer renames a class or restructures the markup, the selector stops matching and your script fails; scripts with hard-coded screen coordinates fare even worse, breaking when a button moves by ten pixels. /v1/ground works on what the user sees instead. You describe the target in plain language and the API finds it visually. This makes your automation resilient to layout shifts and design updates that leave the element recognizable. You still use coordinates to click or interact, but they come from the actual screen appearance rather than from the page's internals.
Combine /v1/ground with the main predict loop to build computer use agents that see and act like a human. Use the coordinates to trigger pyautogui commands or pass them into your own UI interaction layer. To get started, generate a key at https://coasty.ai/developers and test /v1/ground with your own screenshots.
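The sketch below shows one way to fit grounding into your own interaction layer: take a fresh screenshot before every action, ground the target, and hand the center point to whatever performs the click. The three callables are hypothetical hooks you supply, for example a screen-capture function, a wrapper around the /v1/ground request, and `pyautogui.click`. It does not model the predict loop, whose interface is not described here.

```python
from typing import Callable, Iterable, List, Mapping, Tuple

Point = Tuple[int, int]


def run_steps(
    descriptions: Iterable[str],
    capture: Callable[[], bytes],
    ground: Callable[[bytes, str], Mapping[str, float]],
    click: Callable[[int, int], None],
) -> List[Point]:
    """Click each described element in order, grounding on a fresh screenshot.

    Re-capturing before every step means a layout change caused by the
    previous click (a dialog opening, a list reflowing) is visible before
    the next target is located.
    """
    clicked: List[Point] = []
    for description in descriptions:
        screenshot = capture()                 # new image for this step
        box = ground(screenshot, description)  # one /v1/ground call per step
        point = (
            round(box["x"] + box["width"] / 2),
            round(box["y"] + box["height"] / 2),
        )
        click(*point)
        clicked.append(point)
    return clicked
```

Because every step makes one /v1/ground request, a five-click flow costs 5 × $0.03 = $0.15. Injecting the hooks also lets you test the flow with fakes before pointing it at a real screen.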