Ground UI Elements to Coordinates with the Coasty Computer Use API
Hardcoded selectors break when the UI changes: XPaths and IDs drift, and CSS selectors go stale. You need an agent that sees the screen and acts like a human. The Coasty Computer Use API provides that through its vision-powered grounding endpoint: POST /v1/ground takes a screenshot and an element description and returns the (x, y) coordinate to use for clicking, hovering, or filling inputs. Each grounding call costs $0.03, and you avoid brittle selector maintenance.
How it works
Send a POST request to https://coasty.ai/v1/ground with a base64-encoded screenshot and an element description in the JSON body. The endpoint returns a JSON response with a status and a data object containing the x and y coordinates of the target element. You can then use those coordinates with your own automation stack or feed them back into the Coasty /v1/predict loop for fully autonomous computer use. The grounding step is stateless, so you can call it again whenever the UI changes; each call is billed at $0.03.
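# base64 -w0 (GNU coreutils) disables line wrapping; on macOS use: base64 -i screenshot.png
# The '"$( ... )"' sequence leaves the single-quoted JSON so the shell expands the command.
curl -X POST https://coasty.ai/v1/ground \
  -H "Authorization: Bearer $COASTY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "screenshot": "'"$(base64 -w0 screenshot.png)"'",
    "description": "Log in button on the top right"
  }'
Request and response fields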
- screenshot: a base64-encoded image string from a snapshot of the desktop.
- description: a natural language description of the target element (e.g. "submit button at the bottom of the form").
- status: indicates the grounding result (typically success).
- data.x: the x coordinate in pixels.
- data.y: the y coordinate in pixels.
- Price: $0.03 per grounding call.
- Authentication: Bearer token from COASTY_API_KEY environment variable or X-API-Key header.
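The fields above can be sketched as a small Python client. This is a minimal sketch using only the standard library, not an official SDK: the helper names (build_ground_request, parse_ground_response, ground) are made up for illustration, and the error handling assumes that a failed grounding returns a body without data.x and data.y, which the article does not specify.

```python
from __future__ import annotations

import base64
import json
import urllib.request

GROUND_URL = "https://coasty.ai/v1/ground"  # endpoint named in the article


def build_ground_request(png_bytes: bytes, description: str, api_key: str) -> urllib.request.Request:
    """Build the POST request: base64 screenshot plus element description."""
    payload = {
        "screenshot": base64.b64encode(png_bytes).decode("ascii"),
        "description": description,
    }
    return urllib.request.Request(
        GROUND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_ground_response(body: bytes) -> tuple[float, float]:
    """Extract (x, y) from a body shaped like {"status": ..., "data": {"x": ..., "y": ...}}."""
    doc = json.loads(body)
    data = doc.get("data")
    if not isinstance(data, dict) or "x" not in data or "y" not in data:
        # Assumption: a failed grounding omits data.x / data.y.
        raise ValueError(f"no coordinates in response (status={doc.get('status')!r})")
    return data["x"], data["y"]


def ground(png_bytes: bytes, description: str, api_key: str, timeout: float = 30.0) -> tuple[float, float]:
    """Send one grounding call (billed per call) and return the target coordinates."""
    request = build_ground_request(png_bytes, description, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_ground_response(response.read())
```

A call would then look like ground(open("screenshot.png", "rb").read(), "Log in button on the top right", os.environ["COASTY_API_KEY"]). Splitting request building and response parsing from the network call keeps both halves testable without spending a grounding call.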
Where this beats brittle automation
Hardcoded XPaths and IDs break on layout shifts, localization, or design updates, and CSS selectors require you to predict class names and nesting. The Computer Use API instead grounds a description directly to pixel coordinates derived from the actual screen state, so your agent can click, hover, and inspect elements the way a user does. You can combine grounding with the /v1/predict loop so the agent sees the screen, grounds the target, and clicks autonomously. The result is a visual-first approach to UI automation that does not depend on selectors.
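One way to act on "the actual screen state" is to ground against a fresh screenshot immediately before every action instead of caching coordinates. The sketch below illustrates that pattern with injected callables: capture, ground, and click are hypothetical placeholders for your own screenshot function, a /v1/ground client, and a click primitive, and the convention that ground raises ValueError when no target is found is an assumption.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def click_by_description(
    description: str,
    capture: Callable[[], bytes],
    ground: Callable[[bytes, str], Point],
    click: Callable[[float, float], None],
    attempts: int = 3,
) -> Point:
    """Capture, ground, and click, re-capturing the screen on every attempt."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        screenshot = capture()  # fresh screen state, never a cached coordinate
        try:
            x, y = ground(screenshot, description)
        except ValueError as exc:  # assumption: raised when nothing was grounded
            last_error = exc
            continue
        click(x, y)
        return x, y
    raise RuntimeError(f"could not ground {description!r} in {attempts} attempts") from last_error
```

Each attempt is a separate grounding call, so at $0.03 per call the attempts limit also caps the cost of one action.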
Next steps
- Integrate /v1/ground into your existing UI test suite to replace brittle selectors.
- Loop grounding calls inside a /v1/predict session to let the agent autonomously locate and click elements.
- Use the returned coordinates with pyautogui for local execution or stream them back into the Coasty agent for full computer use.
- Get a key at https://coasty.ai/developers and start grounding UI elements reliably.
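For the pyautogui route, a short sketch of turning grounded coordinates into a local click. It assumes the returned x and y are expressed in the pixel space of the screenshot you sent; on HiDPI displays a screenshot can be larger than the coordinate space pyautogui clicks in, so the point is rescaled first. The helper names to_screen_point and click_grounded are illustrative, and pyautogui is a third-party package that needs a desktop session.

```python
from typing import Tuple


def to_screen_point(
    x: float,
    y: float,
    shot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point in screenshot pixels to the screen's coordinate space."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= x < shot_w and 0 <= y < shot_h):
        # Refuse to click outside the image the coordinates were derived from.
        raise ValueError(f"point ({x}, {y}) lies outside a {shot_w}x{shot_h} screenshot")
    return round(x * screen_w / shot_w), round(y * screen_h / shot_h)


def click_grounded(x: float, y: float, shot_size: Tuple[int, int]) -> None:
    """Click the grounded point using pyautogui."""
    import pyautogui  # imported lazily: it requires a display to load

    screen_size = tuple(pyautogui.size())  # (width, height) in pyautogui's units
    sx, sy = to_screen_point(x, y, shot_size, screen_size)
    pyautogui.click(x=sx, y=sy)
```

If the screenshot was taken at the same resolution pyautogui reports, the scaling is the identity and the bounds check is the only effect.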
Grounding UI elements to coordinates with /v1/ground gives you a visual way to locate elements on the screen that does not depend on selectors. Stop fighting with selectors and let the computer use agent see and act like a human. Build stable UI automation at $0.03 per grounding call. Get your API key at https://coasty.ai/developers.