Tutorial

Turn PyAutoGUI Code Into Structured Actions with the Free Parse Endpoint

James Liu||7 min
+N

Most desktop automation starts with PyAutoGUI. You write click(x, y), press('Tab'), and hotkey('ctrl', 'c'). This works for a fixed window size, but it breaks when layouts change or you move to a different machine. You end up scraping coordinates, maintaining selectors, and rewriting scripts for every version. The parse endpoint solves this by turning raw PyAutoGUI code into structured actions that the Coasty computer use API can execute on any screen. You get reusable, stateless instructions without maintaining coordinate mappings.

How /v1/parse works

The endpoint is free and takes a single text field: the PyAutoGUI source code. You POST to https://coasty.ai/v1/parse with an Authorization header using your API key. The server parses the code, detects click, move, drag, scroll, and keypress operations, and returns a JSON array of structured action objects. Each action includes fields like type, x, y, button, key, modifiers, and duration. This array can be fed directly to stateless predictive steps or stored as a reusable workflow step.

bash
curl https://coasty.ai/v1/parse \ 
  -H "Authorization: Bearer $COASTY_API_KEY" \ 
  -H "Content-Type: text/plain" \ 
  -d 'import pyautogui; pyautogui.moveTo(500, 300); pyautogui.click(); pyautogui.hotkey("ctrl", "c")'
python
import os
import requests

def parse_pyautogui_code(code: str) -> dict:
    url = "https://coasty.ai/v1/parse"
    key = os.getenv("COASTY_API_KEY")
    resp = requests.post(
        url,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "text/plain",
        },
        data=code,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    raw = """import pyautogui
pyautogui.moveTo(500, 300)
pyautogui.click()
pyautogui.hotkey("ctrl", "c")
"""
    actions = parse_pyautogui_code(raw)
    print(actions)

Key fields in the response

  • type: one of click, move, drag, scroll, keypress, hotkey
  • x, y: integer coordinates for mouse moves and clicks
  • button: 'left' or 'right' for click or drag actions
  • key: character for keypress (e.g. 'Enter', 'Tab')
  • modifiers: list of strings like ['ctrl'] for hotkey combinations
  • duration: optional float for how long a hold lasts
  • action_id: unique identifier for each parsed instruction
  • source_line: original line number from the PyAutoGUI script

POST to https://coasty.ai/v1/parse with a text/plain body of your PyAutoGUI code and read the structured action array from the JSON response.

Where this beats brittle automation

Selectors and XPath break when UI changes or when you run on different machines. The parse endpoint decouples your script from screen coordinates. You define intent with PyAutoGUI keywords, and the model maps those to structured actions that the computer use agent executes on any environment. This means you can share a single action array across teams, reuse it in workflows, and let the agent adapt to layout changes without rewriting selectors.

Start converting your PyAutoGUI scripts into reusable structured actions with the free /v1/parse endpoint. Use those actions in stateless predictive steps or store them as workflow steps. Build robust computer use agents without maintaining brittle selectors. Get a key at https://coasty.ai/developers.

Want to see this in action?

View Case Studies
Try Coasty Free