Tutorial

Convert PyAutoGUI to Structured Actions with the Free Parse Endpoint

Alex Thompson | 5 min

PyAutoGUI scripts work well for simple point-and-click tasks, but they break quickly when layouts change, when you drive a real browser, or when the page has dynamic content. With the Coasty computer use API, you can send those scripts to the free /v1/parse endpoint, where a vision model turns them into a structured, reusable action DSL. From there you can run the actions on real desktops and browsers, or pass them to other Coasty endpoints that see and act like a human.

How it works

The /v1/parse endpoint takes a PyAutoGUI-style script, a screenshot of the screen, and an instruction. It returns a structured list of actions (click, type, scroll, etc.) that can be executed on a real machine. The base URL is https://coasty.ai/v1. You pass the API key in either the X-API-Key header or an Authorization: Bearer <key> header. The endpoint itself is free, so you pay only for the steps you run through other Coasty endpoints such as /v1/predict or /v1/runs.

python
import os
import base64
import requests

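COASTY_API_KEY = os.getenv("COASTY_API_KEY")
if not COASTY_API_KEY:
    # Fail early with a clear message instead of sending an empty key
    raise SystemExit("Set the COASTY_API_KEY environment variable first")
BASE_URL = "https://coasty.ai/v1"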

# Example PyAutoGUI-style script
pyautogui_script = """
import pyautogui
pyautogui.moveTo(100, 200)
pyautogui.click()
pyautogui.typewrite("Hello, Coasty!")
pyautogui.press('enter')
"""

# Encode the script to base64 (Coasty expects base64)
script_b64 = base64.b64encode(pyautogui_script.encode()).decode()

# Encode a sample screenshot to base64
with open("screenshot.png", "rb") as img:
    screenshot_b64 = base64.b64encode(img.read()).decode()

response = requests.post(
    f"{BASE_URL}/parse",
    headers={
        "X-API-Key": COASTY_API_KEY,
        "Content-Type": "application/json"
    },
    json={
        "script": script_b64,
        "screenshot": screenshot_b64,
        "instruction": "Simulate the PyAutoGUI script on the current screen"
    },
    timeout=30,  # avoid hanging indefinitely if the server does not respond
)

if response.status_code == 200:
    result = response.json()
    print("Parsed actions:", result.get("actions"))
else:
    # Include the status code so failures are easier to diagnose
    print(f"Error {response.status_code}:", response.text)
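
The request above can be wrapped in a small helper that covers both authentication styles described under "How it works". This is a minimal sketch: the name build_parse_request and the use_bearer flag are illustrative and not part of any Coasty SDK, and the body fields simply mirror the example request.

```python
import base64

BASE_URL = "https://coasty.ai/v1"


def build_parse_request(script, screenshot_bytes, instruction, api_key, use_bearer=False):
    """Return (url, headers, body) for a /v1/parse call without sending it.

    Hypothetical helper: field names mirror the tutorial's example request.
    """
    if not api_key:
        raise ValueError("api_key is required")

    # The tutorial describes two header styles; use one per request.
    if use_bearer:
        headers = {"Authorization": f"Bearer {api_key}"}
    else:
        headers = {"X-API-Key": api_key}
    headers["Content-Type"] = "application/json"

    body = {
        "script": base64.b64encode(script.encode("utf-8")).decode("ascii"),
        "screenshot": base64.b64encode(screenshot_bytes).decode("ascii"),
        "instruction": instruction,
    }
    return f"{BASE_URL}/parse", headers, body
```

The returned values can be passed straight to requests.post(url, headers=headers, json=body, timeout=30). Keeping the builder free of network calls makes it easy to unit test.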

Key fields and pricing

  • Use POST https://coasty.ai/v1/parse with the X-API-Key header (or Authorization: Bearer <key>).
  • Body: script (base64 string of your PyAutoGUI script), screenshot (base64 string), instruction (string).
  • Response: JSON with an actions array of structured steps (click, type, scroll, etc.).
  • This endpoint is free. You pay $0.05 per agent step only when you run actions via /v1/predict or /v1/runs.
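
To get a feel for the pricing, the per-step rate can be turned into a quick estimate. The sketch below assumes the $0.05 rate quoted above and takes the number of billed agent steps as input. How parsed actions map to billed steps is not stated here, so treat the step count as your own estimate; the function name is illustrative.

```python
from decimal import Decimal

# Price per agent step quoted in this tutorial; adjust if pricing changes.
PRICE_PER_STEP_USD = Decimal("0.05")


def estimate_run_cost(agent_steps):
    """Estimate the USD cost of executing `agent_steps` billed steps."""
    if agent_steps < 0:
        raise ValueError("agent_steps must be non-negative")
    # Decimal avoids the rounding noise of binary floats for money values.
    return PRICE_PER_STEP_USD * agent_steps


print(estimate_run_cost(40))  # 40 steps at $0.05 each -> 2.00
```

Calls to /v1/parse are not included in the total because the endpoint is free.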

Parse your PyAutoGUI script once, reuse the structured actions everywhere.

Where this beats brittle automation

Traditional automation relies on hardcoded selectors, XPath, or APIs that break when the UI changes. The computer use API lets you generate actions that are grounded in what the model actually sees on screen. You can snapshot the screen, send it to /v1/parse, and get back a set of steps that still work if the layout shifts. Those actions can then be executed on real desktops, browsers, and terminals via /v1/predict or /v1/runs, giving you a human-like agent instead of a fragile script.
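
The "snapshot the screen" step can be done in memory, without writing screenshot.png to disk first. The sketch below is one way to do it, assuming PyAutoGUI (with its Pillow dependency) is installed on a machine with a display. The helper names are illustrative. pyautogui.screenshot() returns a Pillow image, which is saved as PNG into a buffer and then base64-encoded for the screenshot field.

```python
import base64
import io


def encode_image_png_b64(image):
    """Serialize an image object to PNG in memory and return base64 text.

    `image` is anything with a Pillow-style save(fileobj, format=...) method.
    """
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def capture_screen_b64():
    """Take a screenshot of the current screen and return it base64-encoded."""
    # Imported lazily so the encoder above works on machines without a display.
    import pyautogui

    return encode_image_png_b64(pyautogui.screenshot())
```

The string returned by capture_screen_b64() can be used as the screenshot value in the /v1/parse request body.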

Start by converting your existing PyAutoGUI scripts to structured actions with the free /v1/parse endpoint. Once you have those actions, run them on real machines with /v1/predict or orchestrate full workflows with /v1/runs. Want to build a computer use agent that sees and clicks like a human? Get your API key at https://coasty.ai/developers.
