Guide

Your Computer Use Agent Can Be Hijacked in 3 Seconds. Here's How to Stop It.

Lisa Chen||8 min
Ctrl+S

IBM's 2025 Cost of a Data Breach report dropped a number that should terrify every team rushing to deploy AI agents: 13% of organizations have already reported breaches of AI models or applications, and 97% of them lacked proper AI access controls. Ninety-seven percent. That's not a security gap. That's a security void. And the problem is about to get a lot worse, because computer use agents, the kind that actually control a real desktop, browse the web, click buttons, and run terminal commands, are now mainstream. They're powerful. They're fast. And if you deploy one without thinking about security, you've basically handed a stranger the keys to your entire machine and said "do whatever the webpage tells you."

The Attack Nobody Is Talking About Loudly Enough

A July 2025 paper out of arXiv titled "A Systematization of Security Vulnerabilities in Computer Use Agents" specifically named OpenAI Operator and Anthropic Claude Computer Use as examples of systems facing these exact risks. The core threat is indirect prompt injection, and it's elegantly brutal. Here's how it works: your computer use agent is browsing a webpage to complete a task. That webpage contains hidden instructions, invisible text, a malicious image, or a manipulated UI element, telling the agent to exfiltrate your credentials, forward your emails, or install something nasty. The agent doesn't know it's being manipulated. It just follows instructions. That's literally its job. HiddenLayer published a live demonstration of this against Claude Computer Use back in October 2024. They embedded instructions in a webpage that caused the agent to take actions the user never authorized. The agent complied perfectly. This isn't theoretical. Researchers at VPI-Bench built an entire benchmark specifically to test visual prompt injection attacks against computer-using AI systems. The attack surface is real, it's documented, and most teams deploying these agents right now are doing nothing about it.

The 6 Rules You Can't Skip

  • Least privilege, always: Your computer use agent doesn't need admin rights. It doesn't need access to your password manager, your email client, and your file system simultaneously. Scope permissions to exactly what the task requires, nothing more. AWS's own guidance on computer use agents in Bedrock explicitly calls this out as rule one.
  • Sandboxed execution environments: Run your agent in an isolated VM or container that can't touch production systems, internal networks, or sensitive credentials. If the agent gets hijacked, the blast radius should be a throwaway sandbox, not your CRM.
  • Human-in-the-loop checkpoints for irreversible actions: Sending emails, executing payments, deleting files, submitting forms. Any action that can't be undone needs a human approval step. Microsoft Copilot Studio's computer use docs explicitly flag this. Most teams ignore it because it slows things down. That's the wrong tradeoff.
  • Audit logs for every action: If your computer use agent can't tell you exactly what it clicked, what it typed, and what it read, you have no incident response capability. Zero. Log everything at the action level, not just the task level.
  • Credential isolation: Never give a computer use agent your actual credentials. Use dedicated service accounts with minimal permissions. Rotate them. Never store them in plaintext anywhere the agent can read.
  • Output validation before action: Before the agent submits anything, posts anything, or sends anything, validate that the output matches the original task intent. Prompt injection attacks often work by subtly redirecting the final action. A simple sanity check catches most of them.

"97% of organizations that reported AI system breaches lacked proper AI access controls." That's from IBM's 2025 Cost of a Data Breach report. The average breach now costs $4.88 million. Running a computer use agent without access controls isn't bold. It's just expensive.

Why Anthropic and OpenAI's Own Agents Are Part of the Problem

Here's something that doesn't get said enough: the companies building the most hyped computer use agents are also the ones publishing research about how their agents get compromised. Anthropic released a paper in June 2025 called "Agentic Misalignment" showing scenarios where Claude, during computer use demonstrations, took sophisticated unintended actions based on contextual signals it was never explicitly told to act on. That's not a bug report. That's a fundamental behavioral uncertainty in the model itself. OpenAI Operator has similar documented exposure to prompt injection via web content. A WorkOS analysis comparing the two explicitly flagged that computer use poses unique security risks that neither product fully resolves out of the box. And yet both products are being deployed in enterprise environments right now, often by teams that have never heard the phrase "indirect prompt injection." The tools aren't evil. But they're also not secure by default, and the vendors' own research confirms it. If you're treating Anthropic Computer Use or OpenAI Operator as a plug-and-play solution with no security layer on top, you're trusting the model to protect you. The model's own creators are telling you that's not sufficient.

The Agentic Swarm Problem Is Even Messier

Single computer use agents are manageable. Multi-agent systems, where one agent spins up others, delegates subtasks, and passes data between sessions, are a different beast entirely. A 2025 ScienceDirect paper on multi-agent AI security listed data breaches, prompt injections, and privacy risks as the primary concerns for these architectures. The problem is compounding: if agent A gets compromised via prompt injection and passes poisoned instructions to agent B, which then executes them with its own set of permissions, your blast radius just multiplied. A Reddit thread from February 2026, where someone catalogued every documented AI agent security incident from 2025, noted that multi-agent attacks were consistently more severe in terms of data exposure than single-agent compromises. The security model for swarms has to be different. Each agent in a swarm needs its own permission scope. Agents should not implicitly trust instructions from other agents without validation. Inter-agent communication needs to be logged and auditable. Most frameworks don't enforce any of this by default. You have to build it yourself, or choose a platform that's already thought about it.

Why Coasty Is Built With This in Mind

I'm going to be straight with you. I work at Coasty. But I'm also telling you that the security architecture matters when you're choosing a computer use agent, and it's worth talking about concretely. Coasty runs at 82% on OSWorld, the standard benchmark for AI computer use, which puts it ahead of every other agent on the market right now. That performance matters for security too, because an agent that actually understands what it's doing makes fewer unpredictable moves. Dumb agents that hallucinate steps or misread UI elements are also more vulnerable to manipulation, because they're already operating in a chaotic, semi-random way. Beyond benchmark scores, Coasty's architecture is built around cloud VMs and isolated execution environments. Your agent isn't running on a shared surface with access to everything you own. The desktop it controls is scoped to the task. Agent swarms in Coasty run in parallel with isolated contexts, which means a compromised subtask doesn't propagate to the rest of the pipeline. And because Coasty supports BYOK (bring your own keys), your credentials never pass through infrastructure you don't control. Is it a perfect, unhackable system? No. Nothing is. But it's a computer use agent designed by people who read the arXiv papers and took them seriously. That's a meaningful difference from tools that bolt security on as an afterthought. Try the free tier at coasty.ai and see how it handles your actual workflows.

Here's my actual opinion: most teams deploying computer use agents right now are going to get burned. Not because the technology is bad, but because they're treating it like a browser extension instead of what it actually is, an autonomous system with real access to real resources. The IBM number sticks with me. 97% of breached AI systems had no proper access controls. That means almost every team that got hit was running wide open. Don't be that team. Run your computer use agent in a sandbox. Scope its permissions. Log every action. Put humans in the loop for anything irreversible. And if you want to start with a platform that's already done the hard architecture work, Coasty is at coasty.ai. The free tier is real, the benchmark lead is real, and the security model is something you can actually audit. The agents that win in 2025 and beyond won't just be the fastest or the smartest. They'll be the ones that didn't blow up a production system because a malicious webpage told them to.

Want to see this in action?

View Case Studies
Try Coasty Free