
Your Computer Use Agent Can Be Hijacked in Seconds. Here's What You're Not Doing About It.

Sophia Martinez | 8 min read

Within 72 hours of Anthropic releasing Claude Computer Use to the public, a security researcher named Johann Rehberger had already turned it into a zombie. Not metaphorically. He used a technique called prompt injection to hijack Claude's computer use capabilities mid-task, connect it to a remote command-and-control server, and recruit it into what he called a 'ZombAI' botnet. The agent was browsing a malicious webpage, read hidden instructions embedded in the page content, and just... obeyed them. Completely. Without telling the user anything was wrong.

That was October 2024. Now it's 2025, computer use agents are being deployed inside real enterprise environments handling real credentials and real data, and most teams are still treating security like an afterthought they'll get to next sprint. IBM's 2025 Cost of a Data Breach report just dropped a number that should make every CTO put down their coffee: 97% of organizations that suffered a breach of an AI model or application were found to lack proper AI access controls. Ninety-seven percent. This isn't a niche problem. This is a fire that's already spreading, and a lot of people are standing next to it holding gasoline.

The Threat Is Real and It's Already Documented

Let's not do the thing where we pretend this is theoretical. Researchers published a full systematization of security vulnerabilities in computer use agents in July 2025, and the findings are not comforting. They evaluated OpenAI's Operator as a representative computer-using AI and found multiple exploitable attack vectors, all of which were disclosed to OpenAI. Google's bug hunters published their own research in December 2025 on what they're calling 'task injection,' a variant of prompt injection specifically targeting the agency of autonomous AI systems.

The attack surface is unique to computer use agents because, unlike a chatbot, a computer use agent actually does things. It clicks. It types. It submits forms. It can read files, copy data, send emails, and execute code. When you inject malicious instructions into its context, you're not just getting a weird response in a chat window. You're handing an attacker a fully functional keyboard and mouse inside your infrastructure. The VPI-Bench paper, published in 2025, specifically benchmarks visual prompt injection attacks against computer use agents, where malicious instructions are hidden inside images on a webpage. The agent sees the image, processes the hidden text, and follows instructions the user never gave. This is not science fiction. This is documented, reproducible, and actively being weaponized.

The 7 Security Practices That Actually Matter

  • Run your computer use agent in an isolated sandbox or dedicated cloud VM, never directly on a machine with access to production credentials or sensitive file systems. One compromised session should not mean one compromised company.
  • Apply strict least-privilege permissions. Your agent does not need admin rights to fill out a web form. If it only needs to read a spreadsheet and send one email, give it exactly those permissions and nothing else. The IEEE Secure Generative AI Agents workshop flagged least-privilege violations as one of the most common and most dangerous misconfigurations in 2025.
  • Treat every webpage, PDF, and document your agent reads as potentially hostile. Content from the web is untrusted user input. Your agent's context window is an attack surface. Implement content filtering and output validation before any agent action touches real systems.
  • Log everything with human-readable audit trails. If your computer use agent is taking actions you can't replay and inspect, you have no incident response capability. Every click, every form submission, every file access should be timestamped and reviewable.
  • Build human-in-the-loop checkpoints for high-stakes actions. Sending an email, making a payment, deleting a file, or changing account settings should require explicit human confirmation before execution, not just a vague 'are you sure?' prompt that users click through in two seconds.
  • Rotate credentials aggressively and never hardcode them in agent prompts or system instructions. IBM found that 16% of 2025 breaches involved attackers using AI, and credential theft remains the number one initial access vector. Your agent's access tokens are a target.
  • Monitor for anomalous behavior patterns at runtime. A computer use agent that suddenly starts accessing directories it has never touched, or making network requests to unfamiliar domains, is a signal worth investigating immediately. Static security reviews are not enough for dynamic agents.
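
Several of these practices (least privilege, human checkpoints, audit trails) can live in one small gate that sits between the model and the machine. What follows is a minimal sketch, not a reference implementation: the action names, the `ActionGate` class, and the `confirm` callback are all hypothetical, and a real agent framework will define its own action schema.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical action names for illustration; substitute your framework's own.
ALLOWED_ACTIONS = {"read_file", "click", "type_text", "send_email"}
HIGH_STAKES_ACTIONS = {"send_email", "make_payment", "delete_file", "change_settings"}


@dataclass
class ActionGate:
    """Reviews each action the agent proposes before anything is executed."""

    # Called only for high-stakes actions; returns True when a human approves.
    confirm: Callable[[str, str], bool]
    # One JSON line per reviewed action, so a session can be replayed later.
    audit_log: List[str] = field(default_factory=list)

    def review(self, action: str, target: str) -> bool:
        if action not in ALLOWED_ACTIONS:
            # Least privilege: anything not explicitly allowed is refused.
            decision = "denied_not_allowed"
        elif action in HIGH_STAKES_ACTIONS and not self.confirm(action, target):
            # Human-in-the-loop: high-stakes actions need explicit approval.
            decision = "denied_by_human"
        else:
            decision = "approved"
        # Audit trail: record every decision, including refusals.
        self.audit_log.append(
            json.dumps(
                {"ts": time.time(), "action": action, "target": target, "decision": decision}
            )
        )
        return decision == "approved"
```

A caller would construct the gate with a real approval mechanism, for example `ActionGate(confirm=ask_operator)` where `ask_operator` is whatever review step your team uses, and execute an action only when `review` returns `True`. The point of the design is that the check runs outside the model, so an injected instruction cannot talk its way past it.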

97% of organizations that suffered a breach of an AI model or application in 2025 lacked proper AI access controls. That's not a skills gap. That's negligence at scale. (IBM Cost of a Data Breach Report, 2025)

The Dirty Secret About 'Secure by Default' Claims

Every AI company selling a computer use product right now will tell you their system has safeguards. Anthropic has them. OpenAI has them. They're real. They also get bypassed constantly. An arXiv paper on the hidden dangers of browsing AI agents, published in May 2025, evaluated multiple production computer-using AI systems including Operator and Claude Computer Use and found consistent vulnerabilities to indirect prompt injection. The safeguards are trained behaviors, not hard constraints. A sufficiently crafted adversarial input can override them. This is not a knock on any specific team. It's a fundamental property of how large language models work. The model is trying to be helpful. Attackers craft inputs that look like legitimate instructions. The model can't always tell the difference. What this means practically is that you cannot outsource your security posture to the AI vendor and call it done. The vendor's safeguards are one layer. You need to build the other layers yourself: network isolation, permission scoping, output validation, audit logging, human review gates. The organizations that treat 'the model has guardrails' as a complete security strategy are the ones showing up in IBM's breach statistics.
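
One of those extra layers can be a deterministic check on fetched content before it ever reaches the agent's context. The sketch below is an illustration under stated assumptions: the patterns, the function names, and the `<untrusted_content>` wrapper are all made up for this example, and simple pattern matching will miss many real injections. Treat it as one weak signal among several layers, not a defense on its own.

```python
import re
from typing import List

# Illustrative patterns only. Real injections vary widely and can evade regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"(download|run|execute) (this|the following)", re.IGNORECASE),
]


def scan_untrusted_text(text: str) -> List[str]:
    """Return the first match of each suspicious pattern found in fetched content."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(0))
    return hits


def wrap_as_data(text: str) -> str:
    """Label fetched content as data so it is never presented as instructions."""
    # The tag name is an arbitrary convention chosen for this sketch.
    return "<untrusted_content>\n" + text + "\n</untrusted_content>"
```

A non-empty result from `scan_untrusted_text` is a reason to pause the session or route it to human review, not proof of an attack. An empty result proves nothing either, which is why the other layers still matter.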

Why Sandboxed Cloud Execution Changes the Equation

Here's the architectural decision that separates the teams who are doing this right from the ones who are one malicious webpage away from a bad Tuesday. Running your computer use agent on a dedicated, ephemeral cloud VM instead of a local machine or shared environment is the single highest-leverage security improvement you can make. When the agent session ends, the VM is destroyed. Any state an attacker managed to inject, any persistence mechanism they tried to establish, any data they tried to cache locally, gone. The blast radius of a compromised session is contained to that session. This is how serious teams are deploying computer-using AI right now. Isolated execution environments, strict egress controls on what the VM can reach, no persistent local storage for credentials, and fresh instances for every new task or workflow. It's not complicated. It's just disciplined. The teams skipping this step are usually the ones who discovered it the hard way.
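
The two ideas in that paragraph, disposable session state and strict egress control, can be sketched in a few lines. This is a simplified stand-in under loud assumptions: a temporary directory plays the role of the ephemeral VM, the host allowlist is hypothetical, and none of it replaces real VM or network-level isolation. It only shows the shape of the policy.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical allowlist; in practice this is scoped per task or workflow.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def egress_allowed(url: str) -> bool:
    """Allow a request only when the exact hostname is on the allowlist."""
    host = urlparse(url).hostname or ""
    # Exact match on purpose: "example.com.evil.net" must not pass.
    return host in ALLOWED_HOSTS


@contextmanager
def ephemeral_workspace():
    """Create a scratch directory for one session and delete it afterwards."""
    path = Path(tempfile.mkdtemp(prefix="agent-session-"))
    try:
        yield path
    finally:
        # Anything written during the session is gone once it ends.
        shutil.rmtree(path, ignore_errors=True)
```

In use, each task would run inside `with ephemeral_workspace() as workdir:` and every outbound request would pass through `egress_allowed` first. In a real deployment the same policy belongs in the VM lifecycle and the network layer, where a compromised agent process cannot switch it off.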

Why Coasty Was Built With This in Mind

I'm going to be straight with you. One of the reasons I think Coasty is the right tool for teams that are serious about both capability and security is that the architecture reflects actual threat modeling, not just benchmark chasing. Coasty runs at 82% on OSWorld, which is the highest score of any computer use agent right now, and that's genuinely impressive. But what matters operationally is that it runs agents in isolated cloud VMs, which handles the blast-radius problem I described above. You're not running a powerful computer-using AI with full desktop access directly on your laptop connected to everything you own. You're running it in a controlled environment where the execution is observable, the permissions are scoped, and the session is disposable. The agent swarm capability for parallel execution is also relevant here from a security architecture standpoint: parallel isolated instances are fundamentally safer than one long-running agent session that accumulates context and state over hours. Shorter task windows mean smaller attack surfaces. Coasty also supports BYOK, which means your credentials stay under your control and aren't sitting in someone else's system prompt. That matters. If you're evaluating computer use agents for enterprise use and security is a real requirement for you, not just a checkbox, the free tier at coasty.ai is worth testing against your actual threat model.

Here's my actual opinion: most teams deploying computer use agents right now are moving fast in a way that will cost them later. The ZombAI research, the IBM breach statistics, the systematization papers, the Google bug hunter disclosures: none of this describes a future problem. It's a present one. The good news is the mitigations are not exotic. Sandboxed execution. Least privilege. Audit logging. Human checkpoints for high-stakes actions. Treating web content as untrusted input. These are boring, proven security principles that just need to be applied to a new category of tool. The teams that do this work now will be the ones who get to keep using powerful computer-using AI when the first big public incident hits and everyone else is explaining their breach to a board of directors. Don't be the cautionary tale. Get the architecture right before you scale. Start at coasty.ai.
