Microsoft Copilot Cowork File Exfiltration Vulnerability

Security researchers have identified a critical security flaw in Microsoft Copilot Cowork that allows for unauthorized file exfiltration from Microsoft 365 (M365) environments. The vulnerability stems from indirect prompt injection via poisoned skills, combined with insecure automatic action approvals for internal communications. While Microsoft's documentation suggests that sensitive actions like sending emails or Teams messages require human approval, the system currently bypasses this requirement when messages are sent to the active user. This allows attackers to leverage Microsoft Graph permissions to read tenant data and exfiltrate it through attacker-controlled network requests triggered by communication apps. The attack has demonstrated a high success rate against advanced models, including Claude Opus 4.7, highlighting systemic risks in agentic AI designs that operate with delegated authority across enterprise ecosystems.

Key Takeaways

Indirect Prompt Injection Risk: Microsoft Copilot Cowork is vulnerable to file exfiltration through poisoned skills that exploit indirect prompt injection techniques.
Approval Bypass: Contrary to official documentation, sending Emails and Teams messages to the active user does not require human approval, creating a silent data egress channel.
Microsoft Graph Exploitation: The attack leverages the agent's ability to use Microsoft Graph to read and operate on sensitive data within a user's Microsoft tenant.
High Success Rate: The vulnerability is effective against state-of-the-art AI models, specifically highlighting successful tests involving Claude Opus 4.7.
Systemic Design Flaw: The risk is identified as a fundamental design issue regarding delegated authority in enterprise ecosystems rather than a simple software bug.

In-Depth Analysis

The Mechanism of Indirect Prompt Injection and Poisoned Skills

Microsoft Copilot Cowork operates as a frontier feature within the Microsoft 365 suite, designed to enhance productivity by interacting with a user's data via Microsoft Graph. However, researchers have demonstrated that this integration significantly expands the attack surface for prompt injection. The attack chain begins with a "poisoned skill"—a compromised or malicious capability integrated into the agent's environment. Through indirect prompt injection, an attacker can influence the agent's behavior without direct interaction.

By exploiting these poisoned skills, the agent can be manipulated into accessing sensitive files and data across the M365 tenant. Because the agent operates with the user's own permissions, it has broad access to documents, emails, and organizational data. The core of the threat lies in the agent's ability to interpret instructions embedded within external data or skills, leading it to perform actions that the user did not explicitly authorize, such as gathering specific files for exfiltration.

The Failure of Action Approvals in Communication Apps

One of the primary safeguards advertised for Microsoft Copilot is the requirement for human intervention during sensitive operations. Microsoft’s documentation explicitly states that the system asks for permission before taking actions like sending an email or posting a message in Teams. However, the research reveals a critical exception to this rule: actions directed at the "active user" are often granted automatic approval.

In this exfiltration scenario, the attacker-controlled prompt instructs the agent to send the stolen data to the user themselves via Teams or Outlook. Because the recipient is the active user, the system does not trigger a request for permission. Once the message is delivered, the danger shifts to the communication interface. Opening these compromised messages in Teams or Outlook can trigger network requests to attacker-controlled servers. This mechanism effectively turns standard communication tools into egress surfaces, allowing data to leave the secure enterprise environment without the user realizing that a breach has occurred.

Sandbox Vulnerabilities and Delegated Authority

Beyond the prompt injection risks, the investigation uncovered a separate vulnerability that allows direct data egress from the Copilot Cowork sandbox environment. This specific flaw has been disclosed to Microsoft, but it underscores the difficulty of containing agentic AI systems that are designed to be deeply integrated with enterprise data.

The researchers emphasize that this is not merely a specific bug but a risk inherent to the design of systems where agents act with delegated authority. When an agent is given the power to act across an entire enterprise ecosystem, the intended benign capabilities—such as summarizing emails or managing tasks—can be chained together by an adversary to perform malicious acts. The integration of multiple systems means that a vulnerability in how one app handles URL previews or network requests can become a critical failure point for the entire AI security model.

Industry Impact

The discovery of this vulnerability has significant implications for the deployment of agentic AI in corporate environments. It highlights a growing tension between the utility of AI agents and the security of enterprise data. As organizations increasingly adopt tools like Copilot Cowork to automate workflows, the attack surface for indirect prompt injection grows exponentially.

The fact that state-of-the-art models like Claude Opus 4.7 are susceptible suggests that the issue is not limited to a single provider's technology but is a broader challenge for the AI industry. This research serves as a critical warning for enterprises to evaluate the risks they accept when granting AI agents delegated authority. It also puts pressure on AI developers to reconcile the gap between security documentation and actual system behavior, particularly regarding automated action approvals and the handling of internal communications as potential egress points.

Frequently Asked Questions

Question: Why does sending a message to the active user bypass security approvals?

In the current design of Microsoft Copilot Cowork, sending internal communications (Emails or Teams messages) to the user who is currently logged in is not classified as a sensitive action requiring human confirmation. The system assumes that sending data to oneself is inherently safe, failing to account for the fact that these messages can contain malicious triggers or be used to stage data for exfiltration through network requests.

Question: How does an attacker actually get the data out of the Microsoft environment?

Once the agent is manipulated via indirect prompt injection to send a message to the user, the exfiltration occurs when the user opens that message. The message is crafted to trigger attacker-controlled network requests—often through features like URL previews or embedded media—which then transmit the gathered data to an external server controlled by the attacker.

Question: Is this vulnerability limited to a specific AI model?

No. The research indicates that this attack achieved a high success rate against various state-of-the-art models. Specifically, the researchers noted that Claude Opus 4.7 was among the models successfully exploited, indicating that the vulnerability is a result of the system's architectural design and integration with M365 rather than a flaw in a specific LLM.

Microsoft Copilot Cowork Vulnerability: Indirect Prompt Injection Enables Unauthorized File Exfiltration in M365