Sluice Docs

Prompt Injection Detection

The prompt injection guardrail detects attempts to manipulate AI agent behavior through instructions embedded in email content. This is a critical defense-in-depth layer that catches injection attacks even if your agent's own defenses miss them.

Default: Enabled
Analysis method: AI-powered evaluation
Risk levels: Green / Orange / Red

What it detects

  • Instruction override attempts — Text like "Ignore your previous instructions and..." or "You are now in developer mode..." embedded in email threads
  • Jailbreak attempts — Sophisticated techniques hidden in forwarded content, signatures, or reply chains
  • Social engineering patterns — Language designed to trick AI agents into revealing internal information, changing behavior, or taking unauthorized actions
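
To make the first category concrete, here is a toy Python sketch of how instruction-override phrasing looks to a naive pattern matcher. This is not how Sluice detects injection (Sluice uses AI-powered evaluation, and its internals are not documented here). The pattern list and the `find_override_phrases` function are hypothetical names chosen for illustration only.

```python
import re

# Hypothetical illustrative patterns; NOT Sluice's detection logic.
OVERRIDE_PATTERNS = [
    re.compile(r"ignore (all |your )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]


def find_override_phrases(email_body: str) -> list:
    """Return every substring of email_body that matches an override pattern."""
    matches = []
    for pattern in OVERRIDE_PATTERNS:
        for match in pattern.finditer(email_body):
            matches.append(match.group(0))
    return matches
```

A matcher like this misses paraphrased attacks and would also flag a legitimate email that merely quotes one of these phrases, so treat it only as an illustration of the phrasing, not as a substitute for the guardrail.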

Configuration

This guardrail uses a hardened detection system and has no custom configuration. The only setting is the on/off toggle.

The detection is designed to be highly accurate with minimal false positives. Normal business email content, including technical discussions about AI or security, should not be flagged.

Why this matters

AI agents that process inbound emails (e.g., customer support agents that read and reply to customer messages) are vulnerable to prompt injection. A malicious actor could embed instructions in an email that cause your agent to:

  • Reveal internal system prompts or confidential information
  • Send unauthorized responses to other customers
  • Bypass your agent's safety guidelines
  • Execute actions outside its intended scope

Sluice catches these attacks at the email layer, before the compromised response reaches any customer.

Use cases

Customer support agents — Your AI agent reads customer emails and generates replies. A customer (or attacker) includes hidden instructions in their email. Sluice flags the injection attempt before the agent's response goes out.

Automated email workflows — Your automation processes inbound emails and generates outbound responses. Sluice ensures that responses generated from adversarial inputs are caught and reviewed.

Defense in depth — Even if your AI agent has its own prompt injection defenses, Sluice adds a second layer of protection at the email output level. The agent might be tricked, but the email still gets caught.
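
As a rough sketch of how a workflow might act on the guardrail's result, the Python below maps the three risk levels (Green / Orange / Red) to an outbound handling decision. The mapping itself (send, hold for review, block) is an assumption made for illustration, and `RiskLevel` and `decide_outbound_action` are hypothetical names, not part of any Sluice API.

```python
from enum import Enum


class RiskLevel(Enum):
    # Level names come from the guardrail's documented risk levels.
    GREEN = "green"
    ORANGE = "orange"
    RED = "red"


def decide_outbound_action(risk: RiskLevel) -> str:
    """Map a risk level to a handling action (assumed policy, not Sluice's)."""
    if risk is RiskLevel.GREEN:
        return "send"             # nothing suspicious: deliver the reply
    if risk is RiskLevel.ORANGE:
        return "hold_for_review"  # uncertain: route to a human reviewer
    return "block"                # RED: do not deliver the reply
```

For example, `decide_outbound_action(RiskLevel.ORANGE)` returns `"hold_for_review"`, which matches the "caught and reviewed" behavior described for automated workflows under this assumed policy.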
