Guardrails Overview
Sluice ships with 10 built-in guardrails that analyze every outbound email. Each guardrail evaluates the email independently and produces a risk level. You can enable, disable, and configure guardrails from Settings > Guardrails in the dashboard, except for Agent Signal, which is always on.
Risk levels
Every guardrail assigns one of three risk levels to each email:
| Risk level | Meaning | What happens |
|---|---|---|
| Green | Safe — no issues detected | Auto-forwarded (when tuning mode is off) |
| Orange | Warning — review recommended | Held for human review |
| Red | Block — clear violation detected | Held for human review |
The overall risk level for an email is the highest risk level returned by any individual guardrail. If nine guardrails return green but one returns orange, the email is orange overall and is held for review.
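The aggregation rule can be sketched in a few lines of Python. This is an illustrative model, not Sluice's implementation: the `Risk` enum and the `overall_risk` and `disposition` helpers are names invented for this example. It also assumes that with tuning mode on, green emails are held too, which the table above implies but does not state outright.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk levels ordered so that max() picks the most severe one."""
    GREEN = 0
    ORANGE = 1
    RED = 2


def overall_risk(results: dict[str, Risk]) -> Risk:
    """Return the highest risk reported by any guardrail (hypothetical helper)."""
    # Assumption: an email with no guardrail results is treated as green.
    return max(results.values(), default=Risk.GREEN)


def disposition(risk: Risk, tuning_mode: bool) -> str:
    """Map an overall risk level to an action, following the risk-level table."""
    # Only green emails are auto-forwarded, and only when tuning mode is off.
    if risk is Risk.GREEN and not tuning_mode:
        return "auto-forward"
    return "hold for review"


# Nine guardrails return green and one returns orange.
results = {f"guardrail_{i}": Risk.GREEN for i in range(1, 10)}
results["tone_analysis"] = Risk.ORANGE

level = overall_risk(results)
print(level.name, "->", disposition(level, tuning_mode=False))
# prints: ORANGE -> hold for review
```

Ordering the levels numerically makes "highest risk wins" a single `max()` call, so adding or removing a guardrail never changes the aggregation logic.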
The 10 guardrails
| # | Guardrail | What it checks | Default |
|---|---|---|---|
| 1 | Tone Analysis | Aggressive, threatening, or unprofessional language | Enabled |
| 2 | Content Policy | Policy violations, spam, phishing, and hallucinations (when a knowledge base is configured) | Enabled |
| 3 | Prompt Injection | Embedded instructions designed to manipulate AI agent behavior | Enabled |
| 4 | Rate Limiting | Excessive sending volume; prevents runaway agents from damaging domain reputation | Enabled |
| 5 | Duplicate Detection | Repeated emails; prevents agent loops and accidental re-sends | Enabled |
| 6 | PII Detection | Social Security numbers, credit cards, bank accounts, and 20+ other PII types | Disabled |
| 7 | Recipient Rules | Blocked/allowed lists and recipient count limits | Disabled |
| 8 | Attachment Scanning | Flags emails with file attachments for review | Disabled |
| 9 | Compliance | CAN-SPAM requirements and customizable regulatory checks | Disabled |
| 10 | Agent Signal | Lets agents self-escalate via a hidden HTML comment when they're uncertain | Always on |
How guardrails work together
Guardrails run independently and in parallel. Each one evaluates the email and returns its own risk level. The email's overall risk is the highest individual result.
Example: An AI agent sends a customer support reply, and PII Detection has been enabled. Results from four of the guardrails might look like this:
| Guardrail | Result | Reason |
|---|---|---|
| PII Detection | Red | Credit card number detected (confidence: 0.95) |
| Tone Analysis | Green | Professional and helpful tone |
| Content Policy | Green | No policy violations |
| Prompt Injection | Green | No injection attempts detected |
The email is flagged as red because of the PII Detection result, even though every other guardrail returned green. A reviewer will see exactly which guardrail flagged the email and why.
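The same flow can be modeled as a small Python sketch: each guardrail is an independent coroutine, all of them run concurrently, and the overall result is the most severe individual one. Everything here is a stand-in chosen for illustration: the function names, the `Result` type, and the trivial string check used for PII are assumptions, not Sluice's actual checks or API.

```python
import asyncio
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    GREEN = 0
    ORANGE = 1
    RED = 2


@dataclass
class Result:
    guardrail: str
    risk: Risk
    reason: str


async def pii_detection(email: str) -> Result:
    # Stand-in check: a real detector would do far more than match one string.
    if "4111 1111 1111 1111" in email:
        return Result("PII Detection", Risk.RED, "Credit card number detected")
    return Result("PII Detection", Risk.GREEN, "No PII detected")


async def tone_analysis(email: str) -> Result:
    return Result("Tone Analysis", Risk.GREEN, "Professional and helpful tone")


async def content_policy(email: str) -> Result:
    return Result("Content Policy", Risk.GREEN, "No policy violations")


async def prompt_injection(email: str) -> Result:
    return Result("Prompt Injection", Risk.GREEN, "No injection attempts detected")


GUARDRAILS = [pii_detection, tone_analysis, content_policy, prompt_injection]


async def evaluate(email: str) -> tuple[Risk, list[Result]]:
    # Run every guardrail concurrently; none depends on another's output.
    results = await asyncio.gather(*(check(email) for check in GUARDRAILS))
    # The overall risk is the highest individual result.
    overall = max(r.risk for r in results)
    return overall, list(results)


EMAIL = "Thanks for reaching out! Your card 4111 1111 1111 1111 is on file."

overall, results = asyncio.run(evaluate(EMAIL))
print("Overall:", overall.name)
for r in results:
    if r.risk is not Risk.GREEN:
        # This mirrors what a reviewer sees: which guardrail flagged, and why.
        print(f"{r.guardrail}: {r.risk.name} ({r.reason})")
```

Keeping each result's guardrail name and reason alongside its risk level is what lets a reviewer trace the overall red back to the single check that caused it.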
Recommended setup
Starting out? Keep the defaults — Tone Analysis, Content Policy, Prompt Injection, Rate Limiting, and Duplicate Detection are enabled out of the box and provide strong baseline protection.
Ready for more control? Enable additional guardrails based on your needs:
- Turn on PII Detection after configuring whitelists for expected data (e.g., sender contact details) to avoid false positives
- Turn on Recipient Rules if your agents should only email certain domains or addresses
- Turn on Attachment Scanning if your organization requires review of file attachments
- Turn on Compliance if you send commercial or marketing emails subject to CAN-SPAM or similar regulations
Leave Tuning Mode on while you're getting started. With it on, no email is auto-forwarded, so you can review every email and see how the guardrails perform on your real traffic. Turn it off once you're confident in the results.