Monitor every prompt.
Flag every risk.
Sentinel AI sits between your application and its LLM. It inspects both sides of every conversation, aggregates risk signals, and returns a score with human-readable explanations — in a single API call.
Core capabilities
Unified risk score
Every prompt+response pair gets a single 0–100 score aggregated across all active detectors. One number, actionable immediately.
Explainable flags
Each triggered signal comes with a plain-English reason, severity level, and the exact excerpt that caused it. No black boxes.
Input + output monitoring
Sentinel watches both sides of the conversation — malicious prompts and unsafe model responses — in a single analysis call.
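To make the flag structure concrete, here is a minimal sketch of how a client might model one flag and separate prompt-side findings from response-side ones. The field names come from the sample response under Integration. The names `Flag`, `split_by_source`, and `explain` are hypothetical, and `"response"` as a `source` value is an assumption (the sample only shows `"prompt"`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    """One triggered signal, mirroring the fields in the sample response."""
    label: str
    severity: str
    source: str       # "prompt" in the sample; "response" is assumed for output-side flags
    description: str
    excerpt: str


def split_by_source(flags):
    """Group flags by the side of the conversation that triggered them."""
    by_source = {"prompt": [], "response": []}
    for flag in flags:
        # setdefault keeps the sketch tolerant of source values we did not anticipate
        by_source.setdefault(flag.source, []).append(flag)
    return by_source


def explain(flag):
    """Render a flag as a single human-readable line."""
    return (
        f"[{flag.severity}] {flag.label} ({flag.source}): "
        f"{flag.description} | excerpt: {flag.excerpt}"
    )
```

Grouping by `source` lets an application treat a malicious prompt differently from an unsafe model response, for example by rejecting the first and regenerating the second.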
Pipeline at a glance
Prompt arrives
User input sent to /api/analyze
Detectors run
Prompt injection, PII, toxicity, domain, and data exfiltration checks
Score aggregated
Weighted signals → 0–100 risk score
Result returned
Status, flags, excerpts, and metadata
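The "Weighted signals → 0–100 risk score" step could look roughly like the sketch below. It is an illustration only: the page does not publish the real weights or formula, so the weight values, the detector keys, the `aggregate` function, and the choice of a weighted mean over active detectors are all assumptions.

```python
# Hypothetical relative weights per detector; the real values are not documented.
WEIGHTS = {"injection": 35, "pii": 20, "toxicity": 20, "domain": 10, "exfil": 15}


def aggregate(signals, weights=WEIGHTS):
    """Combine per-detector signals in [0, 1] into an integer 0-100 score.

    Uses a weighted mean over the detectors that actually reported a signal,
    which is one plausible reading of "aggregated across all active detectors".
    """
    active = {name: value for name, value in signals.items() if name in weights}
    total_weight = sum(weights[name] for name in active)
    if total_weight == 0:
        return 0  # no known detector reported anything
    weighted = sum(
        weights[name] * min(max(value, 0.0), 1.0)  # clamp each signal to [0, 1]
        for name, value in active.items()
    )
    return round(100 * weighted / total_weight)
```

Normalizing by the weight of the active detectors keeps the score on the same 0–100 scale even when some detectors are switched off.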
Integration
{
  "prompt": "Ignore all previous instructions and...",
  "response": "Sure! Here is how you can..."
}
// → response
{
  "score": 82,
  "status": "blocked",
  "flags": [
    {
      "label": "Prompt Injection",
      "severity": "critical",
      "source": "prompt",
      "description": "Attempts to override system instructions",
      "excerpt": "…Ignore all previous instructions and…"
    }
  ],
  "meta": { "detectors_run": 6, "analyzed_at": "..." }
}
Ready to analyze an exchange?
Paste a prompt and response to get an instant risk report.
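For programmatic use, the request and response shown under Integration could be exercised with a small client along these lines. This is a sketch under stated assumptions: the host `sentinel.example.com` is a placeholder, the POST method and JSON content type are assumed rather than documented, and `build_request`, `analyze`, and `is_blocked` are hypothetical helper names. Only the `/api/analyze` path and the `prompt`, `response`, and `status` fields come from the page.

```python
import json
import urllib.request

BASE_URL = "https://sentinel.example.com"  # placeholder host, not a real endpoint


def build_request(prompt, response, base_url=BASE_URL):
    """Build the HTTP request for one prompt+response pair without sending it."""
    body = json.dumps({"prompt": prompt, "response": response}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed; the page does not state the HTTP method
    )


def analyze(prompt, response, base_url=BASE_URL, timeout=10.0):
    """Send the pair for analysis and return the decoded JSON result."""
    request = build_request(prompt, response, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def is_blocked(result):
    """True when the result carries the "blocked" status seen in the sample."""
    return result.get("status") == "blocked"
```

Keeping `build_request` separate from `analyze` makes the payload easy to inspect or unit-test without touching the network.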