RAG SYSTEM PROTECTION

Secure your knowledge base from poisoned data

Prevent 'Poisoned RAG' attacks. Scan documents and web content for hidden threats before they enter your vector database.

6 threat types · <15ms scan time · 3 protection layers

How it works

Three layers of protection for your RAG pipeline

Document ingestion

New documents, PDFs, or web content enter your RAG pipeline for embedding.

Bastio scans content

Every chunk is scanned for hidden prompt injections, malicious instructions, and PII.

Clean data embedded

Only verified, safe content reaches your vector database and LLM context.

Threat                 Layer       Detection
Hidden instructions    Ingestion   Pattern matching
Poisoned chunks        Retrieval   Content analysis
PII in documents       Ingestion   14-type scanner
Malicious URLs         Ingestion   Threat lists
Encoding attacks       Both        Decoder pipeline
Goal hijacking         Retrieval   Behavioral analysis

Bastio provides three layers of protection: ingestion scanning to catch threats before they're embedded, retrieval guardrails to filter poisoned chunks at query time, and output verification to ensure LLM responses are grounded in safe context.
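The three layers above compose into a single pipeline. A minimal sketch in Python, where `retrieve`, `generate`, `scan`, and `verify` are hypothetical stand-ins for your retriever, your LLM call, the scan step, and a grounding check (none of these names come from the Bastio API):

```python
def protect_rag_pipeline(query, chunks, retrieve, generate, scan, verify):
    """Three-layer RAG protection sketch: scan at ingestion,
    filter at retrieval, verify the final output."""
    # Layer 1: ingestion scanning — embed only chunks the scanner marks safe
    embedded = [c for c in chunks if scan(c)["safe"]]
    # Layer 2: retrieval guardrails — re-scan chunks pulled back for this query
    context = [c for c in retrieve(query, embedded) if scan(c)["safe"]]
    # Layer 3: output verification — reject answers not grounded in the context
    answer = generate(query, context)
    return answer if verify(answer, context) else None
```

The point of the sketch is the ordering: poisoned content can be caught before embedding, again at query time, and a final check still guards the model's answer even if both earlier layers miss something.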

What's included

Comprehensive RAG pipeline protection

Ingestion pipeline scanning
Retrieval-time guardrails
Output grounding verification
Hidden instruction detection
PII scanning (14 types)
Malicious URL blocking
Encoding attack detection
Goal hijacking prevention
Document type support (PDF, DOCX, HTML)
Chunk-level analysis
Configurable sensitivity levels
Full audit logging

Simple integration

Add security scanning to your RAG ingestion pipeline

Ingestion Scan

POST /v1/guard/{proxyID}/agent/scan

import requests

# Scan one document chunk before it is embedded
result = requests.post(
    f"{BASTIO_URL}/v1/guard/{PROXY_ID}/agent/scan",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "content": document_chunk,
        "context": "rag_ingestion",
        "scan_type": "full"
    }
).json()

if result["safe"]:
    embed_and_store(document_chunk)  # safe: continue to the vector database
else:
    quarantine(document_chunk, result["threats"])  # unsafe: hold for review
Scan Result

Threat detected in document chunk

{
  "safe": false,
  "threats": [{
    "type": "indirect_prompt_injection",
    "severity": "critical",
    "location": "hidden_text_layer",
    "content": "[Ignore previous rules...]"
  }],
  "action": "block",
  "scan_ms": 12
}

Key capabilities

Multi-layer protection for RAG systems

Ingestion Scanning

Scan PDFs, Word docs, and web pages for malicious prompts before embedding them into your vector database.

Retrieval Guardrails

Analyze retrieved chunks at query time. Filter out poisoned content before it reaches the LLM.
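A retrieval-time check can plausibly reuse the same scan endpoint shown above. A minimal sketch, assuming a hypothetical `rag_retrieval` context value (the documented example only shows `rag_ingestion`, so treat the field value and the placeholder constants as assumptions):

```python
BASTIO_URL = "https://bastio.example.com"  # placeholder base URL
PROXY_ID = "your-proxy-id"                 # placeholder proxy ID
API_KEY = "your-api-key"                   # placeholder API key

def scan_chunk(chunk: str, context: str = "rag_retrieval") -> dict:
    """Send one retrieved chunk to the scan endpoint.
    The 'rag_retrieval' context value is an assumption."""
    import requests  # imported lazily so the filter below is testable offline
    resp = requests.post(
        f"{BASTIO_URL}/v1/guard/{PROXY_ID}/agent/scan",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"content": chunk, "context": context, "scan_type": "full"},
    )
    resp.raise_for_status()
    return resp.json()

def filter_retrieved(chunks, scan=scan_chunk):
    """Drop poisoned chunks before they reach the LLM context window."""
    return [c for c in chunks if scan(c).get("safe", False)]
```

The filtered list is what you pass to the LLM; quarantined chunks never enter the prompt.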

Output Verification

Verify that the LLM's answer is grounded in the retrieved context, preventing fabricated information.
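To make the grounding idea concrete, here is an illustrative token-overlap heuristic. This is not Bastio's verification method, just a simple sketch of what "grounded in the retrieved context" can mean:

```python
def grounding_score(answer: str, context_chunks: list[str]) -> float:
    """Fraction of answer tokens that also appear in the retrieved context."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(context_chunks).lower().split())
    if not answer_tokens:
        return 1.0  # an empty answer cannot contradict the context
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def is_grounded(answer: str, context_chunks: list[str], threshold: float = 0.6) -> bool:
    """Flag answers whose content is mostly absent from the context."""
    return grounding_score(answer, context_chunks) >= threshold
```

A production verifier would use semantic comparison rather than exact token overlap, but the contract is the same: answers scoring below the threshold are treated as potentially fabricated and blocked or regenerated.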

Trust your knowledge base

Ensure your RAG system only learns from safe, verified information.