RAG SYSTEM PROTECTION
Secure your knowledge base from poisoned data
Prevent 'Poisoned RAG' attacks. Scan documents and web content for hidden threats before they enter your vector database.
How it works
Three layers of protection for your RAG pipeline
1. New documents, PDFs, or web content enter your RAG pipeline for embedding.
2. Every chunk is scanned for hidden prompt injections, malicious instructions, and PII.
3. Only verified, safe content reaches your vector database and LLM context.
| Threat | Layer | Detection |
|---|---|---|
| Hidden instructions | Ingestion | Pattern matching |
| Poisoned chunks | Retrieval | Content analysis |
| PII in documents | Ingestion | 14-type scanner |
| Malicious URLs | Ingestion | Threat lists |
| Encoding attacks | Both | Decoder pipeline |
| Goal hijacking | Retrieval | Behavioral analysis |
Bastio provides three layers of protection: ingestion scanning to catch threats before they're embedded, retrieval guardrails to filter poisoned chunks at query time, and output verification to ensure LLM responses are grounded in safe context.
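The three layers compose into a single gate around the pipeline. A minimal sketch of that flow; `retrieve`, `generate`, and the three scan helpers are hypothetical stand-ins for your own components, not Bastio API calls:

```python
def protect_rag_pipeline(chunks, query, retrieve, generate,
                         scan_chunk, filter_retrieved, verify_grounded):
    # Layer 1: ingestion scanning -- embed only chunks the scanner marks safe.
    safe_chunks = [c for c in chunks if scan_chunk(c)["safe"]]

    # Layer 2: retrieval guardrails -- drop poisoned chunks at query time.
    clean_context = filter_retrieved(retrieve(query, safe_chunks))

    # Layer 3: output verification -- reject answers not grounded in context.
    answer = generate(query, clean_context)
    if not verify_grounded(answer, clean_context):
        raise ValueError("answer not grounded in retrieved context")
    return answer
```

Each layer is independent, so a chunk that slips past ingestion can still be caught at retrieval or output time.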
What's included
Comprehensive RAG pipeline protection
Simple integration
Add security scanning to your RAG ingestion pipeline
POST /v1/guard/{proxyID}/agent/scan
```python
import requests

result = requests.post(
    f"{BASTIO_URL}/v1/guard/{PROXY_ID}/agent/scan",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "content": document_chunk,
        "context": "rag_ingestion",
        "scan_type": "full",
    },
).json()

if result["safe"]:
    embed_and_store(document_chunk)
else:
    quarantine(document_chunk, result["threats"])
```

Threat detected in document chunk
```json
{
  "safe": false,
  "threats": [{
    "type": "indirect_prompt_injection",
    "severity": "critical",
    "location": "hidden_text_layer",
    "content": "[Ignore previous rules...]"
  }],
  "action": "block",
  "scan_ms": 12
}
```

Key capabilities
Multi-layer protection for RAG systems
Ingestion Scanning
Scan PDFs, Word docs, and web pages for malicious prompts before embedding them into your vector database.
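A scan verdict with the shape shown above can be routed before anything is embedded. A sketch only; the severity ordering and the "review" tier are assumptions, not part of the Bastio API:

```python
# Assumed severity scale -- adjust to match your scan results.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def route_verdict(result, min_block="high"):
    """Return 'embed', 'review', or 'block' for one scan result."""
    if result["safe"]:
        return "embed"
    worst = max(SEVERITY_ORDER[t["severity"]] for t in result["threats"])
    return "block" if worst >= SEVERITY_ORDER[min_block] else "review"
```

Routing low-severity findings to a review queue instead of blocking outright keeps marginal documents from silently disappearing from your knowledge base.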
Retrieval Guardrails
Analyze retrieved chunks at query time. Filter out poisoned content before it reaches the LLM.
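Query-time filtering can wrap the same scan endpoint. A minimal sketch, where `scan` is any callable returning the verdict shape shown above (a hypothetical parameter, not a Bastio client):

```python
def filter_retrieved_chunks(chunks, scan):
    """Split retrieved chunks into (safe, blocked) by scanner verdict."""
    safe, blocked = [], []
    for chunk in chunks:
        (safe if scan(chunk)["safe"] else blocked).append(chunk)
    return safe, blocked
```

Only the `safe` list is passed into the LLM context; the `blocked` list can be logged for investigation.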
Output Verification
Verify that the LLM's answer is grounded in the retrieved context, preventing fabricated information.
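Grounding can be approximated locally with a lexical-overlap check. This is only an illustration of the idea; real output verification would rely on semantic entailment rather than token overlap, and the threshold here is arbitrary:

```python
def grounded_ratio(answer, context, threshold=0.6):
    """True if enough answer tokens also appear in the retrieved context."""
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return True
    context_tokens = set(context.lower().split())
    overlap = len(answer_tokens & context_tokens) / len(answer_tokens)
    return overlap >= threshold
```

An answer whose vocabulary barely intersects the retrieved context is a strong signal of fabrication and can be blocked or regenerated.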
Trust your knowledge base
Ensure your RAG system only learns from safe, verified information.