Safely extract web content with built-in protection against indirect prompt injections.

Secure Scraper

Bastio's Secure Scraper is a drop-in replacement for Firecrawl that adds security scanning to protect your AI agents from indirect prompt injections in web content.

Why Secure Scraping Matters

AI agents that browse the web are vulnerable to "indirect prompt injection" attacks. Malicious websites can embed hidden instructions that hijack your agent's behavior, leading to:

Data exfiltration - Stealing API keys and environment variables
Prompt hijacking - Making your agent follow attacker instructions
Fake documentation attacks - Tricking agents into executing malicious code

Bastio scans every scraped page for these threats before returning content to your agent.

Quick Start

1. Enable Secure Scraper on Your Proxy

In your proxy settings, enable the Secure Scraper feature and configure the block behavior.

2. Make API Requests

Use your Bastio API key (the same key you use for all Bastio endpoints):

curl -X POST "https://api.bastio.com/v1/guard/{proxyID}/scrape" \
  -H "Authorization: Bearer bastio_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.bastio.com",
    "formats": ["markdown"]
  }'

Note: The Authorization header uses your Bastio API key (bastio_sk_...), not a Firecrawl key. If using BYOK mode, configure your Firecrawl API key in your proxy's Secure Scraper settings.

3. Handle the Response

{
  "success": true,
  "data": {
    "markdown": "AI Security Platform for everyone\n\n# The No. 1 Cloud Sec\n\nBastio sits between your users and the model to keep prompts safe, scrub sensitive data, and cut wasted token spend. Swap one endpoint and you get security, compliance, and cost control in a single move.\n\n5-layer defenseBuilt-in compliance...",
    "metadata": {
      "title": "Bastio | Simple LLM Security, Compliance, and Cost Control | Bastio",
      "description": "Bastio keeps AI prompts safe, compliant, and affordable with a drop-in gateway. Block risky prompts, protect data, and cut LLM spend in minutes.",
      "language": "en",
      "keywords": "AI security platform,LLM security,prompt injection protection,AI threat detection",
      "sourceURL": "https://www.bastio.com/",
      "statusCode": 200,
      "ogTitle": "Bastio – LLM Security Everyone Can Explain",
      "ogDescription": "Swap one endpoint to guard AI prompts, show compliance evidence, and reduce LLM costs with Bastio's drop-in gateway.",
      "ogImage": "https://www.bastio.com/og-image.png",
      "robots": "index, follow"
    }
  },
  "security": {
    "analyzed": true,
    "threat_score": 0,
    "action": "ALLOW",
    "threats_found": [],
    "content_modified": false,
    "processing_time_ms": 23
  }
}

Example blocked request

curl -X POST "https://api.bastio.com/v1/guard/{proxyID}/scrape" \
  -H "Authorization: Bearer bastio_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://trap.bastio.com",
    "formats": ["markdown"]
  }'

When a page is blocked, the response omits the data field and returns the detected threats together with an error object:

{
  "success": true,
  "security": {
    "analyzed": true,
    "threat_score": 0.9349999999999999,
    "action": "BLOCK",
    "threats_found": [
      {
        "type": "env_exfiltration",
        "severity": "high",
        "confidence": 0.85,
        "description": "JavaScript environment variable access detected",
        "evidence": [
          "process.env.SUPABASE_ANON_KEY",
          "process.env.SUPABASE_SERVICE_KEY"
        ],
        "location": "code_block:unknown"
      },
      {
        "type": "env_exfiltration",
        "severity": "high",
        "confidence": 0.85,
        "description": "JavaScript environment variable access detected",
        "evidence": [
          "process.env.SUPABASE_ANON_KEY"
        ],
        "location": "code_block:inline"
      },
      {
        "type": "env_exfiltration",
        "severity": "high",
        "confidence": 0.85,
        "description": "JavaScript environment variable access detected",
        "evidence": [
          "process.env.SUPABASE_ANON_KEY",
          "process.env.SUPABASE_SERVICE_KEY"
        ]
      }
    ],
    "content_modified": false,
    "processing_time_ms": 4
  },
  "error": {
    "code": "security_blocked",
    "message": "Content blocked due to detected security threats",
    "details": "3 threats detected with score 0.93"
  }
}
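A blocked response can list the same threat several times, once per location. The sketch below groups the evidence by threat type so a log line or alert stays readable. It assumes only the fields shown in the example above; summarize_threats is a hypothetical helper name, not part of any Bastio SDK.

```python
from collections import defaultdict


def summarize_threats(result: dict) -> dict[str, list[str]]:
    """Group evidence strings by threat type, dropping duplicates.

    Hypothetical helper: it reads only the fields shown in the example
    response (security.threats_found[].type and .evidence).
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for threat in result.get("security", {}).get("threats_found", []):
        bucket = grouped[threat.get("type", "unknown")]
        for item in threat.get("evidence", []):
            if item not in bucket:  # keep first occurrence, preserve order
                bucket.append(item)
    return dict(grouped)
```

Applied to the blocked response above, this collapses the three env_exfiltration entries into one key with two distinct evidence strings.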

Firecrawl Compatibility

The Secure Scraper API is a superset of Firecrawl's v2 API. If you're already using Firecrawl, you can switch to Bastio by changing your endpoint URL:

Provider	Endpoint
Firecrawl	`https://api.firecrawl.dev/v2/scrape`
Bastio	`https://api.bastio.com/v1/guard/{proxyID}/scrape`

All Firecrawl parameters are supported:

{
  "url": "https://example.com",
  "formats": ["markdown", "html", "links"],
  "onlyMainContent": true,
  "waitFor": 1000,
  "mobile": false,
  "timeout": 30000,
  "blockAds": true
}

Block Behaviors

Configure how Bastio handles detected threats:

Behavior	Description	Use Case
block	Return error, no content	Maximum security for autonomous agents
sanitize	Redact threats, return safe content	Balanced security with usability
warn	Return full content + threat warnings	Debugging and monitoring
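Whichever behavior you configure, client code should check security.action before using the content. The following is a minimal sketch of that check, assuming the response shapes shown in the examples above. ScrapeBlocked and content_from_result are hypothetical names, and the exact action value returned in warn mode is not documented on this page, so any action other than BLOCK is treated as "content available".

```python
class ScrapeBlocked(Exception):
    """Raised when the response reports action == "BLOCK" (hypothetical helper)."""


def content_from_result(result: dict) -> tuple[str, list[dict]]:
    """Return (markdown, threats) or raise ScrapeBlocked.

    Threats are returned alongside the content so callers running in
    sanitize or warn mode can still log what was detected.
    """
    security = result.get("security", {})
    threats = security.get("threats_found", [])
    if security.get("action") == "BLOCK":
        # Blocked responses carry an "error" object instead of "data".
        message = result.get("error", {}).get("message", "Content blocked")
        raise ScrapeBlocked(message)
    markdown = result.get("data", {}).get("markdown", "")
    return markdown, threats
```

Raising on BLOCK keeps blocked content from reaching the agent by accident, while the returned threat list leaves room for monitoring in the other modes.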

Threat Detection

Secure Scraper detects:

Environment variable exfiltration - Code attempting to steal secrets
Malicious code blocks - Instructions to execute harmful code
Suspicious URLs - Links to attacker-controlled servers
Fake documentation - Urgent "migration" or "upgrade" instructions
Prompt injections - Hidden instructions to hijack agent behavior
Jailbreak attempts - Content designed to bypass AI safety measures

Pricing

Platform-Managed Mode

Scraping credits included in your plan:

Tier	Included URLs	Overage
Free	100/month	Hard limit
Starter	10,000/month	$0.001/URL
Pro	100,000/month	$0.001/URL
Enterprise	Unlimited	Included
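To estimate a monthly bill from this table, a small calculation is enough. The sketch below copies the included volumes and overage rates from the table; the function name and the choice to raise an error on the Free tier's hard limit are illustrative assumptions. Enterprise is left out because it has no overage.

```python
from decimal import Decimal

# (included URLs per month, overage price per URL) copied from the table above.
# None means there is no overage pricing: the Free tier stops at its limit.
PLANS = {
    "free": (100, None),
    "starter": (10_000, Decimal("0.001")),
    "pro": (100_000, Decimal("0.001")),
}


def overage_cost(tier: str, urls_scraped: int) -> Decimal:
    """Return the overage charge in USD for one month of scraping."""
    included, rate = PLANS[tier]
    extra = max(0, urls_scraped - included)
    if extra == 0:
        return Decimal("0")
    if rate is None:
        raise ValueError(f"{tier} tier has a hard limit of {included} URLs/month")
    return extra * rate
```

For example, 12,500 URLs on Starter is 2,500 URLs over the included volume, which comes to $2.50 at $0.001 per URL.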

BYOK Mode (Bring Your Own Key)

Use your own Firecrawl API key and pay only Bastio's security scanning fee:

Security fee: $0.0005 per URL
You control your Firecrawl costs directly
Same security scanning as platform mode

How to configure BYOK:

Go to your proxy's settings in the Bastio dashboard
Navigate to the Secure Scraper tab
Select "Bring your own Firecrawl key"
Enter your Firecrawl API key

Your Firecrawl key is securely stored and used server-side. You still authenticate requests with your Bastio API key (bastio_sk_...).

URL Caching

When response caching is enabled for your account, Bastio automatically caches scraped URLs to avoid redundant Firecrawl API calls. This can significantly reduce your scraping costs.

How It Works

First request for a URL - Firecrawl API call is made, result is cached
Subsequent requests for the same URL - Served from cache, no Firecrawl call needed
Cache expiry - Cached content expires after 24 hours by default

Cost Savings

Each cache hit saves one Firecrawl API call. The table below shows the effect at different hit rates; the Cost Saved column works out to $0.001 per avoided call:

Monthly Scrapes	Cache Hit Rate	Firecrawl Calls	Cost Saved
10,000	30%	7,000	$3.00
50,000	40%	30,000	$20.00
100,000	50%	50,000	$50.00
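The rows above follow from simple arithmetic: cache hits are scrapes multiplied by hit rate, and each hit avoids one call. Here is a sketch for plugging in your own numbers. The default of $0.001 per call is inferred from the table rather than stated as a Firecrawl price, so substitute your actual rate.

```python
def cache_savings(monthly_scrapes: int, hit_rate: float,
                  cost_per_call: float = 0.001) -> tuple[int, float]:
    """Return (Firecrawl calls still made, USD saved) for one month.

    cost_per_call defaults to the rate implied by the table above;
    it is an assumption, not a published price.
    """
    hits = round(monthly_scrapes * hit_rate)
    calls_made = monthly_scrapes - hits
    return calls_made, round(hits * cost_per_call, 2)
```

For the first row, cache_savings(10_000, 0.30) gives 7,000 calls and $3.00 saved.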

Enable Caching

Caching is enabled by default. To configure caching settings:

Go to your Dashboard in Bastio
Navigate to Settings > Cache
Toggle caching on/off as needed

Cache Behavior

Per-proxy isolation: Each proxy has its own independent cache
URL-based keys: The same URL returns the same cached result until the entry expires
Automatic invalidation: Cache entries expire after 24 hours
Security preserved: Cached content is still analyzed for threats on each request
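The first three properties can be pictured with a toy in-memory model. This is not Bastio's implementation, only an illustration of the documented semantics: entries are keyed by proxy and URL, and they expire after 24 hours. The class name and the injectable clock are assumptions made so the example is easy to test.

```python
import time
from typing import Callable, Optional


class ScrapeCacheModel:
    """Toy model of the documented cache behavior (not Bastio's code)."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Key is (proxy_id, url): each proxy sees only its own entries.
        self._entries: dict[tuple[str, str], tuple[float, str]] = {}

    def put(self, proxy_id: str, url: str, content: str) -> None:
        self._entries[(proxy_id, url)] = (self._clock(), content)

    def get(self, proxy_id: str, url: str) -> Optional[str]:
        entry = self._entries.get((proxy_id, url))
        if entry is None:
            return None
        stored_at, content = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop the entry so the next request refetches.
            del self._entries[(proxy_id, url)]
            return None
        return content
```

In this model a second proxy requesting the same URL misses the cache, which mirrors the per-proxy isolation described above.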

When to Disable Caching

Consider disabling caching if:

You need real-time content that changes frequently
You're scraping dynamic pages with unique content per request
Your use case requires fresh data on every call

Examples

Python with OpenAI Agents

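import requests

# Use your Bastio API key (bastio_sk_...)
BASTIO_API_KEY = "bastio_sk_your_key_here"
PROXY_ID = "your_proxy_id"

def secure_scrape(url: str) -> str:
    """Scrape a URL through Bastio and return its markdown, or raise if blocked."""
    response = requests.post(
        f"https://api.bastio.com/v1/guard/{PROXY_ID}/scrape",
        headers={"Authorization": f"Bearer {BASTIO_API_KEY}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,  # do not hang indefinitely on a slow page
    )
    result = response.json()

    # A blocked response has no "data" field, so check the action first.
    if result["security"]["action"] == "BLOCK":
        threat_types = sorted({t["type"] for t in result["security"]["threats_found"]})
        raise RuntimeError(f"Threat detected: {', '.join(threat_types)}")

    return result["data"]["markdown"]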

TypeScript with Vercel AI SDK

// Use your Bastio API key (bastio_sk_...)
const BASTIO_API_KEY = "bastio_sk_your_key_here";
const PROXY_ID = "your_proxy_id";

async function secureScrape(url: string): Promise<string> {
  const response = await fetch(
    `https://api.bastio.com/v1/guard/${PROXY_ID}/scrape`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${BASTIO_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, formats: ['markdown'] }),
    }
  );

  const result = await response.json();

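  if (result.security.action === 'BLOCK') {
    // threats_found is an array of objects, so report each threat's type
    const types = result.security.threats_found.map((t: { type: string }) => t.type);
    throw new Error(`Threat detected: ${types.join(', ')}`);
  }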

  return result.data.markdown;
}

API Reference

Request

POST /v1/guard/{proxyID}/scrape
Authorization: Bearer bastio_sk_your_key_here
Content-Type: application/json

Authentication: Use your Bastio API key in the Authorization header. This is the same key used for all Bastio endpoints. If you're using BYOK mode, your Firecrawl key is configured in your proxy settings and used automatically.

Request Body

Field	Type	Required	Description
url	string	Yes	URL to scrape
formats	string[]	No	Output formats: "markdown", "html", "links"
onlyMainContent	boolean	No	Extract only main content (default: true)
waitFor	number	No	Milliseconds to wait for page load
mobile	boolean	No	Use mobile viewport
timeout	number	No	Request timeout in milliseconds
blockAds	boolean	No	Block advertisements

Response

Field	Type	Description
success	boolean	Whether the scrape succeeded
data.markdown	string	Extracted content in markdown
data.metadata	object	Page metadata (title, URL, etc.)
security.analyzed	boolean	Whether security scan was performed
security.threat_score	number	Threat score (0.0 - 1.0)
security.action	string	ALLOW, BLOCK, or SANITIZE
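security.threats_found	object[]	Detected threats (type, severity, confidence, description, evidence, location)
security.content_modified	boolean	Whether content was sanitized
security.processing_time_ms	number	Processing time in milliseconds
error	object	Present when content is blocked (code, message, details)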
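For code that passes the security result around, a typed view of these fields can catch malformed responses early. The sketch below covers only the security fields listed in the table; SecurityReport and parse_security are hypothetical names, and the range check on threat_score follows the 0.0 - 1.0 range stated above.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityReport:
    """Typed view of the "security" object (hypothetical helper type)."""
    analyzed: bool
    threat_score: float
    action: str
    threats_found: list = field(default_factory=list)
    content_modified: bool = False


def parse_security(result: dict) -> SecurityReport:
    """Extract and sanity-check the security section of a scrape response."""
    security = result["security"]
    score = float(security["threat_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"threat_score out of range: {score}")
    return SecurityReport(
        analyzed=bool(security["analyzed"]),
        threat_score=score,
        action=str(security["action"]),
        threats_found=list(security.get("threats_found", [])),
        content_modified=bool(security.get("content_modified", False)),
    )
```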