Bastio

Secure Scraper

Safely extract web content with built-in protection against indirect prompt injections.

Bastio's Secure Scraper is a drop-in replacement for Firecrawl that adds security scanning to protect your AI agents from indirect prompt injections in web content.

Why Secure Scraping Matters

AI agents that browse the web are vulnerable to "indirect prompt injection" attacks. Malicious websites can embed hidden instructions that hijack your agent's behavior, leading to:

  • Data exfiltration - Stealing API keys and environment variables
  • Prompt hijacking - Making your agent follow attacker instructions
  • Fake documentation attacks - Tricking agents into executing malicious code

Bastio scans every scraped page for these threats before returning content to your agent.

Quick Start

1. Enable Secure Scraper on Your Proxy

In your proxy settings, enable the Secure Scraper feature and configure the block behavior.

2. Make API Requests

Use your Bastio API key (the same key you use for all Bastio endpoints):

curl -X POST "https://api.bastio.com/v1/guard/{proxyID}/scrape" \
  -H "Authorization: Bearer bastio_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.bastio.com",
    "formats": ["markdown"]
  }'

Note: The Authorization header uses your Bastio API key (bastio_sk_...), not a Firecrawl key. If using BYOK mode, configure your Firecrawl API key in your proxy's Secure Scraper settings.

3. Handle the Response

{
  "success": true,
  "data": {
    "markdown": "AI Security Platform for everyone\n\n# The No. 1 Cloud Sec\n\nBastio sits between your users and the model to keep prompts safe, scrub sensitive data, and cut wasted token spend. Swap one endpoint and you get security, compliance, and cost control in a single move.\n\n5-layer defenseBuilt-in compliance...",
    "metadata": {
      "title": "Bastio | Simple LLM Security, Compliance, and Cost Control | Bastio",
      "description": "Bastio keeps AI prompts safe, compliant, and affordable with a drop-in gateway. Block risky prompts, protect data, and cut LLM spend in minutes.",
      "language": "en",
      "keywords": "AI security platform,LLM security,prompt injection protection,AI threat detection",
      "sourceURL": "https://www.bastio.com/",
      "statusCode": 200,
      "ogTitle": "Bastio – LLM Security Everyone Can Explain",
      "ogDescription": "Swap one endpoint to guard AI prompts, show compliance evidence, and reduce LLM costs with Bastio's drop-in gateway.",
      "ogImage": "https://www.bastio.com/og-image.png",
      "robots": "index, follow"
    }
  },
  "security": {
    "analyzed": true,
    "threat_score": 0,
    "action": "ALLOW",
    "threats_found": [],
    "content_modified": false,
    "processing_time_ms": 23
  }
}

Example blocked request

curl -X POST "https://api.bastio.com/v1/guard/{proxyID}/scrape" \
  -H "Authorization: Bearer bastio_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://trap.bastio.com",
    "formats": ["markdown"]
  }'

When the page is blocked, the response omits data and includes an error object:

{
  "success": true,
  "security": {
    "analyzed": true,
    "threat_score": 0.9349999999999999,
    "action": "BLOCK",
    "threats_found": [
      {
        "type": "env_exfiltration",
        "severity": "high",
        "confidence": 0.85,
        "description": "JavaScript environment variable access detected",
        "evidence": [
          "process.env.SUPABASE_ANON_KEY",
          "process.env.SUPABASE_SERVICE_KEY"
        ],
        "location": "code_block:unknown"
      },
      {
        "type": "env_exfiltration",
        "severity": "high",
        "confidence": 0.85,
        "description": "JavaScript environment variable access detected",
        "evidence": [
          "process.env.SUPABASE_ANON_KEY"
        ],
        "location": "code_block:inline"
      },
      {
        "type": "env_exfiltration",
        "severity": "high",
        "confidence": 0.85,
        "description": "JavaScript environment variable access detected",
        "evidence": [
          "process.env.SUPABASE_ANON_KEY",
          "process.env.SUPABASE_SERVICE_KEY"
        ]
      }
    ],
    "content_modified": false,
    "processing_time_ms": 4
  },
  "error": {
    "code": "security_blocked",
    "message": "Content blocked due to detected security threats",
    "details": "3 threats detected with score 0.93"
  }
}
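A blocked response can list the same threat type several times with overlapping evidence. The following is a minimal sketch of condensing `threats_found` for logging; the helper name `summarize_threats` and the summary shape are illustrative assumptions, not part of the Bastio API.

```python
from collections import defaultdict


def summarize_threats(security: dict) -> dict:
    """Group threats_found entries by type, keeping unique evidence strings."""
    summary = defaultdict(lambda: {"count": 0, "max_confidence": 0.0, "evidence": []})
    for threat in security.get("threats_found", []):
        entry = summary[threat["type"]]
        entry["count"] += 1
        entry["max_confidence"] = max(entry["max_confidence"], threat.get("confidence", 0.0))
        for item in threat.get("evidence", []):
            if item not in entry["evidence"]:  # drop repeated evidence
                entry["evidence"].append(item)
    return dict(summary)


# Trimmed copy of the "security" object from the blocked response above.
blocked_security = {
    "action": "BLOCK",
    "threats_found": [
        {"type": "env_exfiltration", "confidence": 0.85,
         "evidence": ["process.env.SUPABASE_ANON_KEY", "process.env.SUPABASE_SERVICE_KEY"]},
        {"type": "env_exfiltration", "confidence": 0.85,
         "evidence": ["process.env.SUPABASE_ANON_KEY"]},
        {"type": "env_exfiltration", "confidence": 0.85,
         "evidence": ["process.env.SUPABASE_ANON_KEY", "process.env.SUPABASE_SERVICE_KEY"]},
    ],
}

threat_summary = summarize_threats(blocked_security)
for threat_type, info in threat_summary.items():
    print(threat_type, info["count"], info["evidence"])
```

For the example response this collapses three entries into a single `env_exfiltration` group with two distinct evidence strings.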

Firecrawl Compatibility

The Secure Scraper API is a superset of Firecrawl's v2 API. If you're already using Firecrawl, you can switch to Bastio by changing your endpoint URL:

| Provider | Endpoint |
|----------|----------|
| Firecrawl | https://api.firecrawl.dev/v2/scrape |
| Bastio | https://api.bastio.com/v1/guard/{proxyID}/scrape |

All Firecrawl parameters are supported:

{
  "url": "https://example.com",
  "formats": ["markdown", "html", "links"],
  "onlyMainContent": true,
  "waitFor": 1000,
  "mobile": false,
  "timeout": 30000,
  "blockAds": true
}
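As a rough illustration of the migration, the sketch below builds the Bastio endpoint from a proxy ID and passes Firecrawl-style options through unchanged. The helper `build_scrape_request` and the constant `BASTIO_BASE_URL` are hypothetical names introduced here; only the URL pattern and the parameters come from this page.

```python
BASTIO_BASE_URL = "https://api.bastio.com/v1/guard"


def build_scrape_request(proxy_id: str, url: str, **firecrawl_options) -> tuple[str, dict]:
    """Return the Bastio endpoint and a Firecrawl-style JSON body."""
    endpoint = f"{BASTIO_BASE_URL}/{proxy_id}/scrape"
    # Options are forwarded as-is; no renaming is needed when switching providers.
    body = {"url": url, **firecrawl_options}
    return endpoint, body


endpoint, body = build_scrape_request(
    "your_proxy_id",
    "https://example.com",
    formats=["markdown", "html", "links"],
    onlyMainContent=True,
    waitFor=1000,
)
print(endpoint)
print(body)
```

The returned pair can be handed to any HTTP client together with the `Authorization: Bearer bastio_sk_...` header.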

Block Behaviors

Configure how Bastio handles detected threats:

| Behavior | Description | Use Case |
|----------|-------------|----------|
| block | Return error, no content | Maximum security for autonomous agents |
| sanitize | Redact threats, return safe content | Balanced security with usability |
| warn | Return full content + threat warnings | Debugging and monitoring |
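On the client side, the three behaviors can be handled with one code path. The sketch below assumes that a blocked response has `security.action` set to `BLOCK` and no `data`, and that `sanitize` and `warn` responses still carry `data.markdown` alongside any `threats_found`. The names `ContentBlocked` and `extract_content` are illustrative, not part of any Bastio SDK.

```python
class ContentBlocked(Exception):
    """Raised when the response's security.action is BLOCK."""


def extract_content(result: dict) -> tuple[str, list]:
    """Return (markdown, threats) or raise ContentBlocked."""
    security = result.get("security", {})
    threats = security.get("threats_found", [])
    if security.get("action") == "BLOCK":
        error = result.get("error", {})
        raise ContentBlocked(error.get("message", "Content blocked"))
    # ALLOW, or content returned with warnings or redactions: pass threats along.
    markdown = result.get("data", {}).get("markdown", "")
    return markdown, threats


allowed = {
    "success": True,
    "data": {"markdown": "# Hello"},
    "security": {"action": "ALLOW", "threats_found": []},
}
blocked = {
    "success": True,
    "security": {"action": "BLOCK", "threats_found": [{"type": "env_exfiltration"}]},
    "error": {"code": "security_blocked",
              "message": "Content blocked due to detected security threats"},
}

content, warnings = extract_content(allowed)
try:
    extract_content(blocked)
    blocked_message = None
except ContentBlocked as exc:
    blocked_message = str(exc)
print(content, warnings, blocked_message)
```

Returning the threat list alongside the content lets an agent log warnings without treating them as fatal.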

Threat Detection

Secure Scraper detects:

  • Environment variable exfiltration - Code attempting to steal secrets
  • Malicious code blocks - Instructions to execute harmful code
  • Suspicious URLs - Links to attacker-controlled servers
  • Fake documentation - Urgent "migration" or "upgrade" instructions
  • Prompt injections - Hidden instructions to hijack agent behavior
  • Jailbreak attempts - Content designed to bypass AI safety measures

Pricing

Platform-Managed Mode

Scraping credits included in your plan:

| Tier | Included URLs | Overage |
|------|---------------|---------|
| Free | 100/month | Hard limit |
| Starter | 10,000/month | $0.001/URL |
| Pro | 100,000/month | $0.001/URL |
| Enterprise | Unlimited | Included |
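The overage arithmetic can be sketched as follows. The `PLANS` dictionary mirrors the table above, while the function name `overage_cost` and the choice to raise an error when the Free tier's hard limit is exceeded are assumptions made for illustration.

```python
# Included URLs and overage price per URL, taken from the pricing table.
# None for "included" means unlimited; None for "overage" means a hard limit.
PLANS = {
    "free": {"included": 100, "overage": None},
    "starter": {"included": 10_000, "overage": 0.001},
    "pro": {"included": 100_000, "overage": 0.001},
    "enterprise": {"included": None, "overage": 0.0},
}


def overage_cost(tier: str, urls_scraped: int) -> float:
    """Return the monthly overage charge in dollars for a tier."""
    plan = PLANS[tier]
    if plan["included"] is None:  # unlimited plan
        return 0.0
    extra = max(0, urls_scraped - plan["included"])
    if extra == 0:
        return 0.0
    if plan["overage"] is None:
        raise ValueError(f"{tier} tier has a hard limit of {plan['included']} URLs")
    return round(extra * plan["overage"], 3)


starter_cost = overage_cost("starter", 12_500)  # 2,500 URLs over the allowance
print(starter_cost)
```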

BYOK Mode (Bring Your Own Key)

Use your own Firecrawl API key and pay only Bastio's security scanning fee:

  • Security fee: $0.0005 per URL
  • You control your Firecrawl costs directly
  • Same security scanning as platform mode

How to configure BYOK:

  1. Go to your proxy's settings in the Bastio dashboard
  2. Navigate to the Secure Scraper tab
  3. Select "Bring your own Firecrawl key"
  4. Enter your Firecrawl API key

Your Firecrawl key is securely stored and used server-side. You still authenticate requests with your Bastio API key (bastio_sk_...).

URL Caching

When response caching is enabled for your account, Bastio automatically caches scraped URLs to avoid redundant Firecrawl API calls. This can significantly reduce your scraping costs.

How It Works

  1. First request for a URL - Firecrawl API call is made, result is cached
  2. Subsequent requests for the same URL - Served from cache, no Firecrawl call needed
  3. Cache expiry - Cached content expires after 24 hours by default

Cost Savings

Each cache hit saves one Firecrawl API call. Here's how caching can reduce your costs:

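| Monthly Scrapes | Cache Hit Rate | Firecrawl Calls | Cost Saved |
|-----------------|----------------|-----------------|------------|
| 10,000 | 30% | 7,000 | $3.00 |
| 50,000 | 40% | 30,000 | $20.00 |
| 100,000 | 50% | 50,000 | $50.00 |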
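The figures in the table can be reproduced with a short calculation. The per-call cost of $0.001 is an assumption inferred from the table rows (for example, 3,000 cache hits saving $3.00); your actual Firecrawl rate may differ. The function name `cache_savings` is illustrative.

```python
def cache_savings(monthly_scrapes: int, hit_rate: float,
                  cost_per_call: float = 0.001) -> tuple[int, float]:
    """Return (Firecrawl calls still made, dollars saved) for a given hit rate."""
    hits = round(monthly_scrapes * hit_rate)  # requests served from cache
    calls = monthly_scrapes - hits            # requests that still reach Firecrawl
    saved = round(hits * cost_per_call, 2)
    return calls, saved


rows = [cache_savings(10_000, 0.30), cache_savings(50_000, 0.40), cache_savings(100_000, 0.50)]
for calls, saved in rows:
    print(calls, saved)
```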

Enable Caching

Caching is enabled by default. To configure caching settings:

  1. Go to your Dashboard in Bastio
  2. Navigate to Settings > Cache
  3. Toggle caching on/off as needed

Cache Behavior

  • Per-proxy isolation: Each proxy has its own independent cache
  • URL-based keys: Same URL always returns the same cached result
  • Automatic invalidation: Cache entries expire after 24 hours
  • Security preserved: Cached content is still analyzed for threats on each request
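To make these semantics concrete, here is a small in-memory model of a per-proxy, URL-keyed cache with a 24-hour expiry. It is only a sketch of the behavior described above, not Bastio's implementation; the class `ScrapeCache`, its methods, and the injectable clock are all hypothetical, and the per-request security scan is left out.

```python
import time
from typing import Callable


class ScrapeCache:
    """Toy cache keyed by (proxy_id, url) with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[float, str]] = {}

    def get_or_fetch(self, proxy_id: str, url: str,
                     fetch: Callable[[str], str]) -> tuple[str, bool]:
        """Return (content, was_cache_hit); call fetch only on a miss or expiry."""
        key = (proxy_id, url)  # per-proxy isolation: the proxy is part of the key
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1], True
        content = fetch(url)
        self._entries[key] = (now, content)
        return content, False


# Demo with a fake clock and a fake fetcher that records upstream calls.
fake_now = [0.0]
upstream_calls = []


def fake_fetch(url: str) -> str:
    upstream_calls.append(url)
    return f"content of {url}"


cache = ScrapeCache(clock=lambda: fake_now[0])
first = cache.get_or_fetch("proxy_a", "https://example.com", fake_fetch)
second = cache.get_or_fetch("proxy_a", "https://example.com", fake_fetch)
other_proxy = cache.get_or_fetch("proxy_b", "https://example.com", fake_fetch)
fake_now[0] = 24 * 60 * 60 + 1  # just past the 24-hour expiry
after_expiry = cache.get_or_fetch("proxy_a", "https://example.com", fake_fetch)
print(first[1], second[1], other_proxy[1], after_expiry[1], len(upstream_calls))
```

The second request for the same proxy and URL is a hit, while a different proxy or an expired entry triggers a fresh fetch.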

When to Disable Caching

Consider disabling caching if:

  • You need real-time content that changes frequently
  • You're scraping dynamic pages with unique content per request
  • Your use case requires fresh data on every call

Examples

Python with OpenAI Agents

import requests

# Use your Bastio API key (bastio_sk_...)
BASTIO_API_KEY = "bastio_sk_your_key_here"
PROXY_ID = "your_proxy_id"

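def secure_scrape(url: str) -> str:
    """Scrape a URL through Bastio and return its markdown, or raise if blocked."""
    response = requests.post(
        f"https://api.bastio.com/v1/guard/{PROXY_ID}/scrape",
        headers={"Authorization": f"Bearer {BASTIO_API_KEY}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,  # requests never times out by default; 60 s is an arbitrary choice
    )
    result = response.json()

    # A blocked response carries no "data", so check the action first
    if result["security"]["action"] == "BLOCK":
        raise RuntimeError(f"Threat detected: {result['security']['threats_found']}")

    return result["data"]["markdown"]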

TypeScript with Vercel AI SDK

// Use your Bastio API key (bastio_sk_...)
const BASTIO_API_KEY = "bastio_sk_your_key_here";
const PROXY_ID = "your_proxy_id";

async function secureScrape(url: string): Promise<string> {
  const response = await fetch(
    `https://api.bastio.com/v1/guard/${PROXY_ID}/scrape`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${BASTIO_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, formats: ['markdown'] }),
    }
  );

  const result = await response.json();

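  if (result.security.action === 'BLOCK') {
    // threats_found holds objects, so report each threat's type
    const types = result.security.threats_found.map((t: { type: string }) => t.type);
    throw new Error(`Threat detected: ${types.join(', ')}`);
  }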

  return result.data.markdown;
}

API Reference

Request

POST /v1/guard/{proxyID}/scrape
Authorization: Bearer bastio_sk_your_key_here
Content-Type: application/json

Authentication: Use your Bastio API key in the Authorization header. This is the same key used for all Bastio endpoints. If you're using BYOK mode, your Firecrawl key is configured in your proxy settings and used automatically.

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to scrape |
| formats | string[] | No | Output formats: "markdown", "html", "links" |
| onlyMainContent | boolean | No | Extract only main content (default: true) |
| waitFor | number | No | Milliseconds to wait for page load |
| mobile | boolean | No | Use mobile viewport |
| timeout | number | No | Request timeout in milliseconds |
| blockAds | boolean | No | Block advertisements |
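A request body can be checked against this table before sending. The sketch below is a client-side convenience under the assumption that the types listed above are accurate; `validate_scrape_body` is a hypothetical helper, and fields not listed here are passed through unchecked because other Firecrawl parameters are also accepted.

```python
# Expected Python types for the documented request fields.
FIELD_TYPES = {
    "url": str,
    "formats": list,
    "onlyMainContent": bool,
    "waitFor": (int, float),
    "mobile": bool,
    "timeout": (int, float),
    "blockAds": bool,
}


def validate_scrape_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors = []
    if "url" not in body:
        errors.append("url is required")
    for name, value in body.items():
        expected = FIELD_TYPES.get(name)
        if expected is None:
            continue  # undocumented here; leave validation to the server
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_in_number = isinstance(value, bool) and expected is not bool
        if is_bool_in_number or not isinstance(value, expected):
            errors.append(f"{name} has the wrong type")
    return errors


good = validate_scrape_body({"url": "https://example.com", "formats": ["markdown"], "timeout": 30000})
bad = validate_scrape_body({"formats": "markdown", "waitFor": True})
print(good)
print(bad)
```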

Response

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Whether the scrape succeeded |
| data.markdown | string | Extracted content in markdown |
| data.metadata | object | Page metadata (title, URL, etc.) |
| security.analyzed | boolean | Whether security scan was performed |
| security.threat_score | number | Threat score (0.0 - 1.0) |
| security.action | string | ALLOW, BLOCK, or SANITIZE |
| security.threats_found | object[] | Detected threats, each with type, severity, confidence, description, and evidence |
| security.content_modified | boolean | Whether content was sanitized |