Secure Scraper
Safely extract web content with built-in protection against indirect prompt injections.
Bastio's Secure Scraper is a drop-in replacement for Firecrawl that adds security scanning to protect your AI agents from indirect prompt injections in web content.
Why Secure Scraping Matters
AI agents that browse the web are vulnerable to "indirect prompt injection" attacks. Malicious websites can embed hidden instructions that hijack your agent's behavior, leading to:
- Data exfiltration - Stealing API keys and environment variables
- Prompt hijacking - Making your agent follow attacker instructions
- Fake documentation attacks - Tricking agents into executing malicious code
Bastio scans every scraped page for these threats before returning content to your agent.
Quick Start
1. Enable Secure Scraper on Your Proxy
In your proxy settings, enable the Secure Scraper feature and configure the block behavior.
2. Make API Requests
Use your Bastio API key (the same key you use for all Bastio endpoints):
curl -X POST "https://api.bastio.com/v1/guard/{proxyID}/scrape" \
-H "Authorization: Bearer bastio_sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.bastio.com",
"formats": ["markdown"]
}'
Note: The Authorization header uses your Bastio API key (bastio_sk_...), not a Firecrawl key. If using BYOK mode, configure your Firecrawl API key in your proxy's Secure Scraper settings.
3. Handle the Response
{
"success": true,
"data": {
"markdown": "AI Security Platform for everyone\n\n# The No. 1 Cloud Sec\n\nBastio sits between your users and the model to keep prompts safe, scrub sensitive data, and cut wasted token spend. Swap one endpoint and you get security, compliance, and cost control in a single move.\n\n5-layer defenseBuilt-in compliance...",
"metadata": {
"title": "Bastio | Simple LLM Security, Compliance, and Cost Control | Bastio",
"description": "Bastio keeps AI prompts safe, compliant, and affordable with a drop-in gateway. Block risky prompts, protect data, and cut LLM spend in minutes.",
"language": "en",
"keywords": "AI security platform,LLM security,prompt injection protection,AI threat detection",
"sourceURL": "https://www.bastio.com/",
"statusCode": 200,
"ogTitle": "Bastio – LLM Security Everyone Can Explain",
"ogDescription": "Swap one endpoint to guard AI prompts, show compliance evidence, and reduce LLM costs with Bastio's drop-in gateway.",
"ogImage": "https://www.bastio.com/og-image.png",
"robots": "index, follow"
}
},
"security": {
"analyzed": true,
"threat_score": 0,
"action": "ALLOW",
"threats_found": [],
"content_modified": false,
"processing_time_ms": 23
}
}
Example blocked request
curl -X POST "https://api.bastio.com/v1/guard/{proxyID}/scrape" \
-H "Authorization: Bearer bastio_sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://trap.bastio.com",
"formats": ["markdown"]
}'
The response for a blocked page:
{
"success": true,
"security": {
"analyzed": true,
"threat_score": 0.9349999999999999,
"action": "BLOCK",
"threats_found": [
{
"type": "env_exfiltration",
"severity": "high",
"confidence": 0.85,
"description": "JavaScript environment variable access detected",
"evidence": [
"process.env.SUPABASE_ANON_KEY",
"process.env.SUPABASE_SERVICE_KEY"
],
"location": "code_block:unknown"
},
{
"type": "env_exfiltration",
"severity": "high",
"confidence": 0.85,
"description": "JavaScript environment variable access detected",
"evidence": [
"process.env.SUPABASE_ANON_KEY"
],
"location": "code_block:inline"
},
{
"type": "env_exfiltration",
"severity": "high",
"confidence": 0.85,
"description": "JavaScript environment variable access detected",
"evidence": [
"process.env.SUPABASE_ANON_KEY",
"process.env.SUPABASE_SERVICE_KEY"
]
}
],
"content_modified": false,
"processing_time_ms": 4
},
"error": {
"code": "security_blocked",
"message": "Content blocked due to detected security threats",
"details": "3 threats detected with score 0.93"
}
}
Firecrawl Compatibility
The Secure Scraper API is a superset of Firecrawl's v2 API. If you're already using Firecrawl, you can switch to Bastio by changing your endpoint URL:
| Provider | Endpoint |
|---|---|
| Firecrawl | https://api.firecrawl.dev/v2/scrape |
| Bastio | https://api.bastio.com/v1/guard/{proxyID}/scrape |
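Because only the endpoint differs, the switch can be kept to a single configuration value. The following is a minimal sketch, not an official client: `scrape_endpoint` is a hypothetical helper name, and the two URLs are the ones from the table above.

```python
# Hypothetical helper: choose the scrape endpoint by provider name.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"
BASTIO_SCRAPE_URL_TEMPLATE = "https://api.bastio.com/v1/guard/{proxy_id}/scrape"


def scrape_endpoint(provider: str, proxy_id: str = "") -> str:
    """Return the scrape URL for 'firecrawl' or 'bastio'."""
    if provider == "firecrawl":
        return FIRECRAWL_SCRAPE_URL
    if provider == "bastio":
        if not proxy_id:
            raise ValueError("Bastio endpoints require a proxy ID")
        return BASTIO_SCRAPE_URL_TEMPLATE.format(proxy_id=proxy_id)
    raise ValueError(f"Unknown provider: {provider}")
```

When you switch providers, the Authorization header changes too: requests to the Bastio endpoint carry your Bastio API key (bastio_sk_...), not a Firecrawl key.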
All Firecrawl parameters are supported:
{
"url": "https://example.com",
"formats": ["markdown", "html", "links"],
"onlyMainContent": true,
"waitFor": 1000,
"mobile": false,
"timeout": 30000,
"blockAds": true
}
Block Behaviors
Configure how Bastio handles detected threats:
| Behavior | Description | Use Case |
|---|---|---|
| block | Return error, no content | Maximum security for autonomous agents |
| sanitize | Redact threats, return safe content | Balanced security with usability |
| warn | Return full content + threat warnings | Debugging and monitoring |
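On the client side, the three behaviors can be handled by one function that raises on a block and otherwise returns the content together with any reported threats. This is a sketch under assumptions: `ScrapeBlocked` and `content_from_result` are hypothetical names, and it assumes that in sanitize and warn modes the response still includes `data.markdown` alongside `security.threats_found`.

```python
from typing import Any


class ScrapeBlocked(Exception):
    """Raised when the response reports that the page was blocked."""


def content_from_result(result: dict[str, Any]) -> tuple[str, list[Any]]:
    """Return (markdown, threats) or raise ScrapeBlocked.

    Assumption: sanitize and warn responses still carry data.markdown,
    and threats_found lists whatever was detected.
    """
    security = result.get("security", {})
    threats = security.get("threats_found", [])
    if security.get("action") == "BLOCK":
        raise ScrapeBlocked(
            f"{len(threats)} threat(s), score {security.get('threat_score')}"
        )
    markdown = (result.get("data") or {}).get("markdown", "")
    return markdown, threats
```

Returning the threats next to the content lets a caller in warn mode log them without changing the code path used for clean pages.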
Threat Detection
Secure Scraper detects:
- Environment variable exfiltration - Code attempting to steal secrets
- Malicious code blocks - Instructions to execute harmful code
- Suspicious URLs - Links to attacker-controlled servers
- Fake documentation - Urgent "migration" or "upgrade" instructions
- Prompt injections - Hidden instructions to hijack agent behavior
- Jailbreak attempts - Content designed to bypass AI safety measures
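A response can report the same threat type several times, as in the blocked example earlier on this page. The sketch below groups `threats_found` entries by type for logging. `summarize_threats` is a hypothetical helper. The field names (`type`, `confidence`, `evidence`) come from that example response, and `env_exfiltration` is the only type identifier shown on this page.

```python
from collections import defaultdict
from typing import Any


def summarize_threats(threats: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Group threat entries by type, keeping the highest confidence and unique evidence."""
    summary: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"count": 0, "max_confidence": 0.0, "evidence": []}
    )
    for threat in threats:
        entry = summary[threat["type"]]
        entry["count"] += 1
        entry["max_confidence"] = max(
            entry["max_confidence"], threat.get("confidence", 0.0)
        )
        # Keep evidence strings in first-seen order without duplicates
        for item in threat.get("evidence", []):
            if item not in entry["evidence"]:
                entry["evidence"].append(item)
    return dict(summary)
```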
Pricing
Platform-Managed Mode
Scraping credits included in your plan:
| Tier | Included URLs | Overage |
|---|---|---|
| Free | 100/month | Hard limit |
| Starter | 10,000/month | $0.001/URL |
| Pro | 100,000/month | $0.001/URL |
| Enterprise | Unlimited | Included |
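To estimate overage for a given month, the table can be turned into a small lookup. This is an illustrative sketch only: `TIERS` and `overage_cost` are hypothetical names, base plan prices are not covered here, and treating "Hard limit" as "no further requests are served" is an assumption.

```python
from decimal import Decimal
from typing import Optional

# Values copied from the table above. included=None means unlimited;
# overage_per_url=None means a hard limit with no overage billing.
TIERS = {
    "Free": {"included": 100, "overage_per_url": None},
    "Starter": {"included": 10_000, "overage_per_url": Decimal("0.001")},
    "Pro": {"included": 100_000, "overage_per_url": Decimal("0.001")},
    "Enterprise": {"included": None, "overage_per_url": Decimal("0")},
}


def overage_cost(tier: str, urls_scraped: int) -> Optional[Decimal]:
    """Return the overage charge in USD, or None if usage exceeds a hard limit."""
    plan = TIERS[tier]
    included = plan["included"]
    if included is None or urls_scraped <= included:
        return Decimal("0")
    if plan["overage_per_url"] is None:
        return None  # assumption: past the hard limit, requests are not billed or served
    return (urls_scraped - included) * plan["overage_per_url"]
```

For example, 12,000 URLs on the Starter tier is 2,000 URLs over the included amount, which comes to $2.00 of overage at $0.001 per URL.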
BYOK Mode (Bring Your Own Key)
Use your own Firecrawl API key and pay only Bastio's security scanning fee:
- Security fee: $0.0005 per URL
- You control your Firecrawl costs directly
- Same security scanning as platform mode
How to configure BYOK:
1. Go to your proxy's settings in the Bastio dashboard
2. Navigate to the Secure Scraper tab
3. Select "Bring your own Firecrawl key"
4. Enter your Firecrawl API key
Your Firecrawl key is securely stored and used server-side. You still authenticate requests with your Bastio API key (bastio_sk_...).
URL Caching
When response caching is enabled for your account, Bastio automatically caches scraped URLs to avoid redundant Firecrawl API calls. This can significantly reduce your scraping costs.
How It Works
- First request for a URL - Firecrawl API call is made, result is cached
- Subsequent requests for the same URL - Served from cache, no Firecrawl call needed
- Cache expiry - Cached content expires after 24 hours by default
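The hit, miss, and expiry behavior described above can be modeled with a small time-to-live cache. This is a toy illustration of the described behavior, not Bastio's implementation: `UrlCache` and its injectable `clock` are hypothetical, and only the 24-hour default comes from the text above.

```python
import time
from typing import Callable

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour default expiry


class UrlCache:
    """Toy URL cache: a miss calls the fetcher, a hit within the TTL does not."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}
        self.fetch_calls = 0  # counts upstream (Firecrawl-like) calls

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no upstream call
        self.fetch_calls += 1  # cache miss or expired entry
        content = self._fetch(url)
        self._entries[url] = (now, content)
        return content
```

Passing the clock in as a parameter makes the expiry path easy to exercise in a test without waiting 24 hours.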
Cost Savings
Each cache hit saves one Firecrawl API call. Here's how caching can reduce your costs:
| Monthly Scrapes | Cache Hit Rate | Firecrawl Calls | Cost Saved |
|---|---|---|---|
| 10,000 | 30% | 7,000 | $3.00 |
| 50,000 | 40% | 30,000 | $20.00 |
| 100,000 | 50% | 50,000 | $50.00 |
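The figures in the table follow from simple arithmetic: cache hits are scrapes multiplied by the hit rate, and each hit avoids one call. The sketch below reproduces the rows, assuming a cost of $0.001 per avoided call, which is the rate the table implies. `cache_savings` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumption: each avoided Firecrawl call is worth $0.001 (rate implied by the table).
COST_PER_CALL = Decimal("0.001")


def cache_savings(monthly_scrapes: int, hit_rate_percent: int) -> tuple[int, Decimal]:
    """Return (Firecrawl calls still made, USD saved) for a given cache hit rate."""
    cache_hits = monthly_scrapes * hit_rate_percent // 100
    firecrawl_calls = monthly_scrapes - cache_hits
    return firecrawl_calls, cache_hits * COST_PER_CALL


for scrapes, rate in [(10_000, 30), (50_000, 40), (100_000, 50)]:
    calls, saved = cache_savings(scrapes, rate)
    print(f"{scrapes:>7,} scrapes at {rate}% hits: {calls:,} calls, ${saved:.2f} saved")
```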
Enable Caching
Caching is enabled by default. To configure caching settings:
1. Go to your Dashboard in Bastio
2. Navigate to Settings > Cache
3. Toggle caching on/off as needed
Cache Behavior
- Per-proxy isolation: Each proxy has its own independent cache
- URL-based keys: Same URL always returns the same cached result
- Automatic invalidation: Cache entries expire after 24 hours
- Security preserved: Cached content is still analyzed for threats on each request
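Per-proxy isolation with URL-based keys amounts to scoping each cache key by proxy. The sketch below shows one way such a key could be built. It is an assumption for illustration, not Bastio's actual key format: `cache_key` is a hypothetical helper, and whether URLs are normalized before keying is not specified here.

```python
import hashlib


def cache_key(proxy_id: str, url: str) -> str:
    """Build a cache key scoped to one proxy, so proxies never share entries."""
    # Hash the URL so keys have a fixed length regardless of URL size
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return f"{proxy_id}:{digest}"
```

With this scheme the same URL requested through two different proxies produces two distinct keys, while repeated requests through one proxy map to the same entry.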
When to Disable Caching
Consider disabling caching if:
- You need real-time content that changes frequently
- You're scraping dynamic pages with unique content per request
- Your use case requires fresh data on every call
Examples
Python with OpenAI Agents
import requests

# Use your Bastio API key (bastio_sk_...)
BASTIO_API_KEY = "bastio_sk_your_key_here"
PROXY_ID = "your_proxy_id"

def secure_scrape(url: str) -> str:
    """Scrape a URL through Bastio and return its markdown, or raise if blocked."""
    response = requests.post(
        f"https://api.bastio.com/v1/guard/{PROXY_ID}/scrape",
        headers={"Authorization": f"Bearer {BASTIO_API_KEY}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,  # avoid hanging indefinitely on slow pages
    )
    result = response.json()
    # The blocked example response has no "data" field, so check the action first
    if result["security"]["action"] == "BLOCK":
        raise RuntimeError(f"Threat detected: {result['security']['threats_found']}")
    return result["data"]["markdown"]
TypeScript with Vercel AI SDK
// Use your Bastio API key (bastio_sk_...)
const BASTIO_API_KEY = "bastio_sk_your_key_here";
const PROXY_ID = "your_proxy_id";

async function secureScrape(url: string): Promise<string> {
  const response = await fetch(
    `https://api.bastio.com/v1/guard/${PROXY_ID}/scrape`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${BASTIO_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url, formats: ['markdown'] }),
    }
  );
  const result = await response.json();
  if (result.security.action === 'BLOCK') {
    // threats_found holds objects, so report each threat's type
    const types = result.security.threats_found.map((t: { type: string }) => t.type);
    throw new Error(`Threat detected: ${types.join(', ')}`);
  }
  return result.data.markdown;
}
API Reference
Request
POST /v1/guard/{proxyID}/scrape
Authorization: Bearer bastio_sk_your_key_here
Content-Type: application/json
Authentication: Use your Bastio API key in the Authorization header. This is the same key used for all Bastio endpoints. If you're using BYOK mode, your Firecrawl key is configured in your proxy settings and used automatically.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape |
| formats | string[] | No | Output formats: "markdown", "html", "links" |
| onlyMainContent | boolean | No | Extract only main content (default: true) |
| waitFor | number | No | Milliseconds to wait for page load |
| mobile | boolean | No | Use mobile viewport |
| timeout | number | No | Request timeout in milliseconds |
| blockAds | boolean | No | Block advertisements |
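A typed request object can keep optional fields out of the payload unless you set them, so the server-side defaults apply. This is a minimal sketch: `ScrapeRequest` and `to_payload` are hypothetical names, the field names are the ones from the table above, and the integer types for `waitFor` and `timeout` are an assumption based on their millisecond units.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ScrapeRequest:
    """Request body fields from the table above; None means 'omit the field'."""

    url: str
    formats: Optional[list[str]] = None
    onlyMainContent: Optional[bool] = None
    waitFor: Optional[int] = None
    mobile: Optional[bool] = None
    timeout: Optional[int] = None
    blockAds: Optional[bool] = None

    def to_payload(self) -> dict[str, Any]:
        """Return a JSON-ready dict containing only the fields that were set."""
        if not self.url:
            raise ValueError("url is required")
        return {key: value for key, value in asdict(self).items() if value is not None}
```

The resulting dict can be passed directly as the JSON body of the POST request.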
Response
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the scrape succeeded |
| data.markdown | string | Extracted content in markdown |
| data.metadata | object | Page metadata (title, URL, etc.) |
| security.analyzed | boolean | Whether security scan was performed |
| security.threat_score | number | Threat score (0.0 - 1.0) |
| security.action | string | ALLOW, BLOCK, or SANITIZE |
| security.threats_found | object[] | Detected threats, each with type, severity, confidence, description, evidence, and (where available) location |
| security.content_modified | boolean | Whether content was sanitized |
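The response fields above can be flattened into a small typed structure before your agent acts on them. This is a sketch under assumptions: `ScrapeOutcome` and `parse_scrape_response` are hypothetical names, and the parser tolerates a missing `data` object because the blocked example response on this page has none.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ScrapeOutcome:
    success: bool
    action: str
    threat_score: float
    threat_types: list[str]
    content_modified: bool
    markdown: Optional[str]


def parse_scrape_response(body: dict[str, Any]) -> ScrapeOutcome:
    """Extract the documented fields from a decoded scrape response."""
    security = body.get("security", {})
    data = body.get("data") or {}
    threats = security.get("threats_found", [])
    return ScrapeOutcome(
        success=bool(body.get("success")),
        action=security.get("action", ""),
        threat_score=float(security.get("threat_score", 0.0)),
        # Accept either threat objects with a "type" key or plain strings
        threat_types=[t["type"] if isinstance(t, dict) else str(t) for t in threats],
        content_modified=bool(security.get("content_modified", False)),
        markdown=data.get("markdown"),
    )
```

Note that `success` alone does not indicate usable content: in the blocked example it is true while `action` is BLOCK, so check `action` (or whether `markdown` is None) before passing content to your agent.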