BiteBrief API v3.1
Summarize any URL, YouTube video, podcast, or PDF with a single API call. Powered by Gemini 2.5 Pro/Flash with intelligent model routing and automatic fallback.
Closed Testing Mode: All payment systems are disabled. The API is available for invited testers only. Paid access will open once all systems are validated.
Base URL: https://api.bitebrief.com
v3.0.0
Key Features
- Gemini-first: Gemini 2.5 Flash (speed) and Pro (quality) with OpenRouter/Claude fallback
- Multimodal: Native audio/video processing without external transcription
- Smart routing: Auto-selects best model by content type, length, and user tier
- Caching: Identical requests return cached results instantly at zero LLM cost
- Cost tracking: Every response includes model, tokens, cost, and latency
- Premium features: Audio Overview, Q&A, Mind Map, Translation
- Agent-ready: x402 protocol for autonomous AI agent payments (USDC on Base)
- Cost Guard: Real-time margin enforcement, minimum 3x markup on all operations
Authentication
All API requests require a Bearer token in the Authorization header. x402 agent endpoints use payment-based auth instead.
Authorization: Bearer YOUR_API_KEY
API keys are provisioned during testing. Contact hello@bitebrief.com for access.
Quickstart
curl -X POST https://api.bitebrief.com/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "format": "markdown",
    "length": "medium"
  }'
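The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name `build_summarize_request` and the 60-second timeout mentioned in the comment are assumptions. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

API_BASE = "https://api.bitebrief.com"


def build_summarize_request(api_key, url, fmt="markdown", length="medium"):
    """Build (but do not send) a POST /v1/summarize request."""
    body = json.dumps({"url": url, "format": fmt, "length": length}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/summarize",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_summarize_request(
    "YOUR_API_KEY", "https://youtube.com/watch?v=dQw4w9WgXcQ"
)
# To actually send it (timeout value is an assumption):
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       result = json.load(resp)
```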
POST /v1/summarize
Summarize any URL. The system auto-detects content type and routes to the optimal model.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to summarize |
| format | string | No | "markdown", "text", "json", "bullet". Default: "markdown" |
| length | string | No | "short", "medium", "long", "bullet". Default: "medium" |
| model | string | No | Force a specific model (e.g., "gemini-2.5-pro") |
| nocache | bool | No | Skip cache, force fresh LLM call. Default: false |
Response
{
  "id": "sum_a1b2c3d4e5f6",
  "url": "https://example.com/article",
  "type": "webpage",
  "summary": "# Key Points\n\n- Point one...\n- Point two...",
  "format": "markdown",
  "word_count": 280,
  "source_word_count": 5200,
  "title": "Article Title",
  "processing_time_ms": 1840,
  "timestamp": "2026-02-25T18:00:00",
  "model": {
    "name": "gemini-2.5-flash",
    "provider": "gemini",
    "input_tokens": 8200,
    "output_tokens": 480,
    "cost_usd": 0.0010,
    "latency_ms": 1620,
    "routing_reason": "webpage content → gemini-2.5-flash"
  }
}
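The cost-tracking fields in the response lend themselves to simple client-side bookkeeping. The sketch below derives a few numbers from the sample response. The helper name `summary_stats` is hypothetical, and reading `processing_time_ms - latency_ms` as non-LLM overhead is an assumption about how the two timings relate.

```python
def summary_stats(resp):
    """Derive convenience numbers from a /v1/summarize response dict."""
    model = resp["model"]
    return {
        # Fraction of the source length that the summary occupies.
        "compression_ratio": resp["word_count"] / resp["source_word_count"],
        # Assumed meaning: time spent outside the LLM call itself.
        "non_llm_overhead_ms": resp["processing_time_ms"] - model["latency_ms"],
        "total_tokens": model["input_tokens"] + model["output_tokens"],
    }


# Values copied from the sample response above.
sample = {
    "word_count": 280,
    "source_word_count": 5200,
    "processing_time_ms": 1840,
    "model": {"input_tokens": 8200, "output_tokens": 480, "latency_ms": 1620},
}
stats = summary_stats(sample)
```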
Supported Content Types
| Type | Default Model | Premium Model | Fallback |
|---|---|---|---|
| webpage | Gemini Flash | Gemini Pro | OpenRouter/Claude |
| youtube (<30min) | Gemini Flash | Gemini Pro | OpenRouter/Claude |
| youtube (>30min) | Gemini Pro | Gemini Pro | OpenRouter/Claude |
| podcast / audio | Gemini Pro (native) | Gemini Pro | OpenRouter/Claude |
| video | Gemini Pro (native) | Gemini Pro | OpenRouter/Claude |
| pdf | Gemini Flash | Gemini Pro | OpenRouter/Claude |
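The table can be read as a small lookup function. The sketch below only illustrates the mapping shown here, it is not BiteBrief's actual router. The function name `default_model` is hypothetical. Treating a video of exactly 30 minutes or of unknown duration as "short", and mapping PDFs to the Flash/Pro row, are assumptions the table does not settle.

```python
def default_model(content_type, duration_min=None, premium=False):
    """Return the model family the table suggests for a content type."""
    known = {"webpage", "youtube", "podcast", "audio", "video", "pdf"}
    if content_type not in known:
        raise ValueError(f"unsupported content type: {content_type!r}")
    if premium:
        # Every row lists Gemini Pro as the premium model.
        return "Gemini Pro"
    if content_type in ("podcast", "audio", "video"):
        # Native audio/video processing defaults to Pro.
        return "Gemini Pro"
    if content_type == "youtube" and duration_min is not None and duration_min > 30:
        return "Gemini Pro"
    # webpage, pdf, and short (or unknown-length) YouTube videos.
    return "Gemini Flash"
```

For example, `default_model("youtube", duration_min=45)` gives `"Gemini Pro"`, while `default_model("webpage")` gives `"Gemini Flash"`.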
POST /v1/quote
Get a price quote before summarizing. Required when dynamic pricing is enabled. Each quote is valid for 24 hours and can be used only once.
{ "url": "https://example.com/podcast.mp3" }
{
  "quote_id": "qt_abc123def456",
  "url": "https://example.com/podcast.mp3",
  "credits_required": 60,
  "cost_usd": 1.49,
  "backend_cost": 0.48,
  "margin_percent": 67.8,
  "tier": "long",
  "expires_at": 1740600000
}
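The quote fields relate to each other arithmetically: `margin_percent` in the sample matches `(cost_usd - backend_cost) / cost_usd * 100`. The sketch below reproduces that figure and checks expiry. The function names are hypothetical, and treating `expires_at` as Unix epoch seconds is an assumption based on the sample value.

```python
import time


def quote_margin_percent(cost_usd, backend_cost):
    """Margin as a percentage of the quoted price, rounded to one decimal."""
    return round((cost_usd - backend_cost) / cost_usd * 100, 1)


def quote_is_usable(quote, now=None):
    """True while the quote has not expired (expires_at assumed epoch seconds)."""
    now = time.time() if now is None else now
    return now < quote["expires_at"]


quote = {"cost_usd": 1.49, "backend_cost": 0.48, "expires_at": 1740600000}
margin = quote_margin_percent(quote["cost_usd"], quote["backend_cost"])
```

Because quotes are single-use, a client would also need to discard a quote once it has been redeemed. That bookkeeping is left out of this sketch.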
POST /v1/batch
Submit up to 100 URLs for batch processing. Each URL is summarized independently.
{
  "urls": [
    "https://example.com/article1",
    "https://example.com/article2"
  ],
  "format": "markdown",
  "length": "short"
}
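Because a batch accepts at most 100 URLs, a longer list has to be split across several requests. A minimal sketch of that chunking follows; the helper name `build_batch_payloads` is hypothetical, and the payload shape mirrors the request body above.

```python
MAX_URLS_PER_BATCH = 100  # limit stated for POST /v1/batch


def build_batch_payloads(urls, fmt="markdown", length="short"):
    """Split urls into request bodies of at most MAX_URLS_PER_BATCH each."""
    return [
        {
            "urls": urls[i:i + MAX_URLS_PER_BATCH],
            "format": fmt,
            "length": length,
        }
        for i in range(0, len(urls), MAX_URLS_PER_BATCH)
    ]


urls = [f"https://example.com/article{i}" for i in range(250)]
payloads = build_batch_payloads(urls)
```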
Audio Overview +5 credits
Generate an executive-style briefing from audio/podcast content. Includes speaker insights, action items, and tone analysis.
{ "content": "transcript text...", "url": "https://..." }
Q&A Follow-up +3 credits/question
Ask follow-up questions about any previously summarized content.
{
  "question": "What were the main arguments?",
  "summary": "Previously generated summary...",
  "content": "Original content (optional)..."
}
Mind Map +5 credits
Extract a hierarchical topic structure as structured JSON.
{ "content": "article or transcript text..." }
Translation +2 credits
Translate any summary to 30+ languages. Use GET /v1/features/languages for the full list.
{ "summary": "Summary text...", "language": "es" }
x402 Agent-to-Agent Payments
AI agents can use BiteBrief without an API key by paying per request via the x402 protocol. Payments are made in USDC on the Base chain.
Flow
- Agent calls POST /api/x402/quote with the URL
- Gets a quote with price, wallet address, and payment instructions
- Agent sends USDC payment to the wallet
- Agent calls POST /api/x402/summarize with the tx hash
- Summary is returned after payment verification
Wallet: Payments go to a sovereign wallet on the Base chain. Minimum per-request pricing is enforced.
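The quote, pay, summarize sequence can be sketched with the HTTP and payment steps passed in as callables, so the control flow is visible without depending on a particular wallet library. Everything beyond the two endpoint paths is an assumption: the function name, the `post` and `send_usdc` callables, and in particular the `quote_id` and `tx_hash` field names in the summarize body, which this document does not specify.

```python
def x402_summarize(url, post, send_usdc):
    """Run the quote -> pay -> summarize flow.

    post(path, body) -> dict      hypothetical HTTP helper
    send_usdc(wallet, amount_usd, chain) -> str   hypothetical payment helper
                                                  returning the tx hash
    """
    quote = post("/api/x402/quote", {"url": url})

    # Refuse to pay if the quote is not what this flow expects.
    if quote["currency"] != "USDC" or quote["chain"] != "base":
        raise ValueError("unexpected payment terms in quote")

    tx_hash = send_usdc(quote["wallet"], quote["cost_usd"], quote["chain"])

    # Field names below are assumptions, not documented parameters.
    return post(
        "/api/x402/summarize",
        {"url": url, "quote_id": quote["quote_id"], "tx_hash": tx_hash},
    )
```

Injecting `post` and `send_usdc` also makes the flow easy to exercise against fakes before any real funds move.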
Agent Quoting
{ "url": "https://example.com/article" }
{
  "quote_id": "qt_...",
  "cost_usd": 0.15,
  "wallet": "0x43E03211a163A126999393Ab6a6A950FC7fc3dC6",
  "chain": "base",
  "currency": "USDC",
  "expires_at": 1740600000
}
Model Routing
BiteBrief automatically selects the optimal model based on content type, duration, and user tier. You can override the selection with the model parameter.
{
  "models": [
    { "model": "gemini-2.5-flash", "provider": "gemini", "tier": "standard" },
    { "model": "gemini-2.5-pro", "provider": "gemini", "tier": "premium" },
    { "model": "anthropic/claude-sonnet-4", "provider": "openrouter", "tier": "standard" }
  ]
}
Caching
Responses are cached by sha256(url + model + options). Cache hits return instantly at zero LLM cost. TTL: 24 hours.
- Force fresh: Set "nocache": true in the request body
- Cache stats: GET /api/cache
- Clear cache: POST /api/cache/clear (admin)
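A cache key of the form sha256(url + model + options) might be computed as sketched below. How the service actually serializes `options` is not documented, so the canonical JSON form used here (sorted keys, compact separators) is an assumption, as is the function name `cache_key`.

```python
import hashlib
import json


def cache_key(url, model, options):
    """Hex SHA-256 over url + model + a canonical JSON form of options."""
    # Sorting keys makes the key independent of dict insertion order.
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((url + model + canonical).encode("utf-8")).hexdigest()


key_a = cache_key(
    "https://example.com/article",
    "gemini-2.5-flash",
    {"format": "markdown", "length": "medium"},
)
key_b = cache_key(
    "https://example.com/article",
    "gemini-2.5-flash",
    {"length": "medium", "format": "markdown"},
)
```

With this scheme, the same options in a different order hit the same cache entry, while changing the model or any option yields a different key.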
Rate Limits
Hard rate limits are enforced per API key to prevent abuse and ensure fair usage.
| Plan | Requests/min | Requests/hour | Daily max |
|---|---|---|---|
| Free (test) | 5 | 30 | 50 |
| Starter | 10 | 120 | 300 |
| Pro | 30 | 500 | 1000 |
| Team | 60 | 2000 | 5000 |
| x402 Agent | 10 | 100 | 500 |
Exceeding limits returns 429 Too Many Requests with a Retry-After header.
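A client can honor the Retry-After header with a small retry loop. The sketch below is transport-agnostic: `send` is a hypothetical callable returning `(status, headers, body)`. It assumes Retry-After carries a number of seconds and falls back to a one-second default otherwise; the attempt limit of 3 is likewise an assumption.

```python
import time


def retry_delay(headers, default=1.0):
    """Seconds to wait, read from Retry-After (assumed to be numeric)."""
    try:
        return max(0.0, float(headers.get("Retry-After", default)))
    except (TypeError, ValueError):
        # Non-numeric value (e.g. an HTTP date): use the default instead.
        return default


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(retry_delay(headers))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.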
Cost Guard
Every request is checked in real time against the Cost Guard system. If the projected price falls below the minimum threshold (3x backend cost), the request is blocked.
Margin Enforcement
- Minimum markup: 3x backend cost (67% minimum margin)
- Target margin: 70-80% on standard operations
- Blocked if: user_price < backend_cost * 3
- Logged: All margin violations are logged and alerted
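The blocking rule and the relationship between markup and margin can be checked with a few lines of arithmetic. A price of exactly 3x backend cost gives a margin of (3 - 1) / 3, roughly 66.7%, which is where the approximately 67% figure comes from. The function names below are hypothetical; the rule itself is the one stated above.

```python
MIN_MARKUP = 3.0  # minimum price as a multiple of backend cost


def is_blocked(user_price, backend_cost):
    """Apply the stated rule: blocked if user_price < backend_cost * 3."""
    return user_price < backend_cost * MIN_MARKUP


def margin_percent(user_price, backend_cost):
    """Margin as a percentage of the price charged to the user."""
    return (user_price - backend_cost) / user_price * 100


# Using the quote example: $1.49 price against $0.48 backend cost.
allowed = not is_blocked(1.49, 0.48)       # threshold is 0.48 * 3 = 1.44
margin_at_floor = margin_percent(3.0, 1.0)  # margin at exactly 3x markup
```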
Anti-Abuse Policy
The service is designed to remain profitable on every request. Abuse results in automatic suspension.
- Spike detection: If usage exceeds 10x the plan average within any 1-hour window, the account is auto-paused and flagged for review.
- Daily limits: Hard caps per plan (see Rate Limits). No exceptions.
- No unlimited free: Free tier is capped. No credit card workarounds.
- Agent monitoring: x402 agents are rate-limited and monitored for patterns that indicate relay abuse or scraping.
- Overage auto-charge: Paid plans that exceed their credit allocation are auto-charged via Stripe at overage rates.
- Suspension: Accounts violating fair use are suspended immediately. No refund for abused credits.
Metrics
Real-time LLM performance metrics and cache statistics.
{
  "llm": {
    "total_llm_calls": 142,
    "total_llm_cost_usd": 0.2840,
    "models": {
      "gemini/gemini-2.5-flash": {
        "calls": 120, "success_rate_percent": 99.2,
        "avg_cost_usd": 0.0012, "avg_latency_ms": 1200
      }
    }
  },
  "cache": {
    "entries": 89, "hits": 34, "misses": 108,
    "hit_rate_percent": 23.9,
    "estimated_savings_usd": 0.17
  }
}
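The `hit_rate_percent` value in the sample is consistent with hits / (hits + misses). The sketch below reproduces it; the function name is hypothetical, and the formula is inferred from the sample numbers rather than stated by the API.

```python
def cache_hit_rate_percent(hits, misses):
    """Hit rate as a percentage of all cache lookups, one decimal place."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet; avoid dividing by zero
    return round(hits / total * 100, 1)


# Values from the sample metrics response above.
rate = cache_hit_rate_percent(hits=34, misses=108)
```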
Health & Status
GET /api/health
Basic health check
GET /api/status
Full status with LLM provider availability, cache stats, feature flags
GET /api/v1/features
List premium features and available models