BiteBrief API v3.1

Summarize any URL, YouTube video, podcast, or PDF with a single API call. Powered by Gemini 2.5 Pro/Flash with intelligent model routing and automatic fallback.

Closed Testing Mode: All payment systems are disabled. The API is available for invited testers only. Paid access will open once all systems are validated.

Base URL

https://api.bitebrief.com

Version

v3.0.0

Key Features

Gemini-first: Gemini 2.5 Flash (speed) and Pro (quality) with OpenRouter/Claude fallback
Multimodal: Native audio/video processing without external transcription
Smart routing: Auto-selects best model by content type, length, and user tier
Caching: Identical requests return cached results instantly at zero LLM cost
Cost tracking: Every response includes model, tokens, cost, and latency
Premium features: Audio Overview, Q&A, Mind Map, Translation
Agent-ready: x402 protocol for autonomous AI agent payments (USDC on Base)
Cost Guard: Real-time margin enforcement, minimum 3x markup on all operations

Authentication

All API requests require a Bearer token in the Authorization header. x402 agent endpoints use payment-based auth instead.

Header

Authorization: Bearer YOUR_API_KEY

API keys are provisioned during testing. Contact hello@bitebrief.com for access.

Quickstart

curl — Summarize a YouTube video

curl -X POST https://api.bitebrief.com/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "format": "markdown",
    "length": "medium"
  }'
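The same request can be built in Python with only the standard library. This is a minimal sketch: `build_summarize_request` is a hypothetical helper name, not part of any official SDK, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.bitebrief.com"  # from the Base URL section


def build_summarize_request(api_key, url, fmt="markdown", length="medium"):
    """Build (but do not send) a POST /v1/summarize request.

    Hypothetical helper for illustration; field names follow the curl example.
    """
    body = json.dumps({"url": url, "format": fmt, "length": length}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/summarize",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=60)` and decode the JSON body. `urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a 429 has to be caught explicitly.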

POST /v1/summarize

Summarize any URL. The system auto-detects content type and routes to the optimal model.

Request Body

Field	Type	Required	Description
url	string	Yes	URL to summarize
format	string	No	"markdown", "text", "json", "bullet". Default: "markdown"
length	string	No	"short", "medium", "long", "bullet". Default: "medium"
model	string	No	Force a specific model (e.g., "gemini-2.5-pro")
nocache	bool	No	Skip cache, force fresh LLM call. Default: false

Response

200 OK

{
  "id": "sum_a1b2c3d4e5f6",
  "url": "https://example.com/article",
  "type": "webpage",
  "summary": "# Key Points\n\n- Point one...\n- Point two...",
  "format": "markdown",
  "word_count": 280,
  "source_word_count": 5200,
  "title": "Article Title",
  "processing_time_ms": 1840,
  "timestamp": "2026-02-25T18:00:00",
  "model": {
    "name": "gemini-2.5-flash",
    "provider": "gemini",
    "input_tokens": 8200,
    "output_tokens": 480,
    "cost_usd": 0.0010,
    "latency_ms": 1620,
    "routing_reason": "webpage content → gemini-2.5-flash"
  }
}
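Because every response carries its own cost data, a client can log usage without a separate billing call. The sketch below reads the fields shown in the sample response; `usage_report` and the derived `compression_percent` figure are illustrative names, not API fields.

```python
def usage_report(response):
    """Extract the cost-tracking fields from a /v1/summarize response dict."""
    model = response["model"]
    return {
        "model": model["name"],
        "total_tokens": model["input_tokens"] + model["output_tokens"],
        "cost_usd": model["cost_usd"],
        # Summary length as a percentage of the source length (derived, not an API field).
        "compression_percent": round(
            100 * response["word_count"] / response["source_word_count"], 1
        ),
    }
```

Applied to the sample response above, this gives 8680 total tokens and a summary that is about 5.4% of the source length.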

Supported Content Types

Type	Default Model	Premium Model	Fallback
webpage	Gemini Flash	Gemini Pro	OpenRouter/Claude
youtube (<30min)	Gemini Flash	Gemini Pro	OpenRouter/Claude
youtube (>30min)	Gemini Pro	Gemini Pro	OpenRouter/Claude
podcast / audio	Gemini Pro (native)	Gemini Pro	OpenRouter/Claude
video	Gemini Pro (native)	Gemini Pro	OpenRouter/Claude
pdf	Gemini Flash	Gemini Pro	OpenRouter/Claude
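The table can be read as a small decision function. The following is a sketch of that reading, not the server's actual routing code: `pick_model` is a hypothetical name, the model IDs are taken from the /v1/models example, the fallback column is not modeled, and because the table does not say which row applies at exactly 30 minutes, the sketch treats 30 minutes and up as long.

```python
FLASH = "gemini-2.5-flash"
PRO = "gemini-2.5-pro"


def pick_model(content_type, duration_min=None, premium=False):
    """Return the model the routing table suggests for a piece of content."""
    known = ("webpage", "youtube", "podcast", "audio", "video", "pdf")
    if content_type not in known:
        raise ValueError(f"unsupported content type: {content_type}")
    if premium:
        return PRO  # the Premium Model column is Gemini Pro for every type
    if content_type in ("podcast", "audio", "video"):
        return PRO  # native audio/video processing defaults to Pro
    if content_type == "youtube":
        # Assumption: exactly 30 minutes counts as long; unknown duration counts as short.
        if duration_min is not None and duration_min >= 30:
            return PRO
        return FLASH
    return FLASH  # webpage and pdf
```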

POST /v1/quote

Get a price quote before summarizing. Required when dynamic pricing is enabled. Quotes are single-use and valid for 24 hours.

Request

{ "url": "https://example.com/podcast.mp3" }

Response

{
  "quote_id": "qt_abc123def456",
  "url": "https://example.com/podcast.mp3",
  "credits_required": 60,
  "cost_usd": 1.49,
  "backend_cost": 0.48,
  "margin_percent": 67.8,
  "tier": "long",
  "expires_at": 1740600000
}
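Since quotes expire and are single-use, a client may want to check a stored quote before spending it. This sketch assumes `expires_at` is a Unix timestamp in seconds, which the sample value suggests but the reference does not state. `quote_is_usable` and the `used_quote_ids` set are hypothetical client-side bookkeeping.

```python
import time


def quote_is_usable(quote, used_quote_ids, now=None):
    """Return True if the quote is unexpired and has not been spent locally.

    Assumes quote["expires_at"] is seconds since the Unix epoch.
    """
    now = time.time() if now is None else now
    if quote["quote_id"] in used_quote_ids:
        return False  # single-use: already spent
    return now < quote["expires_at"]
```

The server remains the authority on validity. This check only avoids sending a request that is known in advance to fail.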

POST /v1/batch

Submit up to 100 URLs for batch processing. Each URL is summarized independently.

Request

{
  "urls": [
    "https://example.com/article1",
    "https://example.com/article2"
  ],
  "format": "markdown",
  "length": "short"
}
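A list longer than 100 URLs has to be split across several batch calls. A minimal sketch of that chunking, with `batch_payloads` as a hypothetical helper that yields request bodies in the shape shown above:

```python
def batch_payloads(urls, fmt="markdown", length="short", max_urls=100):
    """Yield /v1/batch request bodies, each holding at most max_urls URLs."""
    for start in range(0, len(urls), max_urls):
        yield {
            "urls": urls[start:start + max_urls],
            "format": fmt,
            "length": length,
        }
```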

Audio Overview (+5 credits)

Generate an executive-style briefing from audio/podcast content. Includes speaker insights, action items, and tone analysis.

POST /v1/features/audio-overview

{ "content": "transcript text...", "url": "https://..." }

Q&A Follow-up (+3 credits per question)

Ask follow-up questions about any previously summarized content.

POST /v1/features/ask

{
  "question": "What were the main arguments?",
  "summary": "Previously generated summary...",
  "content": "Original content (optional)..."
}

Mind Map (+5 credits)

Extract a hierarchical topic structure as structured JSON.

POST /v1/features/mindmap

{ "content": "article or transcript text..." }

Translation (+2 credits)

Translate any summary into 30+ languages. Use GET /v1/features/languages for the full list.

POST /v1/features/translate

{ "summary": "Summary text...", "language": "es" }
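The premium feature prices above add up per call, so the extra credits for a session can be estimated ahead of time. This sketch covers only the add-on credits listed in this section, not the cost of the summary itself. `feature_credits` is a hypothetical helper.

```python
# Add-on credit prices as listed for each premium feature.
AUDIO_OVERVIEW_CREDITS = 5
QUESTION_CREDITS = 3
MIND_MAP_CREDITS = 5
TRANSLATION_CREDITS = 2


def feature_credits(audio_overviews=0, questions=0, mind_maps=0, translations=0):
    """Return the total add-on credits for a mix of premium feature calls."""
    return (
        audio_overviews * AUDIO_OVERVIEW_CREDITS
        + questions * QUESTION_CREDITS
        + mind_maps * MIND_MAP_CREDITS
        + translations * TRANSLATION_CREDITS
    )
```

For example, one audio overview, four follow-up questions, and two translations come to 5 + 12 + 4 = 21 credits.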

x402 Agent-to-Agent Payments

AI agents can use BiteBrief without an API key by paying per request via the x402 protocol. Payments are made in USDC on the Base chain.

Flow

1. The agent calls POST /api/x402/quote with the URL.
2. It receives a quote with the price, wallet address, and payment instructions.
3. The agent sends the USDC payment to the wallet.
4. The agent calls POST /api/x402/summarize with the tx hash.
5. The summary is returned after payment verification.

Wallet: Payments go to the sovereign wallet on the Base chain. A minimum per-request price is enforced.

Agent Quoting

POST /api/x402/quote

{ "url": "https://example.com/article" }

Response

{
  "quote_id": "qt_...",
  "cost_usd": 0.15,
  "wallet": "0x43E03211a163A126999393Ab6a6A950FC7fc3dC6",
  "chain": "base",
  "currency": "USDC",
  "expires_at": 1740600000
}
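The quote, pay, summarize flow described in this section can be sketched with the HTTP call and the on-chain transfer injected as plain functions. Several things here are assumptions. The reference does not give the request body for POST /api/x402/summarize, so the `quote_id` and `tx_hash` field names are guesses. `post` and `send_usdc` are hypothetical callables supplied by the caller, and no real wallet library is used.

```python
def summarize_via_x402(url, post, send_usdc):
    """Run the x402 flow: get a quote, pay it, then request the summary.

    post(path, payload) -> dict   performs the HTTP call (caller-supplied).
    send_usdc(wallet, amount_usd, chain) -> str   returns the transaction hash.
    """
    quote = post("/api/x402/quote", {"url": url})

    # Refuse to pay if the terms differ from what this section documents.
    if quote["chain"] != "base" or quote["currency"] != "USDC":
        raise ValueError("unexpected payment terms in quote")

    tx_hash = send_usdc(quote["wallet"], quote["cost_usd"], quote["chain"])

    # Assumed field names; the reference only says the tx hash is sent.
    return post(
        "/api/x402/summarize",
        {"quote_id": quote["quote_id"], "tx_hash": tx_hash},
    )
```

Injecting `post` and `send_usdc` keeps the flow testable without a network connection or a funded wallet.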

Model Routing

BiteBrief automatically selects the optimal model based on content type, duration, and user tier. You can override with the model parameter.

GET /v1/models — List available models

{
  "models": [
    { "model": "gemini-2.5-flash", "provider": "gemini", "tier": "standard" },
    { "model": "gemini-2.5-pro", "provider": "gemini", "tier": "premium" },
    { "model": "anthropic/claude-sonnet-4", "provider": "openrouter", "tier": "standard" }
  ]
}

Caching

Responses are cached by sha256(url + model + options). Cache hits return instantly at zero LLM cost. TTL: 24 hours.

Force fresh: Set "nocache": true in the request body
Cache stats: GET /api/cache
Clear cache: POST /api/cache/clear (admin)
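To illustrate why identical requests hit the cache, here is one way a key of the form sha256(url + model + options) could be derived. The exact serialization BiteBrief uses is not documented. This sketch assumes the options are serialized as JSON with sorted keys so that field order does not change the key. `cache_key` is a hypothetical name.

```python
import hashlib
import json


def cache_key(url, model, options):
    """Derive a deterministic cache key from the URL, model, and options.

    Assumption: options are serialized as compact JSON with sorted keys.
    """
    canonical_options = json.dumps(options, sort_keys=True, separators=(",", ":"))
    material = url + model + canonical_options
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Changing any input, including the model chosen by routing, produces a different key. Under this reading, the same URL summarized at a different length or by a different model would not share a cache entry.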

Rate Limits

Hard rate limits are enforced per API key to prevent abuse and ensure fair usage.

Plan	Requests/min	Requests/hour	Daily max
Free (test)	5	30	50
Starter	10	120	300
Pro	30	500	1000
Team	60	2000	5000
x402 Agent	10	100	500

Exceeding limits returns 429 Too Many Requests with a Retry-After header.
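A client can use the Retry-After header to decide how long to wait before retrying. The HTTP specification allows the header to carry either a number of seconds or an HTTP date. The reference does not say which form BiteBrief sends, so this sketch handles both. `retry_delay_seconds` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(headers, default=1.0, now=None):
    """Return how many seconds to wait after a 429, based on Retry-After."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```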

Cost Guard

Every request is checked in real time by the Cost Guard system. If the price charged to the user is less than 3x the projected backend cost, the request is blocked.

Margin Enforcement

Minimum markup: 3x backend cost (67% minimum margin)
Target margin: 70-80% on standard operations
Blocked if: user_price < backend_cost * 3
Logged: All margin violations are logged and alerted
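The blocking rule and the relation between a 3x markup and a roughly 67% margin can be checked with a few lines. This is a sketch of the stated rule only. It assumes margin is computed as a share of the user price, which matches the figures in this document. `is_blocked` and `margin_percent` are hypothetical names.

```python
MIN_MARKUP = 3.0  # minimum markup over backend cost, from the rules above


def is_blocked(user_price, backend_cost):
    """Apply the stated rule: block when user_price < backend_cost * 3."""
    return user_price < backend_cost * MIN_MARKUP


def margin_percent(user_price, backend_cost):
    """Margin as a percentage of the user price (assumed definition)."""
    return round(100 * (user_price - backend_cost) / user_price, 1)
```

At exactly 3x markup the margin is 1 - 1/3, or about 66.7%, which the list rounds to 67%. The quote example earlier in this document (1.49 charged against 0.48 backend cost) passes the check with a 67.8% margin.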

Anti-Abuse Policy

BiteBrief is designed to be highly profitable, and abuse results in automatic suspension.

Spike detection: If usage exceeds 10x the plan average within any 1-hour window, the account is auto-paused and flagged for review.
Daily limits: Hard caps per plan (see Rate Limits). No exceptions.
No unlimited free: Free tier is capped. No credit card workarounds.
Agent monitoring: x402 agents are rate-limited and monitored for patterns that indicate relay abuse or scraping.
Overage auto-charge: Paid plans that exceed their credit allocation are auto-charged via Stripe at overage rates.
Suspension: Accounts violating fair use are suspended immediately. No refund for abused credits.
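The spike detection rule in the list above can be illustrated with a sliding one-hour window. The reference does not define how the "plan average" is measured, so this sketch takes it as a caller-supplied requests-per-hour figure. `SpikeDetector` is a hypothetical class, not BiteBrief's implementation.

```python
from collections import deque


class SpikeDetector:
    """Flag an account when requests in the last hour exceed 10x the plan average."""

    def __init__(self, plan_hourly_average, multiplier=10, window_seconds=3600):
        self.plan_hourly_average = plan_hourly_average  # assumed unit: requests per hour
        self.multiplier = multiplier
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of requests inside the window

    def record(self, timestamp):
        """Record one request; return True if the account should be paused."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop requests that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.multiplier * self.plan_hourly_average
```

With a plan average of 2 requests per hour, the 21st request inside one hour is the first to trip the detector.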

Metrics

Real-time LLM performance metrics and cache statistics.

GET /api/metrics

{
  "llm": {
    "total_llm_calls": 142,
    "total_llm_cost_usd": 0.2840,
    "models": {
      "gemini/gemini-2.5-flash": {
        "calls": 120, "success_rate_percent": 99.2,
        "avg_cost_usd": 0.0012, "avg_latency_ms": 1200
      }
    }
  },
  "cache": {
    "entries": 89, "hits": 34, "misses": 108,
    "hit_rate_percent": 23.9,
    "estimated_savings_usd": 0.17
  }
}
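The `hit_rate_percent` value in the sample is consistent with hits divided by total lookups. A short sketch of that arithmetic, assuming this is how the field is derived (the reference does not say):

```python
def hit_rate_percent(hits, misses):
    """Cache hit rate as a percentage of all lookups, rounded to one decimal."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet
    return round(100 * hits / total, 1)
```

With the sample figures, 34 hits out of 34 + 108 = 142 lookups gives 23.9%.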

Health & Status

GET /api/health: Basic health check

GET /api/status: Full status with LLM provider availability, cache stats, and feature flags

GET /api/v1/features: List premium features and available models