BiteBrief | Documentation

BiteBrief API v3.1

Summarize any URL, YouTube video, podcast, or PDF with a single API call. Powered by Gemini 2.5 Pro/Flash with intelligent model routing and automatic fallback.

Closed Testing Mode: All payment systems are disabled. The API is available for invited testers only. Paid access will open once all systems are validated.

Base URL: https://api.bitebrief.com
Version: v3.0.0

Key Features

  • Gemini-first: Gemini 2.5 Flash (speed) and Pro (quality) with OpenRouter/Claude fallback
  • Multimodal: Native audio/video processing without external transcription
  • Smart routing: Auto-selects best model by content type, length, and user tier
  • Caching: Identical requests return cached results instantly at zero LLM cost
  • Cost tracking: Every response includes model, tokens, cost, and latency
  • Premium features: Audio Overview, Q&A, Mind Map, Translation
  • Agent-ready: x402 protocol for autonomous AI agent payments (USDC on Base)
  • Cost Guard: Real-time margin enforcement, minimum 3x markup on all operations

Authentication

All API requests require a Bearer token in the Authorization header. x402 agent endpoints use payment-based auth instead.

Header
Authorization: Bearer YOUR_API_KEY

API keys are provisioned during testing. Contact hello@bitebrief.com for access.

Quickstart

curl — Summarize a YouTube video
curl -X POST https://api.bitebrief.com/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
    "format": "markdown",
    "length": "medium"
  }'
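
The same call can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_summarize_request` and `summarize` are illustrative and not part of any official BiteBrief SDK, and the 60-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

BASE_URL = "https://api.bitebrief.com"  # from the Base URL entry above


def build_summarize_request(api_key: str, url: str,
                            fmt: str = "markdown",
                            length: str = "medium") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/summarize request."""
    body = json.dumps({"url": url, "format": fmt, "length": length}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/summarize",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def summarize(api_key: str, url: str, **options) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_summarize_request(api_key, url, **options)
    # The timeout value is an assumption; long media may need more time.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `summarize("YOUR_API_KEY", "https://youtube.com/watch?v=dQw4w9WgXcQ")` would send the same request as the curl command above.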

POST /v1/summarize

Summarize any URL. The system auto-detects content type and routes to the optimal model.

Request Body

Field     Type     Required  Description
url       string   Yes       URL to summarize
format    string   No        "markdown", "text", "json", "bullet". Default: "markdown"
length    string   No        "short", "medium", "long", "bullet". Default: "medium"
model     string   No        Force a specific model (e.g., "gemini-2.5-pro")
nocache   bool     No        Skip cache, force fresh LLM call. Default: false

Response

200 OK
{
  "id": "sum_a1b2c3d4e5f6",
  "url": "https://example.com/article",
  "type": "webpage",
  "summary": "# Key Points\n\n- Point one...\n- Point two...",
  "format": "markdown",
  "word_count": 280,
  "source_word_count": 5200,
  "title": "Article Title",
  "processing_time_ms": 1840,
  "timestamp": "2026-02-25T18:00:00",
  "model": {
    "name": "gemini-2.5-flash",
    "provider": "gemini",
    "input_tokens": 8200,
    "output_tokens": 480,
    "cost_usd": 0.0010,
    "latency_ms": 1620,
    "routing_reason": "webpage content → gemini-2.5-flash"
  }
}
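
The cost-tracking fields in the response lend themselves to simple client-side accounting. The sketch below derives a few figures from the sample response; the function name `summary_stats` is hypothetical, and treating `processing_time_ms - latency_ms` as non-LLM overhead assumes that the total processing time includes the model latency, which the documentation does not state.

```python
def summary_stats(response: dict) -> dict:
    """Derive a few convenience figures from a /v1/summarize response."""
    model = response["model"]
    total_tokens = model["input_tokens"] + model["output_tokens"]
    return {
        # How many source words each summary word stands for.
        "compression_ratio": response["source_word_count"] / response["word_count"],
        "total_tokens": total_tokens,
        "cost_per_1k_tokens_usd": model["cost_usd"] / total_tokens * 1000,
        # Assumption: processing_time_ms includes the LLM latency.
        "overhead_ms": response["processing_time_ms"] - model["latency_ms"],
    }


# Values copied from the sample response above.
sample = {
    "word_count": 280,
    "source_word_count": 5200,
    "processing_time_ms": 1840,
    "model": {"input_tokens": 8200, "output_tokens": 480,
              "cost_usd": 0.0010, "latency_ms": 1620},
}
stats = summary_stats(sample)
```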

Supported Content Types

Type               Default Model        Premium Model  Fallback
webpage            Gemini Flash         Gemini Pro     OpenRouter/Claude
youtube (<30min)   Gemini Flash         Gemini Pro     OpenRouter/Claude
youtube (>30min)   Gemini Pro           Gemini Pro     OpenRouter/Claude
podcast / audio    Gemini Pro (native)  Gemini Pro     OpenRouter/Claude
video              Gemini Pro (native)  Gemini Pro     OpenRouter/Claude
pdf                Gemini Flash         Gemini Pro     OpenRouter/Claude
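
The routing table above can be read as a small decision function. This is a sketch of the table only, not BiteBrief's actual router: the function name `pick_model` is hypothetical, the model identifiers are taken from the examples elsewhere in this document, fallback handling is omitted, and a video of exactly 30 minutes (which the table leaves unspecified) is treated as long.

```python
from typing import Optional

FLASH = "gemini-2.5-flash"
PRO = "gemini-2.5-pro"


def pick_model(content_type: str,
               duration_min: Optional[float] = None,
               premium: bool = False) -> str:
    """Return the model the routing table would select (fallback not modeled)."""
    if premium:
        return PRO  # the Premium Model column is Gemini Pro for every type
    if content_type in ("podcast", "audio", "video"):
        return PRO  # native audio/video processing
    if content_type == "youtube":
        # Assumption: exactly 30 minutes counts as long; unknown duration as short.
        if duration_min is not None and duration_min >= 30:
            return PRO
        return FLASH
    if content_type in ("webpage", "pdf"):
        return FLASH
    raise ValueError(f"unsupported content type: {content_type!r}")
```

For example, `pick_model("youtube", duration_min=45)` yields the Pro model, while `pick_model("webpage")` yields Flash.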

POST /v1/quote

Get a price quote before summarizing. Required when dynamic pricing is enabled. Quotes are valid for 24 hours and single-use.

Request
{ "url": "https://example.com/podcast.mp3" }
Response
{
  "quote_id": "qt_abc123def456",
  "url": "https://example.com/podcast.mp3",
  "credits_required": 60,
  "cost_usd": 1.49,
  "backend_cost": 0.48,
  "margin_percent": 67.8,
  "tier": "long",
  "expires_at": 1740600000
}
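
A client may want to check a quote before spending it. The sketch below assumes that `expires_at` is a Unix timestamp in seconds and that `margin_percent` is computed as (cost_usd - backend_cost) / cost_usd; neither is stated explicitly, though the second reproduces the 67.8 in the sample. The function names are illustrative.

```python
import time
from typing import Optional


def quote_margin_percent(quote: dict) -> float:
    """Recompute the margin from the quoted price and backend cost."""
    margin = (quote["cost_usd"] - quote["backend_cost"]) / quote["cost_usd"]
    return round(margin * 100, 1)


def quote_is_usable(quote: dict, now: Optional[float] = None) -> bool:
    """True while the quote has not expired (assumes epoch seconds)."""
    current = time.time() if now is None else now
    return current < quote["expires_at"]


# Values copied from the sample response above.
sample_quote = {
    "quote_id": "qt_abc123def456",
    "cost_usd": 1.49,
    "backend_cost": 0.48,
    "expires_at": 1740600000,
}
```

Because quotes are single-use, a client should also discard the `quote_id` after one successful summarize call rather than relying on the expiry check alone.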

POST /v1/batch

Submit up to 100 URLs for batch processing. Each URL is summarized independently.

Request
{
  "urls": [
    "https://example.com/article1",
    "https://example.com/article2"
  ],
  "format": "markdown",
  "length": "short"
}
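
Since a batch accepts at most 100 URLs, larger lists need to be split on the client side. A minimal sketch follows; `batch_payloads` is a hypothetical helper name, and the defaults mirror the documented defaults for format and length.

```python
from typing import Dict, List

MAX_BATCH_SIZE = 100  # documented limit for POST /v1/batch


def batch_payloads(urls: List[str],
                   fmt: str = "markdown",
                   length: str = "medium",
                   size: int = MAX_BATCH_SIZE) -> List[Dict]:
    """Split a URL list into request bodies of at most `size` URLs each."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError("size must be between 1 and 100")
    return [
        {"urls": urls[start:start + size], "format": fmt, "length": length}
        for start in range(0, len(urls), size)
    ]
```

Each returned dictionary can be serialized as the body of one POST /v1/batch request.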

Audio Overview (+5 credits)

Generate an executive-style briefing from audio/podcast content. Includes speaker insights, action items, and tone analysis.

POST /v1/features/audio-overview
{ "content": "transcript text...", "url": "https://..." }

Q&A Follow-up (+3 credits per question)

Ask follow-up questions about any previously summarized content.

POST /v1/features/ask
{
  "question": "What were the main arguments?",
  "summary": "Previously generated summary...",
  "content": "Original content (optional)..."
}

Mind Map (+5 credits)

Extract a hierarchical topic structure as structured JSON.

POST /v1/features/mindmap
{ "content": "article or transcript text..." }

Translation (+2 credits)

Translate any summary into 30+ languages. Use GET /v1/features/languages for the full list.

POST /v1/features/translate
{ "summary": "Summary text...", "language": "es" }
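
The surcharges for the four premium features (Audio Overview, Q&A, Mind Map, Translation) can be totaled before making the calls. The sketch below is only arithmetic over the listed prices; the function name is hypothetical, and it assumes the translation surcharge applies once per target language, which the documentation does not spell out.

```python
# Surcharges in credits, as listed in the premium feature headings.
AUDIO_OVERVIEW_CREDITS = 5
QUESTION_CREDITS = 3      # per question
MIND_MAP_CREDITS = 5
TRANSLATION_CREDITS = 2   # assumption: charged per target language


def premium_credits(audio_overview: bool = False,
                    questions: int = 0,
                    mind_map: bool = False,
                    translations: int = 0) -> int:
    """Total premium surcharge in credits, excluding the base summary cost."""
    if questions < 0 or translations < 0:
        raise ValueError("counts must be non-negative")
    total = questions * QUESTION_CREDITS + translations * TRANSLATION_CREDITS
    if audio_overview:
        total += AUDIO_OVERVIEW_CREDITS
    if mind_map:
        total += MIND_MAP_CREDITS
    return total
```

For instance, an audio overview plus two follow-up questions and one translation comes to 5 + 6 + 2 = 13 credits on top of the summary itself.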

x402 Agent-to-Agent Payments

AI agents can use BiteBrief without an API key by paying per request via the x402 protocol. Payments are made in USDC on the Base chain.

Flow

  1. Agent calls POST /api/x402/quote with the URL
  2. Agent receives a quote with price, wallet address, and payment instructions
  3. Agent sends USDC payment to the wallet
  4. Agent calls POST /api/x402/summarize with the tx hash
  5. Summary is returned after payment verification

Wallet: Payments go to a sovereign wallet on the Base chain. Minimum per-request pricing is enforced.

Agent Quoting

POST /api/x402/quote
{ "url": "https://example.com/article" }
Response
{
  "quote_id": "qt_...",
  "cost_usd": 0.15,
  "wallet": "0x43E03211a163A126999393Ab6a6A950FC7fc3dC6",
  "chain": "base",
  "currency": "USDC",
  "expires_at": 1740600000
}
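
The five-step flow can be sketched as a single function with the network and wallet operations injected as callables, which keeps the example independent of any particular HTTP or blockchain library. Several details here are assumptions: the request field names `quote_id` and `tx_hash` for the summarize step are not documented, the USDC amount is taken to equal `cost_usd`, and `post` and `send_usdc` are hypothetical callables supplied by the agent.

```python
from typing import Callable, Dict

PostFn = Callable[[str, Dict], Dict]        # (path, JSON body) -> JSON response
SendUsdcFn = Callable[[str, float], str]    # (wallet, amount) -> transaction hash


def run_x402_flow(url: str, post: PostFn, send_usdc: SendUsdcFn) -> Dict:
    """Quote, pay, then summarize, following the documented x402 flow."""
    # Steps 1-2: request a quote and read the payment instructions.
    quote = post("/api/x402/quote", {"url": url})
    if quote["chain"] != "base" or quote["currency"] != "USDC":
        raise ValueError("unexpected payment instructions in quote")

    # Step 3: pay the quoted amount (assumes 1 USDC per quoted USD).
    tx_hash = send_usdc(quote["wallet"], quote["cost_usd"])

    # Steps 4-5: submit the transaction hash; field names are assumptions.
    return post("/api/x402/summarize",
                {"quote_id": quote["quote_id"], "tx_hash": tx_hash})
```

An agent would also want to check `expires_at` before paying, since a payment against an expired quote may not be honored.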

Model Routing

BiteBrief automatically selects the optimal model based on content type, duration, and user tier. You can override with the model parameter.

GET /v1/models — List available models
{
  "models": [
    { "model": "gemini-2.5-flash", "provider": "gemini", "tier": "standard" },
    { "model": "gemini-2.5-pro", "provider": "gemini", "tier": "premium" },
    { "model": "anthropic/claude-sonnet-4", "provider": "openrouter", "tier": "standard" }
  ]
}

Caching

Responses are cached by sha256(url + model + options). Cache hits return instantly at zero LLM cost. TTL: 24 hours.

  • Force fresh: Set "nocache": true in the request body
  • Cache stats: GET /api/cache
  • Clear cache: POST /api/cache/clear (admin)
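
To illustrate how a key of the form sha256(url + model + options) behaves, here is a minimal sketch. The exact serialization BiteBrief uses is not documented; this version assumes the options are serialized as JSON with sorted keys so that key order does not change the hash, and the function name `cache_key` is illustrative.

```python
import hashlib
import json


def cache_key(url: str, model: str, options: dict) -> str:
    """Hash url + model + options into a hex digest (serialization assumed)."""
    # Sorted keys and compact separators give a stable string for equal options.
    canonical_options = json.dumps(options, sort_keys=True, separators=(",", ":"))
    payload = url + model + canonical_options
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = cache_key("https://example.com/article", "gemini-2.5-flash",
                  {"format": "markdown", "length": "medium"})
key_b = cache_key("https://example.com/article", "gemini-2.5-flash",
                  {"length": "medium", "format": "markdown"})
key_c = cache_key("https://example.com/article", "gemini-2.5-pro",
                  {"format": "markdown", "length": "medium"})
```

Under this scheme `key_a` and `key_b` are identical, while `key_c` differs because the model differs, so forcing a model with the model parameter produces a separate cache entry.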

Rate Limits

Hard rate limits are enforced per API key to prevent abuse and ensure fair usage.

Plan         Requests/min  Requests/hour  Daily max
Free (test)  5             30             50
Starter      10            120            300
Pro          30            500            1000
Team         60            2000           5000
x402 Agent   10            100            500

Exceeding limits returns 429 Too Many Requests with a Retry-After header.
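
A client can honor the Retry-After header when it receives a 429. The sketch below handles both forms the HTTP specification allows (a number of seconds or an HTTP date); which form BiteBrief sends is not documented. The `send` callable and its `(status, headers, body)` return shape are hypothetical, as is the one-second fallback when the header is missing.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a wait time in seconds."""
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    current = now or datetime.now(timezone.utc)
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    return max(0.0, (retry_at - current).total_seconds())


def call_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            # Fallback of 1 second when the header is absent is an assumption.
            sleep(retry_after_seconds(headers.get("Retry-After", "1")))
    return status, body
```

Retries only help with the per-minute and per-hour limits; once the daily maximum is reached, waiting a few seconds will not clear the 429.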

Cost Guard

Every request is checked in real time against the Cost Guard system. If the projected margin falls below the minimum threshold (3x cost), the request is blocked.

Margin Enforcement

  • Minimum markup: 3x backend cost (67% minimum margin)
  • Target margin: 70-80% on standard operations
  • Blocked if: user_price < backend_cost * 3
  • Logged: All margin violations are logged and alerted
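
The blocking rule and the relationship between a 3x markup and a 67% margin can be checked with a few lines of arithmetic. This is a sketch of the stated rule only, not the Cost Guard implementation; the function names are illustrative, and `Decimal` is used to avoid floating-point surprises at the exact 3x boundary.

```python
from decimal import Decimal

MIN_MARKUP = Decimal("3")  # minimum markup over backend cost


def is_blocked(user_price: Decimal, backend_cost: Decimal) -> bool:
    """Apply the documented rule: blocked if user_price < backend_cost * 3."""
    return user_price < backend_cost * MIN_MARKUP


def margin_percent(user_price: Decimal, backend_cost: Decimal) -> Decimal:
    """Margin as a percentage of the user price."""
    return (user_price - backend_cost) / user_price * 100


# At exactly 3x markup the margin is 2/3, which rounds to the quoted 67%.
boundary_margin = margin_percent(Decimal("3"), Decimal("1")).quantize(Decimal("0.1"))
```

With the figures from the /v1/quote sample (price 1.49, backend cost 0.48) the threshold is 1.44, so that request passes the check.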

Anti-Abuse Policy

BiteBrief is designed to be highly profitable. Abuse results in automatic suspension.

  • Spike detection: If usage exceeds 10x the plan average within any 1-hour window, the account is auto-paused and flagged for review.
  • Daily limits: Hard caps per plan (see Rate Limits). No exceptions.
  • No unlimited free: Free tier is capped. No credit card workarounds.
  • Agent monitoring: x402 agents are rate-limited and monitored for patterns that indicate relay abuse or scraping.
  • Overage auto-charge: Paid plans that exceed their credit allocation are auto-charged via Stripe at overage rates.
  • Suspension: Accounts violating fair use are suspended immediately. No refund for abused credits.
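
The spike-detection rule in the list above can be illustrated with a sliding one-hour window. This is one possible reading of the policy, not BiteBrief's implementation: the class name `SpikeDetector` is hypothetical, and "plan average" is treated as an hourly request count passed in by the caller, since the documentation does not define it.

```python
from collections import deque


class SpikeDetector:
    """Flag an account when requests in the last hour exceed 10x the plan average."""

    def __init__(self, plan_avg_per_hour: float,
                 multiplier: float = 10.0,
                 window_s: float = 3600.0) -> None:
        self.limit = plan_avg_per_hour * multiplier
        self.window_s = window_s
        self._events = deque()  # timestamps of requests inside the window

    def record(self, timestamp: float) -> bool:
        """Record one request; return True if the account should be paused."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_s
        # Drop requests that have fallen out of the one-hour window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.limit
```

With a plan average of 2 requests per hour, the 21st request inside a single hour is the first to trip the detector.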

Metrics

Real-time LLM performance metrics and cache statistics.

GET /api/metrics
{
  "llm": {
    "total_llm_calls": 142,
    "total_llm_cost_usd": 0.2840,
    "models": {
      "gemini/gemini-2.5-flash": {
        "calls": 120, "success_rate_percent": 99.2,
        "avg_cost_usd": 0.0012, "avg_latency_ms": 1200
      }
    }
  },
  "cache": {
    "entries": 89, "hits": 34, "misses": 108,
    "hit_rate_percent": 23.9,
    "estimated_savings_usd": 0.17
  }
}
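
The derived figures in the metrics payload can be recomputed from the raw counters. The sketch below assumes `hit_rate_percent` is hits / (hits + misses), which reproduces the 23.9 in the sample; the function names are illustrative.

```python
def cache_hit_rate_percent(cache: dict) -> float:
    """Hit rate as a percentage of all cache lookups (formula assumed)."""
    lookups = cache["hits"] + cache["misses"]
    if lookups == 0:
        return 0.0
    return round(cache["hits"] / lookups * 100, 1)


def average_cost_per_call_usd(llm: dict) -> float:
    """Mean LLM cost per call across all models."""
    if llm["total_llm_calls"] == 0:
        return 0.0
    return round(llm["total_llm_cost_usd"] / llm["total_llm_calls"], 4)


# Values copied from the sample response above.
sample_metrics = {
    "llm": {"total_llm_calls": 142, "total_llm_cost_usd": 0.2840},
    "cache": {"entries": 89, "hits": 34, "misses": 108},
}
hit_rate = cache_hit_rate_percent(sample_metrics["cache"])
avg_cost = average_cost_per_call_usd(sample_metrics["llm"])
```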

Health & Status

GET /api/health        Basic health check
GET /api/status        Full status with LLM provider availability, cache stats, feature flags
GET /api/v1/features   List premium features and available models