INDEPENDENTLY REPRODUCIBLE
Performance Benchmark Report
Real measurements on public GitHub repositories. Conservative estimates. Fully reproducible by anyone.
Generated: May 3, 2026 · Method: Python ast + tiktoken cl100k_base + BFS subgraph
Small (HTTP Client Library): 8.6x conservative compression
Medium (Web Framework): 35.2x conservative compression
Large (Enterprise Web Framework): 20.2x conservative compression
How SurfClaw Works
SurfClaw converts your selected codebase or document corpus into a knowledge graph. For each query, it runs a 2-hop breadth-first search (BFS) over that graph to extract the relevant nodes, and sends only those snippets to the LLM.
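As a rough illustration of the 2-hop BFS extraction described above, the sketch below walks a toy adjacency-list graph and collects every node within two edges of a seed node. The graph contents, the `bfs_subgraph` name, and the way seed nodes are chosen are all assumptions made for this example; the report does not show SurfClaw's actual graph schema or query matching.

```python
from collections import deque


def bfs_subgraph(graph, seeds, max_hops=2):
    """Return the set of nodes within max_hops edges of any seed node."""
    # Map each visited node to its hop distance from the nearest seed.
    visited = {seed: 0 for seed in seeds if seed in graph}
    queue = deque(visited)
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited[neighbor] = depth + 1
                queue.append(neighbor)
    return set(visited)


# Hypothetical call graph, for illustration only.
graph = {
    "Client.get": ["Client.request"],
    "Client.request": ["Client.send", "build_request"],
    "Client.send": ["Transport.handle"],
    "build_request": [],
    "Transport.handle": ["ConnectionPool.acquire"],
    "ConnectionPool.acquire": [],
}

relevant = bfs_subgraph(graph, ["Client.get"], max_hops=2)
print(sorted(relevant))
```

In this toy graph, `Transport.handle` sits three hops from the seed, so it is left out of the result. Only the code behind the returned nodes would be sent to the LLM.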
❌ Without SurfClaw
Entire raw source files → LLM
(hundreds of thousands of tokens)
✅ With SurfClaw
Relevant nodes only → LLM
(thousands of tokens)
Conservative estimate: tokens per query = nodes extracted per query × 500 tokens per node (a realistic cost that includes the actual code snippets)
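The arithmetic behind the conservative ratio can be sketched as follows, using the figures from the Results Overview table. The function names are illustrative. One caveat: the per-repository sections show whole-number node averages, so multiplying them by 500 will not always match the table's tokens-per-query value exactly (for example, 42 × 500 = 21,000 against the reported 21,200), presumably because the displayed average is rounded.

```python
TOKENS_PER_NODE = 500  # conservative per-node cost stated in the report


def tokens_per_query(avg_nodes):
    """Conservative token estimate for one query."""
    return avg_nodes * TOKENS_PER_NODE


def conservative_ratio(full_tokens, query_tokens):
    """Baseline tokens divided by tokens actually sent per query."""
    return full_tokens / query_tokens


# (full tokens, tokens per query) taken from the Results Overview table.
results = {
    "encode/httpx": (182_774, 21_200),
    "tiangolo/fastapi": (475_208, 13_500),
    "django/django": (442_463, 21_900),
}

ratios = {
    repo: round(conservative_ratio(full, per_query), 1)
    for repo, (full, per_query) in results.items()
}
for repo, ratio in ratios.items():
    print(f"{repo}: {ratio}x")
```

Rounded to one decimal place, this reproduces the 8.6x, 35.2x, and 20.2x figures reported for the three repositories.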
Results Overview
| Codebase | Files | Full Tokens (baseline) | Tokens/Query (SurfClaw) | Conservative Ratio |
|---|---|---|---|---|
| Small: HTTP Client Library | 90 | 182,774 | 21,200 | 8.6x |
| Medium: Web Framework | 360 | 475,208 | 13,500 | 35.2x |
| Large: Enterprise Web Framework | 360 | 442,463 | 21,900 | 20.2x |
Small: encode/httpx (HTTP Client Library)
8.6x conservative · 1,258 graph nodes · 42 avg nodes/query · 81.7x graph JSON ratio
Cost savings per query by model
| Model | $/1M Tokens | Savings/Query | SurfClaw Fee | Customer Net |
|---|---|---|---|---|
| GPT-4.1 (OpenAI) | $2.00 | $0.3231 | $0.0323 | $0.2908 |
| GPT-5 (OpenAI) | $1.25 | $0.2020 | $0.0202 | $0.1818 |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $0.4847 | $0.0485 | $0.4362 |
| Gemini 2.5 Pro (Google) | $1.25 | $0.2020 | $0.0202 | $0.1818 |
| Gemini 2.5 Flash (Google) | $0.30 | $0.0485 | $0.0048 | $0.0436 |
📊 Monthly example (10,000 queries/day × 30 days): $130,860/month net savings · Claude Sonnet 4.6 · conservative estimate
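The per-query and monthly figures for encode/httpx can be re-derived with a few lines of arithmetic. This is a sketch under two assumptions: the 10% fee rate is inferred from the table (each SurfClaw Fee value is one tenth of Savings/Query) and is not stated explicitly in the report, and the monthly example is assumed to multiply the rounded per-query net figure. The function name is illustrative.

```python
def savings_per_query(full_tokens, query_tokens, price_per_million, fee_rate=0.10):
    """Return (gross savings, fee, customer net) in dollars for one query.

    fee_rate=0.10 is an assumption inferred from the report's tables.
    """
    saved_tokens = full_tokens - query_tokens
    gross = saved_tokens * price_per_million / 1_000_000
    fee = gross * fee_rate
    return gross, fee, gross - fee


# encode/httpx at $3.00 per 1M tokens (the Claude Sonnet 4.6 row).
gross, fee, net = savings_per_query(182_774, 21_200, 3.00)

# Monthly example: 10,000 queries/day for 30 days, from the rounded net value.
monthly = round(net, 4) * 10_000 * 30

print(f"gross ${gross:.4f} · fee ${fee:.4f} · net ${net:.4f}")
print(f"monthly net ${monthly:,.0f}")
```

With these inputs the sketch gives $0.4847 gross, $0.0485 fee, and $0.4362 net per query, and $130,860 per month, matching the table row and the monthly example. Swapping in another row's price, or another repository's token counts, reproduces the remaining rows the same way.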
Medium: tiangolo/fastapi (Web Framework)
35.2x conservative · 2,592 graph nodes · 27 avg nodes/query · 393.6x graph JSON ratio
Cost savings per query by model
| Model | $/1M Tokens | Savings/Query | SurfClaw Fee | Customer Net |
|---|---|---|---|---|
| GPT-4.1 (OpenAI) | $2.00 | $0.9234 | $0.0923 | $0.8311 |
| GPT-5 (OpenAI) | $1.25 | $0.5771 | $0.0577 | $0.5194 |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $1.3851 | $0.1385 | $1.2466 |
| Gemini 2.5 Pro (Google) | $1.25 | $0.5771 | $0.0577 | $0.5194 |
| Gemini 2.5 Flash (Google) | $0.30 | $0.1385 | $0.0139 | $0.1247 |
📊 Monthly example (10,000 queries/day × 30 days): $373,980/month net savings · Claude Sonnet 4.6 · conservative estimate
Large: django/django (Enterprise Web Framework)
20.2x conservative · 3,078 graph nodes · 44 avg nodes/query · 227.0x graph JSON ratio
Cost savings per query by model
| Model | $/1M Tokens | Savings/Query | SurfClaw Fee | Customer Net |
|---|---|---|---|---|
| GPT-4.1 (OpenAI) | $2.00 | $0.8411 | $0.0841 | $0.7570 |
| GPT-5 (OpenAI) | $1.25 | $0.5257 | $0.0526 | $0.4731 |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $1.2617 | $0.1262 | $1.1355 |
| Gemini 2.5 Pro (Google) | $1.25 | $0.5257 | $0.0526 | $0.4731 |
| Gemini 2.5 Flash (Google) | $0.30 | $0.1262 | $0.0126 | $0.1136 |
📊 Monthly example (10,000 queries/day × 30 days): $340,650/month net savings · Claude Sonnet 4.6 · conservative estimate
Methodology & Reproducibility
| Step | Tool | Description |
|---|---|---|
| Code parsing | Python built-in ast module | Deterministic, no LLM required, 100% reproducible |
| Token counting | OpenAI tiktoken cl100k_base | OpenAI tokenizer encoding; counts for other vendors' models are approximate |
| Doc extraction | Google Gemini 2.0 Flash Lite | Semantic key-concept extraction |
| Query simulation | BFS 2-hop subgraph | Identical to production /v1/query API behavior |
| Compression ratio | Conservative (500 tok/node) | Includes realistic code snippet context |
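To make the code-parsing step concrete, here is a minimal sketch of how Python's built-in `ast` module can turn source text into graph nodes without any LLM call. The node fields (`id`, `kind`, `line`) and the `extract_nodes` helper are hypothetical; the report does not describe the actual node schema used by the benchmark script.

```python
import ast


def extract_nodes(source, module_name):
    """Collect one node per class or function definition in the source."""
    tree = ast.parse(source)
    nodes = []
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(item, ast.ClassDef) else "function"
            nodes.append(
                {"id": f"{module_name}.{item.name}", "kind": kind, "line": item.lineno}
            )
    # Sort by line number so the output order is stable across runs.
    return sorted(nodes, key=lambda node: node["line"])


# Small hypothetical module used only to exercise the parser.
sample = '''
class Client:
    def get(self, url):
        return self.request("GET", url)

async def fetch(url):
    return url
'''

nodes = extract_nodes(sample, "demo")
for node in nodes:
    print(node)
```

Because `ast.parse` is deterministic, the same source always yields the same nodes, which is the property the table above relies on. This sketch does not qualify method names with their class (`get` becomes `demo.get`), so a real graph builder would likely need fuller naming and edge extraction.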
Run it yourself
    pip install tiktoken networkx
    # Get a free key at aistudio.google.com/apikey
    $env:GEMINI_API_KEY = "AIza..."
    python benchmark/run_benchmark_v3.py
Fully reproducible with no paid APIs. Runs entirely on the Gemini free tier.