INDEPENDENTLY REPRODUCIBLE

Performance Benchmark Report

Real measurements on public GitHub repositories. Conservative estimates. Fully reproducible by anyone.

Generated: May 3, 2026 · Method: Python ast + tiktoken cl100k_base + BFS subgraph

Small

8.6x

conservative compression

HTTP Client Library

Medium

35.2x

conservative compression

Web Framework

Large

20.2x

conservative compression

Enterprise Web Framework

How SurfClaw Works

SurfClaw converts your selected codebase or document corpus into a knowledge graph. For each query, it runs a breadth-first search (BFS) limited to 2 hops to extract only the relevant nodes, and sends only those snippets to the LLM.
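As an illustration of the 2-hop extraction described above, here is a minimal sketch in plain Python. It is not SurfClaw's implementation: the adjacency-dict graph format, the function name `bfs_subgraph`, the node names, and the way seed nodes are chosen for a query are all assumptions made for this example.

```python
from collections import deque


def bfs_subgraph(adjacency, seeds, max_hops=2):
    """Return the set of nodes within max_hops edges of any seed node."""
    # Map each visited node to its hop distance from the nearest seed.
    visited = {seed: 0 for seed in seeds if seed in adjacency}
    queue = deque(visited)
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited[neighbor] = depth + 1
                queue.append(neighbor)
    return set(visited)


# Hypothetical toy graph; the node names are illustrative only.
graph = {
    "Client.get": ["Client.request"],
    "Client.request": ["Client.get", "Client.send"],
    "Client.send": ["Client.request", "Transport.handle"],
    "Transport.handle": ["Client.send", "Pool.acquire"],
    "Pool.acquire": ["Transport.handle"],
}

selected = bfs_subgraph(graph, seeds=["Client.get"])
print(sorted(selected))
```

With `Client.get` as the only seed, the sketch keeps `Client.get`, `Client.request`, and `Client.send`; `Transport.handle` is three hops away and is left out, so its snippet would not be sent to the LLM.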

❌ Without SurfClaw

Entire raw source files → LLM
(hundreds of thousands of tokens)

✅ With SurfClaw

Relevant nodes only → LLM
(thousands of tokens)

Conservative estimate: tokens per query = nodes extracted per query × 500 tokens per node (a realistic per-node cost that includes the actual code snippet).

Results Overview

Codebase	Files	Full Tokens (baseline)	Tokens/Query (SurfClaw)	Conservative Ratio
Small (HTTP Client Library)	90	182,774	21,200	8.6x
Medium (Web Framework)	360	475,208	13,500	35.2x
Large (Enterprise Web Framework)	360	442,463	21,900	20.2x
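The ratios in the table can be rechecked from its own columns. The sketch below divides the full token count by the tokens sent per query, and also backs out the node count implied by the 500 tokens/node rule. The implied node counts are derived here for illustration; they are not figures reported in the table, and the function names are hypothetical.

```python
TOKENS_PER_NODE = 500  # conservative per-node cost stated in the report

# (full-codebase tokens, tokens per query), copied from the table above
results = {
    "Small": (182_774, 21_200),
    "Medium": (475_208, 13_500),
    "Large": (442_463, 21_900),
}


def conservative_ratio(full_tokens, tokens_per_query):
    """Compression ratio: baseline tokens divided by tokens sent per query."""
    return full_tokens / tokens_per_query


def implied_nodes_per_query(tokens_per_query):
    """Node count implied by the 500 tokens/node estimate (derived, not reported)."""
    return tokens_per_query / TOKENS_PER_NODE


for name, (full, per_query) in results.items():
    ratio = conservative_ratio(full, per_query)
    nodes = implied_nodes_per_query(per_query)
    print(f"{name}: {ratio:.1f}x, about {nodes:.1f} nodes/query")
```

Running this reproduces 8.6x, 35.2x, and 20.2x for the three codebases.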

Small

encode/httpx

HTTP Client Library

8.6x

conservative

1,258

Graph Nodes

Avg Nodes/Query

81.7x

Graph JSON Ratio

Cost savings per query by model

Model	$/1M Tokens	Savings/Query	SurfClaw Fee	Customer Net
GPT-4.1 (OpenAI)	$2.00	$0.3231	$0.0323	$0.2908
GPT-5 (OpenAI)	$1.25	$0.2020	$0.0202	$0.1818
Claude Sonnet 4.6 (Anthropic)	$3.00	$0.4847	$0.0485	$0.4362
Gemini 2.5 Pro (Google)	$1.25	$0.2020	$0.0202	$0.1818
Gemini 2.5 Flash (Google)	$0.30	$0.0485	$0.0048	$0.0436

📊 Monthly example (10,000 queries/day × 30 days): $130,860/month net savings · Claude Sonnet 4.6 · conservative estimate
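The per-query figures above can be recomputed from the token counts in Results Overview. In this sketch the savings are the tokens avoided multiplied by the model's price per million tokens. The 10% fee rate is an assumption inferred from the ratio of the SurfClaw Fee column to the Savings/Query column; the report does not state it explicitly, and the function name is hypothetical.

```python
FEE_RATE = 0.10  # assumption: inferred from the Fee and Savings columns, not stated


def savings_per_query(full_tokens, query_tokens, price_per_million):
    """Dollar cost of the tokens that are no longer sent to the model."""
    return (full_tokens - query_tokens) * price_per_million / 1_000_000


# Small codebase (encode/httpx), Claude Sonnet 4.6 at $3.00 per 1M tokens
gross = savings_per_query(182_774, 21_200, 3.00)
fee = gross * FEE_RATE
net = gross - fee

# The monthly example multiplies the rounded net figure by 10,000 queries/day x 30 days.
monthly = round(net, 4) * 10_000 * 30

print(f"savings ${gross:.4f}  fee ${fee:.4f}  net ${net:.4f}  monthly ${monthly:,.0f}")
```

This matches the Claude Sonnet 4.6 row ($0.4847, $0.0485, $0.4362) and the $130,860 monthly figure. Using the unrounded net value instead would give roughly $130,875, so the monthly example appears to be based on the rounded per-query number.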

Medium

tiangolo/fastapi

Web Framework

35.2x

conservative

2,592

Graph Nodes

Avg Nodes/Query

393.6x

Graph JSON Ratio

Cost savings per query by model

Model	$/1M Tokens	Savings/Query	SurfClaw Fee	Customer Net
GPT-4.1 (OpenAI)	$2.00	$0.9234	$0.0923	$0.8311
GPT-5 (OpenAI)	$1.25	$0.5771	$0.0577	$0.5194
Claude Sonnet 4.6 (Anthropic)	$3.00	$1.3851	$0.1385	$1.2466
Gemini 2.5 Pro (Google)	$1.25	$0.5771	$0.0577	$0.5194
Gemini 2.5 Flash (Google)	$0.30	$0.1385	$0.0139	$0.1247

📊 Monthly example (10,000 queries/day × 30 days): $373,980/month net savings · Claude Sonnet 4.6 · conservative estimate

Large

django/django

Enterprise Web Framework

20.2x

conservative

3,078

Graph Nodes

Avg Nodes/Query

227.0x

Graph JSON Ratio

Cost savings per query by model

Model	$/1M Tokens	Savings/Query	SurfClaw Fee	Customer Net
GPT-4.1 (OpenAI)	$2.00	$0.8411	$0.0841	$0.7570
GPT-5 (OpenAI)	$1.25	$0.5257	$0.0526	$0.4731
Claude Sonnet 4.6 (Anthropic)	$3.00	$1.2617	$0.1262	$1.1355
Gemini 2.5 Pro (Google)	$1.25	$0.5257	$0.0526	$0.4731
Gemini 2.5 Flash (Google)	$0.30	$0.1262	$0.0126	$0.1136

📊 Monthly example (10,000 queries/day × 30 days): $340,650/month net savings · Claude Sonnet 4.6 · conservative estimate

Methodology & Reproducibility

Step	Tool	Description
Code parsing	Python built-in ast module	Deterministic, no LLM required, 100% reproducible
Token counting	OpenAI tiktoken cl100k_base	Industry-standard tokenizer
Doc extraction	Google Gemini 2.0 Flash Lite	Semantic key-concept extraction
Query simulation	BFS 2-hop subgraph	Identical to production /v1/query API behavior
Compression ratio	Conservative (500 tok/node)	Includes realistic code snippet context
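To make the code-parsing step concrete, here is a minimal sketch of how Python's `ast` module can turn source code into graph nodes and edges without any LLM. The node naming scheme, the choice of call edges, and the function `extract_nodes_and_edges` are assumptions for illustration; the schema used by the actual benchmark script may differ. Token counting with tiktoken is left out so the example runs with the standard library only.

```python
import ast


def extract_nodes_and_edges(source, module_name):
    """Collect function/class definitions as nodes and direct name calls as edges."""
    tree = ast.parse(source)
    nodes, edges = [], []
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            qualified = f"{module_name}.{item.name}"
            nodes.append(qualified)
            # Record calls to plain names made anywhere inside this definition.
            for child in ast.walk(item):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges.append((qualified, child.func.id))
    return nodes, edges


# Hypothetical sample source; not taken from any benchmarked repository.
SAMPLE = '''
def fetch(url):
    return send(build_request(url))

def build_request(url):
    return {"url": url}
'''

nodes, edges = extract_nodes_and_edges(SAMPLE, "client")
print(nodes)
print(edges)
```

Because `ast.parse` is deterministic, running this twice on the same file yields the same nodes and edges, which is the property the methodology table relies on for reproducibility.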

Run it yourself

pip install tiktoken networkx
# Get a free key at aistudio.google.com/apikey
# PowerShell syntax; in bash or zsh, use: export GEMINI_API_KEY="AIza..."
$env:GEMINI_API_KEY = "AIza..."
python benchmark/run_benchmark_v3.py

Fully reproducible with no paid APIs. Runs entirely on the Gemini free tier.
