INDEPENDENTLY REPRODUCIBLE

Performance Benchmark Report

Real measurements on public GitHub repositories. Conservative estimates. Fully reproducible by anyone.

Generated: May 3, 2026  ·  Method: Python ast + tiktoken cl100k_base + BFS subgraph

Small · HTTP Client Library · 8.6x conservative compression
Medium · Web Framework · 35.2x conservative compression
Large · Enterprise Web Framework · 20.2x conservative compression

How SurfClaw Works

SurfClaw converts your selected codebase or document corpus into a knowledge graph. For each query, it extracts only the relevant nodes with a 2-hop breadth-first search (BFS) and sends only those snippets to the LLM.
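As a rough illustration of the 2-hop BFS extraction described above, here is a minimal sketch using only the Python standard library. The adjacency-dict graph format, the function name `two_hop_subgraph`, and the way seed nodes are chosen are all assumptions made for the example; the report does not describe SurfClaw's internal data structures.

```python
from collections import deque

def two_hop_subgraph(graph, seeds, max_hops=2):
    """Return every node reachable from the seeds within max_hops edges.

    graph: dict mapping a node name to an iterable of neighbor names
           (hypothetical format, not SurfClaw's actual schema).
    seeds: nodes assumed to match the query directly.
    """
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited

# Hypothetical call graph: each function points to the functions it calls.
demo_graph = {
    "Client.get": ["Client.request"],
    "Client.request": ["Client.send"],
    "Client.send": ["Transport.handle"],
    "Transport.handle": [],
}
selected = two_hop_subgraph(demo_graph, ["Client.get"])
```

In this toy graph, `Transport.handle` sits three hops from the seed, so it is left out of the snippets that would be sent to the LLM.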

❌ Without SurfClaw: entire raw source files → LLM (hundreds of thousands of tokens)
✅ With SurfClaw: relevant nodes only → LLM (thousands of tokens)

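Conservative estimation: tokens per query = nodes extracted per query × 500 tokens per node, a realistic cost that includes the actual code snippets. The conservative ratio is the full-codebase token count divided by this per-query figure.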
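The conservative estimate can be worked through in a few lines. This sketch plugs in the figures reported for tiangolo/fastapi (475,208 baseline tokens, 27 average nodes per query); the function name is illustrative only. The node averages shown for the other two repositories appear to be rounded (for example, 21,200 / 500 = 42.4 for encode/httpx), so their ratios are best computed from the reported tokens per query.

```python
TOKENS_PER_NODE = 500  # conservative per-node allowance stated in the report

def conservative_ratio(full_tokens, avg_nodes_per_query,
                       tokens_per_node=TOKENS_PER_NODE):
    """Baseline tokens divided by the estimated tokens sent per query."""
    tokens_per_query = avg_nodes_per_query * tokens_per_node
    return full_tokens / tokens_per_query

# tiangolo/fastapi figures from the report.
ratio = conservative_ratio(475_208, 27)
print(f"{ratio:.1f}x")  # 35.2x
```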

Results Overview

| Codebase | Files | Full Tokens (baseline) | Tokens/Query (SurfClaw) | Conservative Ratio |
|---|---|---|---|---|
| Small (HTTP Client Library) | 90 | 182,774 | 21,200 | 8.6x |
| Medium (Web Framework) | 360 | 475,208 | 13,500 | 35.2x |
| Large (Enterprise Web Framework) | 360 | 442,463 | 21,900 | 20.2x |

Small: encode/httpx (HTTP Client Library)
8.6x conservative · 1,258 graph nodes · 42 avg nodes/query · 81.7x graph JSON ratio

Cost savings per query by model

| Model | $/1M Tokens | Savings/Query | SurfClaw Fee | Customer Net |
|---|---|---|---|---|
| GPT-4.1 (OpenAI) | $2.00 | $0.3231 | $0.0323 | $0.2908 |
| GPT-5 (OpenAI) | $1.25 | $0.2020 | $0.0202 | $0.1818 |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $0.4847 | $0.0485 | $0.4362 |
| Gemini 2.5 Pro (Google) | $1.25 | $0.2020 | $0.0202 | $0.1818 |
| Gemini 2.5 Flash (Google) | $0.30 | $0.0485 | $0.0048 | $0.0436 |

📊 Monthly example (10,000 queries/day × 30 days): $130,860/month net savings · Claude Sonnet 4.6 · conservative estimate

Medium: tiangolo/fastapi (Web Framework)
35.2x conservative · 2,592 graph nodes · 27 avg nodes/query · 393.6x graph JSON ratio

Cost savings per query by model

| Model | $/1M Tokens | Savings/Query | SurfClaw Fee | Customer Net |
|---|---|---|---|---|
| GPT-4.1 (OpenAI) | $2.00 | $0.9234 | $0.0923 | $0.8311 |
| GPT-5 (OpenAI) | $1.25 | $0.5771 | $0.0577 | $0.5194 |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $1.3851 | $0.1385 | $1.2466 |
| Gemini 2.5 Pro (Google) | $1.25 | $0.5771 | $0.0577 | $0.5194 |
| Gemini 2.5 Flash (Google) | $0.30 | $0.1385 | $0.0139 | $0.1247 |

📊 Monthly example (10,000 queries/day × 30 days): $373,980/month net savings · Claude Sonnet 4.6 · conservative estimate

Large: django/django (Enterprise Web Framework)
20.2x conservative · 3,078 graph nodes · 44 avg nodes/query · 227.0x graph JSON ratio

Cost savings per query by model

| Model | $/1M Tokens | Savings/Query | SurfClaw Fee | Customer Net |
|---|---|---|---|---|
| GPT-4.1 (OpenAI) | $2.00 | $0.8411 | $0.0841 | $0.7570 |
| GPT-5 (OpenAI) | $1.25 | $0.5257 | $0.0526 | $0.4731 |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $1.2617 | $0.1262 | $1.1355 |
| Gemini 2.5 Pro (Google) | $1.25 | $0.5257 | $0.0526 | $0.4731 |
| Gemini 2.5 Flash (Google) | $0.30 | $0.1262 | $0.0126 | $0.1136 |

📊 Monthly example (10,000 queries/day × 30 days): $340,650/month net savings · Claude Sonnet 4.6 · conservative estimate
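
The per-query and monthly figures above appear consistent with a simple calculation: tokens saved × model price, minus a fee. The sketch below reproduces the Claude Sonnet 4.6 rows under that reading. The 10% fee rate is an assumption inferred from the tables (each SurfClaw Fee is about one tenth of Savings/Query), not a documented pricing rule, and the function and variable names are illustrative.

```python
FEE_RATE = 0.10                 # assumption: inferred from the tables above
QUERIES_PER_MONTH = 10_000 * 30  # the report's monthly example

def per_query_costs(full_tokens, query_tokens, usd_per_million):
    """Return (savings, fee, customer net) per query in USD, 4 decimals."""
    savings = (full_tokens - query_tokens) * usd_per_million / 1_000_000
    fee = savings * FEE_RATE
    net = savings - fee
    return round(savings, 4), round(fee, 4), round(net, 4)

# (full baseline tokens, tokens per query) as reported for each repository.
REPOS = {
    "encode/httpx": (182_774, 21_200),
    "tiangolo/fastapi": (475_208, 13_500),
    "django/django": (442_463, 21_900),
}

for name, (full, query) in REPOS.items():
    savings, fee, net = per_query_costs(full, query, usd_per_million=3.00)
    # The monthly examples match the rounded per-query net times 300,000.
    monthly = round(net * QUERIES_PER_MONTH, 2)
    print(f"{name}: savings ${savings}, fee ${fee}, net ${net}, "
          f"monthly ${monthly:,.0f}")
```

Under these assumptions the loop yields the same monthly totals as the three examples above ($130,860, $373,980, and $340,650).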

Methodology & Reproducibility

| Step | Tool | Description |
|---|---|---|
| Code parsing | Python built-in ast module | Deterministic, no LLM required, 100% reproducible |
| Token counting | OpenAI tiktoken cl100k_base | Industry-standard tokenizer |
| Doc extraction | Google Gemini 2.0 Flash Lite | Semantic key-concept extraction |
| Query simulation | BFS 2-hop subgraph | Identical to production /v1/query API behavior |
| Compression ratio | Conservative (500 tok/node) | Includes realistic code snippet context |
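
To make the code-parsing step more concrete, here is a minimal sketch of how Python's built-in `ast` module can turn source code into graph nodes and edges without any LLM. What counts as a node or an edge here (top-level functions and classes, linked by direct calls by name) is an assumption for illustration; the benchmark script may define its graph differently.

```python
import ast

def build_graph(source, module):
    """Map each top-level def/class to the top-level names it calls.

    Hypothetical node/edge definition, not the benchmark's actual schema.
    """
    tree = ast.parse(source)
    defs = [n for n in tree.body
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
    known = {d.name for d in defs}
    graph = {f"{module}.{d.name}": set() for d in defs}
    for d in defs:
        for child in ast.walk(d):
            # Record direct calls by bare name to other top-level definitions.
            if (isinstance(child, ast.Call)
                    and isinstance(child.func, ast.Name)
                    and child.func.id in known):
                graph[f"{module}.{d.name}"].add(f"{module}.{child.func.id}")
    return graph

SAMPLE = (
    "def download(url):\n"
    "    return url\n"
    "\n"
    "def fetch(url):\n"
    "    return download(url)\n"
)
graph = build_graph(SAMPLE, "demo")
```

Because `ast.parse` is deterministic, the same source always produces the same graph, which is the property the table attributes to this step.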

Run it yourself

pip install tiktoken networkx
# Get a free key at aistudio.google.com/apikey
# PowerShell syntax shown; in bash or zsh use: export GEMINI_API_KEY="AIza..."
$env:GEMINI_API_KEY = "AIza..."
python benchmark/run_benchmark_v3.py

Fully reproducible with no paid APIs. Runs entirely on the Gemini free tier.