
How we cut Claude Code token usage by 60% with RTK

We were burning $40/day on Claude Code at internal scale. RTK — a small Rust proxy that intercepts the most expensive tool calls and rewrites them — cut that to $14/day with no behavioural change. Here is how it works and why it is open source now.

Three months ago, Konde was burning roughly $40 a day on Claude Code across our team and our agent fleet. That sounds modest — and it is, at our scale — but the trend line was steep. As we onboarded more agents and scaled up automated workflows, we projected $400/day by mid-year. At that point you stop calling it a tooling cost and start calling it a real budget line.

So we instrumented. We measured. We built a proxy. The result is RTK — Rust Token Killer — and it cut our daily Claude Code spend by 60% with no observable behaviour change. Today we are open-sourcing it.

Where the tokens were actually going

The first surprise: the tokens were not going where we thought.

We assumed the bulk of cost came from large file reads, long agent reasoning chains, or rare giant context windows. Wrong on all counts. When we instrumented the wire, the picture was:

  • 42% of input tokens: tool descriptions sent in every system prompt.
  • 23% of input tokens: re-sending file contents we had just sent two turns ago.
  • 18% of input tokens: verbose git and grep output that the agent only needed two lines of.
  • 17% everything else.

The first category is the killer. Every Claude Code session opens with the system prompt enumerating every available tool: every Bash variant, every Read flag, every Edit option. That is roughly 40,000 tokens of tool descriptions, and it is resent on every turn. If your agent is ten turns deep, you have sent 400,000 tokens of tool descriptions for the same unchanging set of tools.

This is fixable.

The fix — selective tool definition

RTK sits as a transparent proxy between Claude Code and the Anthropic API. It does three things:

1. Lazy tool definition

Most agent sessions only use a handful of tools. RTK starts the session with a minimal tool descriptor — just the names — and ships the full schemas only for the tools the agent actually invokes. When the agent calls a tool that has not been defined yet, RTK fetches the definition mid-stream and resends the system prompt once.

Net effect: 80% of sessions never load most tool schemas, because they never use most tools. Average input-token saving on this single optimisation: ~30%.
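The bookkeeping behind lazy loading is small. As a rough sketch of the idea (not RTK's actual code; `ToolRegistry` and its method names are illustrative), the proxy tracks which schemas have already been shipped and flags when a one-time system-prompt resend is needed:

```rust
use std::collections::{HashMap, HashSet};

/// Tracks which tool schemas have been sent to the model this session.
struct ToolRegistry {
    schemas: HashMap<String, String>, // tool name -> full JSON schema
    loaded: HashSet<String>,          // tools whose full schema was already sent
}

impl ToolRegistry {
    fn new(schemas: HashMap<String, String>) -> Self {
        Self { schemas, loaded: HashSet::new() }
    }

    /// System-prompt fragment: full schemas for loaded tools,
    /// bare name stubs for everything else.
    fn tool_block(&self) -> String {
        self.schemas
            .iter()
            .map(|(name, schema)| {
                if self.loaded.contains(name) {
                    schema.clone()
                } else {
                    format!("{{\"name\":\"{}\"}}", name)
                }
            })
            .collect::<Vec<_>>()
            .join(",")
    }

    /// Called when the agent invokes a tool. Returns true when the schema
    /// was not yet loaded, i.e. the system prompt must be resent once.
    fn on_invoke(&mut self, name: &str) -> bool {
        self.schemas.contains_key(name) && self.loaded.insert(name.to_string())
    }
}
```

The point of the stub-then-upgrade shape is that the common case (a tool never invoked) costs a dozen tokens instead of a full schema.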

2. Output filtering on noisy commands

A git status output averages 800 tokens. The information-dense part — which files changed — is usually 30 tokens. RTK has a small filter library (~80 commands) that knows which parts of common shell output the agent actually reads. git, npm, docker, kubectl, ls -la, grep with surrounding context — all get cropped to their useful core before being fed back to the model.

Saving here: ~15% of input tokens, averaged across our workload.
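The filters themselves are mostly line classifiers. A minimal sketch of the idea for git status output; the keep-list here is an assumption for illustration, not RTK's real rule set:

```rust
/// Crop `git status` output to the lines the model actually reads:
/// the branch line and the changed-file entries. Hint text
/// ("use git add ...") and blank lines are dropped.
fn filter_git_status(raw: &str) -> String {
    raw.lines()
        .filter(|l| {
            let t = l.trim();
            t.starts_with("On branch")
                || t.starts_with("modified:")
                || t.starts_with("new file:")
                || t.starts_with("deleted:")
                || t.starts_with("renamed:")
        })
        .map(|l| l.trim())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Per-command filters like this are cheap to write and easy to audit, which matters when the failure mode is silently hiding output from the agent.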

3. File diff caching

When an agent reads a file, edits it, and reads it again three turns later, the second read is wasteful. RTK keeps a per-session file cache and serves the second read as a diff against the first — 200 tokens instead of 4,000.

Saving here: ~12% on file-read-heavy sessions.
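The caching logic can be sketched as follows, assuming a naive line-by-line diff; RTK's actual diff format is not shown here, and `FileCache` is an illustrative name:

```rust
use std::collections::HashMap;

/// Per-session cache of the last-seen contents of each file.
struct FileCache {
    last_seen: HashMap<String, Vec<String>>,
}

impl FileCache {
    fn new() -> Self {
        Self { last_seen: HashMap::new() }
    }

    /// First read returns the full contents; repeat reads return either
    /// a "no change" marker or a compact diff against the cached copy.
    fn read(&mut self, path: &str, content: &str) -> String {
        let lines: Vec<String> = content.lines().map(String::from).collect();
        let out = match self.last_seen.get(path) {
            None => content.to_string(), // first read: send everything
            Some(prev) if *prev == lines => {
                format!("[no change since last read of {}]", path)
            }
            Some(prev) => {
                // naive positional diff; a real diff would align insertions
                let mut diff = Vec::new();
                let max = prev.len().max(lines.len());
                for i in 0..max {
                    let old = prev.get(i).map(String::as_str);
                    let new = lines.get(i).map(String::as_str);
                    if old != new {
                        diff.push(format!("L{}: {}", i + 1, new.unwrap_or("<deleted>")));
                    }
                }
                diff.join("\n")
            }
        };
        self.last_seen.insert(path.to_string(), lines);
        out
    }
}
```

The cache is keyed per session, so a fresh session always gets a full first read and stale state never leaks across tasks.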

Combined effect across our fleet: 60% reduction in input tokens. Output tokens are unchanged (we do not touch what the model emits, only what it sees).

What about correctness?

Here is the question every engineer asks first: does this break things?

We measured. Across 10,000 agent sessions, RTK-on vs RTK-off, we saw no statistically significant difference in:

  • Task completion rate
  • Number of turns per task
  • Generated code passing CI
  • Agent self-reported confidence

The tool descriptions Claude was getting were redundant — the model had memorised the API surface from training, so re-sending the schemas every turn was paying for information the model already had. The output filtering preserved the parts the model actually reads. The diff caching preserved correctness because the model can reconstruct the full file from a diff trivially.

We did break one thing: a janky internal workflow that depended on Claude re-reading a 200-line config and "noticing" a hidden flag we had not asked about. RTK's diff caching collapsed that to "no change since last read," and Claude no longer noticed the flag. The fix was to write the workflow correctly (have the agent look for the flag explicitly). RTK exposed the bad workflow rather than created the bug.

What RTK is built in and why

RTK is Rust. We picked Rust for three reasons:

  1. Zero-cost streaming. RTK is a streaming proxy — it has to operate on the bytes flowing between Claude Code and the API in real time, without buffering the whole response. Rust's async story is mature and the throughput numbers held up.
  2. Small binary. ~3.2MB compiled. Easy to ship, easy to embed.
  3. No GC pauses. A GC pause in a streaming proxy creates user-visible jitter. Rust avoids it entirely.

Total LOC: ~4,200. Tested against Anthropic's official SDK, the official Claude Code CLI, and our internal agent runtime.

How to use it

RTK is on GitHub at konde/rtk. Install via cargo install rtk, via Homebrew, or download a prebuilt binary. Configure your shell to wrap Claude Code via the included hook, and every command you run gets the optimisation transparently.

brew tap konde/rtk
brew install rtk
rtk --version

The hook integration adds about 0.3ms to each Claude Code invocation. You will not notice it. Your wallet will.

Why open-source it

Two reasons. First, we built RTK on top of Anthropic's openly documented API; it would feel wrong to capture the savings ourselves and keep the technique closed. Second, the more people use it, the more edge cases we will discover and fix — and the better Konde Studio will get for free.

If you ship RTK at your shop, please tell us. We have a #rtk-users channel in the Konde Discord and a small monthly digest of optimisations the community has contributed. Every addition makes the rest of the fleet cheaper.
