Reasoning is cheap. Awareness isn't.
Model tiers buy reasoning depth. The context layer buys sight. We measured what happens when a small model finally gets to see the whole change.
Rhei Team · 4 min read
The question
Frontier reasoning is getting cheaper and more automated by the month. So the interesting question for a coding agent is no longer how smart the model is. It is what the model can see. Can a cheap model do most of the work on a real codebase if the context layer gives it sight? We wanted a real answer, so we ran the work in our own monorepo. Port a Rust module to TypeScript with parity tests. Adopt the ported module in a caller path. Consolidate key formats that had drifted apart. Prove that hashes match across the two languages. These tasks have ground truth. The work either lands or it does not.
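One way to give the hash-parity task a pass/fail ground truth is a golden-vector check: each implementation writes a digest per fixture, and a small harness compares the two sets. The sketch below assumes that setup. The function name, the fixture names, and the digest strings are all hypothetical, not taken from our repository.

```python
from typing import Dict, List


def parity_mismatches(rust_digests: Dict[str, str], ts_digests: Dict[str, str]) -> List[str]:
    """Return fixture names whose digests differ or exist on only one side."""
    mismatched = []
    for name in sorted(set(rust_digests) | set(ts_digests)):
        # A fixture missing from either side counts as a failure, not a skip.
        if rust_digests.get(name) != ts_digests.get(name):
            mismatched.append(name)
    return mismatched


# Hypothetical digests, shortened for readability.
rust = {"empty": "aa11", "unicode-key": "bb22"}
ts = {"empty": "aa11", "unicode-key": "bb23", "long-key": "cc33"}
print(parity_mismatches(rust, ts))  # ['long-key', 'unicode-key']
```

Treating a missing fixture as a mismatch is a deliberate choice in this sketch: a port that silently skips an input should fail the check rather than pass it.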
What we measured first
We already had one public number. Against RepoPrompt on 15 replayed review sessions, Rhei matched quality on 15 of 15 while using 51.6 percent fewer tokens, making 89.6 percent fewer tool calls, and finishing 33.5 percent faster.
That number says the context engine is efficient. It does not say whether efficiency is the same thing as sight. So we asked the harder question: what actually moves quality on a small model?
Reasoning effort didn’t move the needle
The efficiency above is the headline, but it raises a sharper question: if Rhei reaches the same quality for far less, does spending more on reasoning buy anything back? So we ran the same tasks with the model at low reasoning effort and again at high, with Rhei feeding both the same context. The outcomes landed in the same range either way. Turning reasoning up did not produce better patches. Once the model could see the right files, more thinking added little.
| Setup | Patch quality | Sample |
|---|---|---|
| Low reasoning effort + Rhei | ~0.64 | 12 reps |
| High reasoning effort + Rhei | ~0.64 | 12 reps |
Same range either way: quality swung more from one run to the next than the effort dial moved it. With context in place, deeper reasoning bought little. Caveats: one repo, small samples.
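The comparison behind "swung more from one run to the next than the effort dial moved it" can be sketched as a shift between the two arms set against the spread inside each arm. This is a minimal illustration, assuming per-run quality scores are available as plain lists. The function name and the sample scores are invented for the example and are not the harness or the data behind the table.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_vs_noise(low: Sequence[float], high: Sequence[float]) -> dict:
    """Compare the shift between two arms with the spread inside each arm."""
    shift = mean(high) - mean(low)
    # Sample standard deviation per arm; the larger one serves as the noise floor.
    noise = max(stdev(low), stdev(high))
    return {"shift": shift, "noise": noise, "shift_exceeds_noise": abs(shift) > noise}


# Invented scores for illustration only (needs at least two runs per arm).
low_effort = [0.5, 0.7, 0.6, 0.8]
high_effort = [0.6, 0.7, 0.5, 0.8]
result = effect_vs_noise(low_effort, high_effort)
print(result["shift_exceeds_noise"])
```

A shift smaller than the noise floor does not prove the dial has no effect. With small samples it only means the effect, if any, is not distinguishable from run-to-run variation.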
Where context is still the work
Cost and reasoning are the solved part. Coverage is not. On greppable, single-surface tasks, finding the right files is easy for anything. The hard case is a wide, ambiguous change, such as a migration touching files that no single search obviously connects. There the risk is not a low score. It is a change that ships quietly half done, with the agent confident it finished.
An agent that does not know what it failed to find cannot detect that failure on its own, no matter how strong the model is. That gap is what we are building Rhei to close, and it is where we are still measuring.
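In an evaluation setting, where the full set of files a change should touch is known ahead of time, the "half done" case can be made visible as a coverage number. The sketch below assumes such a ground-truth list exists, which is exactly what an agent lacks in live use. The function name and file paths are hypothetical.

```python
from typing import Iterable, Set


def coverage_gap(required: Iterable[str], touched: Iterable[str]) -> dict:
    """Report what share of the required files a change actually touched."""
    required_set: Set[str] = set(required)
    touched_set: Set[str] = set(touched)
    # Files the change needed but never reached: the quiet part of a half-done change.
    missed = required_set - touched_set
    covered = len(required_set & touched_set)
    # An empty requirement list is treated as fully covered.
    coverage = covered / len(required_set) if required_set else 1.0
    return {"coverage": coverage, "missed": sorted(missed)}


# Hypothetical migration: four files required, the agent edited two of them plus one extra.
report = coverage_gap(
    required=["keys/format.rs", "keys/format.ts", "api/caller.ts", "jobs/sync.rs"],
    touched=["keys/format.rs", "keys/format.ts", "README.md"],
)
print(report)  # {'coverage': 0.5, 'missed': ['api/caller.ts', 'jobs/sync.rs']}
```

A patch can score well on the files it did change while this number stays low, which is why coverage is worth tracking separately from patch quality.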
What it means
Model tiers buy reasoning depth. The context layer buys sight. A small model that sees the right files behaves like a bigger one, and a big model that cannot see the edges of a change is still guessing.
This is why we think awareness, not raw reasoning, is becoming the bottleneck. As models take over more of the execution, depth of thought gets cheaper and more automated, and the constraint moves to what the agent is aware of: the files it must touch, the call sites it must not miss, the prior work it should reuse. Reasoning is cheap. Awareness isn’t.
The numbers here are small: a handful of tasks, one repository, single runs unless noted. We publish them as a direction, not a verdict, because the direction is already clear. Reasoning is getting cheap, and what the model can see is what is left to win.
