Continual learning: how a small model keeps pace with the frontier
The frontier moves. A small model you fine-tuned six months ago is frozen in time, while the frontier has shipped two or three new releases since. The model that was competitive at launch is, within a quarter, visibly a step behind, not because it got worse but because the bar moved.
That leaves most teams choosing between two bad options. Pay frontier prices on every call forever, and watch the bill scale with usage. Or freeze a cheap, fine-tuned small model and accept that it quietly drifts behind the state of the art. Neither is a strategy; one is expensive and one expires.
Continual learning is the third option
The way to keep a small model on the frontier’s heels is to never stop training it. The frontier becomes a moving teacher: on the queries that matter, the current best model supervises the small one, so every time the frontier improves, that improvement has a path back into your weights. Production traffic supplies fresh, outcome-labeled signal, and a scheduled update folds both into the small model so it tracks the frontier’s gains without ever paying the frontier’s per-call price at scale.
The difference shows up over time. A static small model can even start ahead of a general frontier on a narrow task, then fall behind as the frontier climbs and its own world drifts. A continually learned model rides just under the frontier instead, closing the gap each cycle rather than widening it.
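One way to picture a single update cycle is as a function that turns logged production traffic into a fresh training batch. The sketch below is illustrative only: `LoggedRequest`, `TrainingExample`, `build_update_batch`, the `matters` flag, and the `teacher` callable are all hypothetical names, and the rule for which queries "matter" is assumed to be decided elsewhere. It shows the two signal sources described above, frontier supervision and outcome-labeled traffic, being merged before a scheduled fine-tune.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoggedRequest:
    prompt: str
    small_model_answer: str
    matters: bool               # hypothetical flag: worth spending a teacher call on
    outcome_ok: Optional[bool]  # production outcome label; None if nothing was recorded


@dataclass
class TrainingExample:
    prompt: str
    target: str
    source: str  # "teacher" or "production"


def build_update_batch(
    logs: List[LoggedRequest],
    teacher: Callable[[str], str],
) -> List[TrainingExample]:
    """Assemble one cycle's training data (assumed policy, not a prescribed one)."""
    batch: List[TrainingExample] = []
    for req in logs:
        if req.matters:
            # The current frontier model supervises the queries that matter.
            batch.append(TrainingExample(req.prompt, teacher(req.prompt), "teacher"))
        elif req.outcome_ok:
            # Production confirmed the small model's answer, so keep it as a target.
            batch.append(
                TrainingExample(req.prompt, req.small_model_answer, "production")
            )
        # Requests with a bad or missing outcome that do not matter are skipped.
    return batch
```

Because `teacher` is just a callable, pointing it at a newer frontier release changes the supervision on the next cycle without touching the rest of the loop. The batch would then go to whatever fine-tuning job the team already runs on a schedule.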
Why a snapshot decays
Two forces pull a frozen model down at once. The first is distribution drift: your product, your users, and the kinds of requests they send all move, and a model trained on last quarter’s traffic answers last quarter’s questions. The second is the rising bar: even if your traffic never changed, “good enough” recalibrates upward every time the frontier ships. A snapshot loses ground to both, and prompting cannot fix either, because the gap is in the weights, not the context.
The economics still favor the small model
Continual learning does not mean paying the frontier for everything. The small model still serves the routine majority at a fraction of the cost; the frontier is reserved for two narrow jobs, teaching the small model and catching the genuinely hard cases it routes up. The teaching cost is amortized across thousands of cheap inferences, so the effective price per request stays close to the small model’s, not the frontier’s.
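The amortization argument can be made concrete with a little arithmetic. The sketch below assumes every request is first served by the small model, a fraction is then escalated to the frontier, and a fixed number of frontier calls per cycle is spent on teaching. The function name, its parameters, and the numbers in the example are placeholders chosen for illustration, not measurements.

```python
def effective_cost_per_request(
    small_cost: float,
    frontier_cost: float,
    escalation_rate: float,
    teaching_calls: int,
    total_requests: int,
) -> float:
    """Blended cost per request under the assumptions stated above."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")

    # Every request pays for the small model; escalated ones also pay the frontier.
    serving = small_cost + escalation_rate * frontier_cost
    # Teaching is a fixed frontier spend spread across all requests in the cycle.
    teaching = teaching_calls * frontier_cost / total_requests
    return serving + teaching


# Illustrative placeholder numbers in arbitrary cost units (assumed, not measured).
blended = effective_cost_per_request(
    small_cost=1.0,
    frontier_cost=20.0,
    escalation_rate=0.05,
    teaching_calls=500,
    total_requests=100_000,
)
print(f"blended cost per request: {blended:.2f} (frontier-only would be 20.00)")
```

With these made-up inputs the teaching term contributes only a small fraction of a unit per request, so the blended figure is driven mostly by the escalation rate. That is the lever to watch: the teaching budget shrinks as volume grows, while the escalation share does not.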
The point
Keeping up with frontier LLMs and keeping costs low are usually framed as a trade-off: pick quality or pick price. Continual learning is how you stop picking. A small model that learns from the frontier and from its own production keeps pace with the state of the art while charging like a small model, because the one thing it never does is stand still.