SMEARGLE - Dylan Norquist

A research project on speculative decoding, the trick of running a small "draft" model alongside a large "target" model: the draft proposes several tokens at a time, and the target checks all of them in a single forward pass. When the draft is right, you get those tokens almost for free.
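To make the draft-then-verify loop concrete, here is a minimal sketch of plain greedy speculative decoding. It is the generic technique, not SMEARGLE's model-based variant and not EAGLE3. The "models" are hypothetical stand-ins, each a function from a token prefix to its greedy next token. The acceptance rule is the simple greedy one: accept a draft token only if it matches the target's own choice. Sampling-based variants use a rejection-sampling rule instead.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a token prefix to its
# greedy next token. Real draft/target models would be neural networks.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """One round of greedy speculative decoding; returns the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target's greedy prediction at each of the k + 1 positions.
    #    A real system gets all of these from ONE batched forward pass over
    #    prefix + proposal; this sketch loops over them for clarity.
    target_preds = [target(prefix + proposal[:i]) for i in range(k + 1)]

    # 3. Accept draft tokens while they agree with the target.
    accepted: List[int] = []
    for i in range(k):
        if proposal[i] != target_preds[i]:
            # First disagreement: keep the target's token instead and stop.
            accepted.append(target_preds[i])
            return accepted
        accepted.append(proposal[i])

    # 4. All k draft tokens matched, so the target's next prediction is a bonus token.
    accepted.append(target_preds[k])
    return accepted


def generate(prompt: List[int], draft: Model, target: Model, k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


# Toy usage: the target counts up mod 10; the draft agrees except after a 4.
def target(p: List[int]) -> int:
    return (p[-1] + 1) % 10


def draft(p: List[int]) -> int:
    return 0 if p[-1] == 4 else (p[-1] + 1) % 10


print(speculative_step([5], draft, target, 4))  # [6, 7, 8, 9, 0]: all 4 accepted plus a bonus token
print(speculative_step([3], draft, target, 4))  # [4, 5]: draft's 0 rejected, target's 5 kept
```

Because every accepted token equals the target's own greedy choice, `generate` returns exactly what greedy decoding with the target alone would produce. A better draft only changes how many target passes that takes.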

SMEARGLE is a model-based variant of that idea, and on our benchmarks it surpasses EAGLE3, the previous state of the art. It took 2nd place at the MSOE ROSIE Supercomputer Super Challenge 2026, and we presented preliminary results at MICS 2026.

A full write-up will land here once the paper is submitted. Until then, the GitHub repo has the code, the training scripts, and the benchmark numbers if you want to dig in.