Speculative decode is an inference optimization that was mentioned a few times at . I'd heard of it but didn't know how it worked, so I spent some time figuring it out. My notes (and toy code that illustrates its benefits!) are here: https://glennklockwood.com/garden/speculative-decode