Glenn K. Lockwood
@glennklockwood@mast.hpc.social
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.
mast.hpc.social
@glennklockwood@mast.hpc.social · 1d ago
Speculative decode is an inferencing optimization that was mentioned a few times at #GTC26. I'd heard of it but didn't know how it worked, so I spent some time figuring it out. Some notes (and toy code that illustrates its benefits!) are here: https://glennklockwood.com/garden/speculative-decode