site:go.theregister.com

Cheat codes for LLM performance: An introduction to speculative decoding

Using speculative decoding with something like Llama 3.1 70B as the draft model, you'd need another 140GB of memory on top of the 810GB, but in theory could achieve generation rates well over 100 ...
