Using speculative decoding with something like Llama 3.1 70B as the draft model, you'd need another 140GB of memory on top of the 810GB, but in theory you could achieve generation rates well over 100 ...
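
The memory figures are consistent with storing weights at 2 bytes per parameter (FP16/BF16). As a rough sketch of that arithmetic, the snippet below assumes the 810 figure corresponds to a 405B-parameter target model, which the text above does not state explicitly. The helper name `weight_memory_gb` is hypothetical, and the estimate covers weights only, not KV cache or activations.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Counts weights only; KV cache and activations are ignored.
    """
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


# Draft model: 70B parameters at 2 bytes each (FP16/BF16 assumed).
draft_gb = weight_memory_gb(70)

# Target model: assumed to be 405B parameters, matching the 810 figure.
target_gb = weight_memory_gb(405)

# Combined weight footprint when both models are resident for speculative decoding.
total_gb = draft_gb + target_gb

print(f"draft: {draft_gb:.0f} GB, target: {target_gb:.0f} GB, total: {total_gb:.0f} GB")
```

Under these assumptions, keeping both models in memory takes roughly 950 GB for weights alone. Quantizing the draft model (for example, passing `bytes_per_param=1.0` for 8-bit weights) would reduce the extra cost proportionally.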