Using speculative decoding with something like Llama 3.1 70B as the draft model, you'd need another 140GB of memory on top of ...
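The 140GB figure follows from a simple back-of-the-envelope calculation, assuming the draft model's weights are held in 16-bit precision (2 bytes per parameter); actual serving memory would be higher once KV cache and activations are counted:

```python
def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just for model weights, in GB.

    Assumes fp16/bf16 storage (2 bytes/param) by default; quantized
    formats (e.g. 8-bit or 4-bit) would shrink this proportionally.
    """
    return num_params * bytes_per_param / 1e9

# A 70B-parameter draft model at 16-bit precision:
print(weights_memory_gb(70e9))  # 140.0 GB for weights alone
```

At 8-bit or 4-bit quantization the same draft model would drop to roughly 70GB or 35GB, which is one reason quantized drafts are attractive in practice.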
That's unremarkable, given the long history of factory automation and industrial robots. But just as mainframes gave way to ...