A concept brief, not a hands-on. I read the architecture and the code; I haven't run llm-d at scale, because the realistic use case needs multi-GPU infrastructure. This is what I understood, and why ...