A concept brief, not a hands-on walkthrough. I read the architecture and the code; I haven't run llm-d at scale, because the realistic use case needs multi-GPU infrastructure. This is what I understood, and why ...