11-Second Time to First Token on a Healthy vLLM Server
A healthy vLLM server, normal nvidia-smi output, and 11 seconds to first token. eBPF uprobes on the CUDA driver and kernel tracepoints on the scheduler traced the delay to head-of-line blocking in prefix caching.


