GPU Utilization Is a Counter, Not a Cause
nvidia-smi reads 97% while throughput falls 3x in the same window. GPU utilization is a duty-cycle counter: it reports the fraction of the sampling period during which at least one kernel was executing, not how much useful work those kernels did. The cause-side data lives one layer down: kernel-runtime spreads, off-CPU time on the dispatcher thread, NCCL waits, and I/O stalls.
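To make the duty-cycle point concrete, here is a minimal sketch in plain Python, with no GPU required. It assumes a simplified model in which utilization is the fraction of a sampling window covered by at least one kernel interval. The helper names (`duty_cycle`, `make_trace`) and both traces are hypothetical and the numbers are invented for illustration. The "degraded" trace assumes each step's kernel has stretched to three times its former duration, for example because it sits resident while waiting on a peer.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_us, end_us) of one kernel execution


def duty_cycle(intervals: List[Interval], window_us: int) -> float:
    """Fraction of the window during which at least one kernel was running.

    Overlapping intervals are merged first, so concurrent kernels are not
    double-counted. This mirrors the duty-cycle idea, not any driver's code.
    """
    busy_us = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip each interval to the sampling window.
        start, end = max(start, 0), min(end, window_us)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous merged run.
            if cur_end is not None:
                busy_us += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy_us += cur_end - cur_start
    return busy_us / window_us


def make_trace(period_us: int, kernel_us: int, window_us: int) -> List[Interval]:
    """One kernel per step, starting at the beginning of each step period."""
    return [(t, t + kernel_us) for t in range(0, window_us, period_us)]


WINDOW_US = 3_000_000  # hypothetical 3-second observation window

# Healthy: 10 ms steps, kernel busy for 9.7 ms of each step.
healthy = make_trace(period_us=10_000, kernel_us=9_700, window_us=WINDOW_US)
# Degraded (assumed): 30 ms steps, kernel resident for 29.1 ms of each step.
degraded = make_trace(period_us=30_000, kernel_us=29_100, window_us=WINDOW_US)

healthy_util = duty_cycle(healthy, WINDOW_US)
degraded_util = duty_cycle(degraded, WINDOW_US)
throughput_ratio = len(healthy) / len(degraded)  # steps completed, healthy vs degraded

print(f"healthy  utilization: {healthy_util:.2%}, steps: {len(healthy)}")
print(f"degraded utilization: {degraded_util:.2%}, steps: {len(degraded)}")
print(f"throughput ratio: {throughput_ratio:.1f}x")
```

Both traces yield the same duty cycle even though the healthy one completes three times as many steps in the window. The counter cannot tell the two apart; per-kernel durations can, which is why the spread of kernel runtimes is a more useful signal than the utilization percentage.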



