nvidia-smi Reports 97% Utilization While the GPU Sits Idle
nvidia-smi reported 97% GPU utilization, but training throughput was a fraction of what benchmarks promised. CUDA API tracing and kernel scheduling data showed what the GPU was actually doing during that 97%.

