CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation
PyTorch kept crashing with CUDA out-of-memory errors at 60% GPU utilization, even though nvidia-smi showed plenty of free memory. Tracing every cudaMalloc and cudaFree call with eBPF exposed the real cause: memory fragmentation from misaligned allocation patterns.
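
To make the failure mode concrete, here is a toy sketch in plain Python. It is not PyTorch's caching allocator or the CUDA driver, and the names (`ToyPool`, `alloc`, `free`, `largest_gap`) are hypothetical, chosen only for illustration. It assumes a fixed 100-unit pool with first-fit placement and shows how a pool can be 60% used yet unable to serve a request that is smaller than the total free space.

```python
class ToyPool:
    """Toy first-fit allocator over a fixed-size pool (illustrative only)."""

    def __init__(self, size):
        self.size = size
        self.blocks = {}      # handle -> (offset, length)
        self._next_handle = 0

    def _gaps(self):
        """Yield (offset, length) for every free gap, in address order."""
        cursor = 0
        for offset, length in sorted(self.blocks.values()):
            if offset > cursor:
                yield cursor, offset - cursor
            cursor = offset + length
        if cursor < self.size:
            yield cursor, self.size - cursor

    def alloc(self, length):
        """Place the block in the first gap that is large enough."""
        for offset, gap in self._gaps():
            if gap >= length:
                handle = self._next_handle
                self._next_handle += 1
                self.blocks[handle] = (offset, length)
                return handle
        raise MemoryError(
            f"need {length} contiguous units, largest gap is {self.largest_gap()}"
        )

    def free(self, handle):
        del self.blocks[handle]

    def used(self):
        return sum(length for _, length in self.blocks.values())

    def largest_gap(self):
        return max((gap for _, gap in self._gaps()), default=0)


# Fill the pool with ten 10-unit blocks, then free four non-adjacent ones.
pool = ToyPool(100)
handles = [pool.alloc(10) for _ in range(10)]
for i in (1, 3, 5, 7):
    pool.free(handles[i])

print("used:", pool.used(), "free:", pool.size - pool.used(),
      "largest gap:", pool.largest_gap())

try:
    pool.alloc(20)  # 40 units are free in total, but no gap holds 20
except MemoryError as exc:
    print("allocation failed:", exc)
```

The point of the sketch is the gap between total free space and the largest contiguous free region: a tool that only reports the total cannot reveal this kind of failure.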

