GPU Causal Observability — Open-Source eBPF Agent

Open source GPU observability, traced end to end. An eBPF agent that follows the full chain — Linux kernel through CUDA API to your Python source lines. Find out why your GPU is slow, not just that it is.


Open Source GPU Observability: What It Does

Ingero attaches eBPF uprobes to your running CUDA processes — no SDK, no code changes, no restart. It traces GPU API calls, correlates them with host kernel events, and outputs causal chains explaining root causes.

NEW (May 2026): v0.10.1 -> Agent v0.16.0 + Fleet v1.0.0: multiple new features and fixes shipped. Read more…

NEW (April 2026): We shipped Ingero Fleet (OTEL Collector) for cluster-wide GPU observability across all nodes, multi-node investigations, and CUDA Graphs.

CUDA Runtime + Driver

14 interception points on libcudart.so and libcuda.so. Traces cudaMalloc, cudaLaunchKernel, cudaMemcpy, cudaStreamSynchronize, cuLaunchKernel, and more. Because the driver API is traced as well, it also sees the kernel launches that cuBLAS and cuDNN issue directly through libcuda.so.

Host Kernel Tracepoints

CPU scheduling (sched_switch, sched_wakeup), memory pressure (mm_page_alloc, oom_kill), process lifecycle, block I/O, TCP retransmits, network socket I/O. 6 eBPF sensors total.
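To make the CPU scheduling signal concrete, here is a minimal Python sketch of how cumulative off-CPU time could be derived from sched_switch records. It is an illustration only, not Ingero's implementation: the tuple layout, the function name off_cpu_ns, and the sample PIDs are assumptions. The only facts relied on are that each sched_switch event carries a timestamp plus the outgoing (prev_pid) and incoming (next_pid) task IDs.

```python
def off_cpu_ns(switches, pid):
    """Sum the time `pid` spent switched out.

    `switches` is an iterable of (ts_ns, prev_pid, next_pid) tuples sorted by
    timestamp. In the kernel tracepoint these "pid" fields are thread IDs.
    """
    total = 0
    switched_out_at = None
    for ts_ns, prev_pid, next_pid in switches:
        if prev_pid == pid:
            # The task was just taken off the CPU.
            switched_out_at = ts_ns
        elif next_pid == pid and switched_out_at is not None:
            # The task is back on the CPU: close the off-CPU interval.
            total += ts_ns - switched_out_at
            switched_out_at = None
    return total


# Hypothetical sample: task 4242 is preempted twice (by 777, then by idle/0).
switches = [
    (1_000, 4242, 777),
    (3_000, 777, 4242),
    (10_000, 4242, 0),
    (10_500, 0, 4242),
]
print(off_cpu_ns(switches, 4242))
```

An interval that is still open when the trace ends is deliberately not counted, so the result is a lower bound.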

Causal Engine

Correlates events across layers by timestamp and PID. Outputs root cause chains with severity ranking and fix recommendations. Processes 24K+ events/sec through a 7-tier selective filter.
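As a rough illustration of correlating by timestamp and PID, the Python sketch below pairs each slow CUDA call with host events from the same PID that fall inside the call's time window, then ranks the pairs by stall length. The Event fields, the correlate function, the threshold, and the sample data are all assumptions made for the example. The real engine's data model, filter tiers, and severity ranking are not described here and are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts_ns: int    # start timestamp in nanoseconds
    dur_ns: int   # duration; 0 for point events such as sched_switch
    pid: int
    layer: str    # "cuda" or "host"
    name: str


def correlate(events, slow_ns):
    """Return (cuda_call, [host_events]) pairs, longest stall first."""
    slow_calls = [e for e in events if e.layer == "cuda" and e.dur_ns >= slow_ns]
    host = [e for e in events if e.layer == "host"]
    chains = []
    for call in slow_calls:
        start, end = call.ts_ns, call.ts_ns + call.dur_ns
        # Same PID and inside the call's time window.
        causes = [h for h in host if h.pid == call.pid and start <= h.ts_ns <= end]
        if causes:
            chains.append((call, causes))
    chains.sort(key=lambda chain: chain[0].dur_ns, reverse=True)
    return chains


# Hypothetical sample data.
events = [
    Event(1_000, 16_000_000, 4242, "cuda", "cudaStreamSync"),
    Event(50_000_000, 472_000_000, 4242, "cuda", "cudaStreamSync"),
    Event(60_000_000, 0, 4242, "host", "sched_switch"),
    Event(70_000_000, 0, 9999, "host", "sched_switch"),   # other PID: ignored
    Event(900_000_000, 0, 4242, "host", "sched_switch"),  # outside window: ignored
]
chains = correlate(events, slow_ns=100_000_000)
for call, causes in chains:
    print(call.name, call.dur_ns // 1_000_000, "ms <-", [c.name for c in causes])
```

The nested scan is quadratic and only suitable for a toy input. A production correlator would index host events by PID and time.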


Quick Start: Ingero eBPF Agent for GPU Observability

Binary release (recommended):

# Linux amd64
VERSION=0.16.0
curl -fsSL "https://github.com/ingero-io/ingero/releases/download/v${VERSION}/ingero_${VERSION}_linux_amd64.tar.gz" | tar xz
sudo mv ingero /usr/local/bin/

# Linux arm64 (GH200 / Grace Hopper, Graviton)
VERSION=0.16.0
curl -fsSL "https://github.com/ingero-io/ingero/releases/download/v${VERSION}/ingero_${VERSION}_linux_arm64.tar.gz" | tar xz
sudo mv ingero /usr/local/bin/

A Docker image is also available. Or build from source:

git clone https://github.com/ingero-io/ingero.git  # clone the repo
cd ingero
bash scripts/install-deps.sh   # install dependencies: clang-14, Go, eBPF toolchain
source ~/.bashrc               # update your env
make

# [optional] Check your system
./bin/ingero check

# [optional] Try Ingero (auto-detects GPU)
./bin/ingero demo

# Trace live CUDA workloads and CPU-GPU interactions (requires sudo for eBPF access + NVIDIA GPU)
sudo ./bin/ingero trace

Single binary. No dependencies. Works on any Linux 5.15+ kernel with NVIDIA driver 550+. Also available as a K8s DaemonSet.
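The version requirements above can be checked from a script before rolling the agent out. The sketch below is a hypothetical helper, not part of Ingero (ingero check remains the supported way to verify a host). It compares the leading numeric components of a version string against a minimum. The function name and the sample version strings are assumptions.

```python
import re

MIN_KERNEL = (5, 15)   # Linux 5.15+
MIN_DRIVER = (550,)    # NVIDIA driver 550+


def version_at_least(version, minimum):
    """Compare the leading numeric components of `version` to `minimum`."""
    # Drop distro suffixes such as "-105-generic" before parsing.
    parts = re.findall(r"\d+", version.split("-")[0])
    if not parts:
        raise ValueError(f"unrecognized version string: {version!r}")
    found = tuple(int(p) for p in parts[: len(minimum)])
    return found >= minimum


# Sample strings in the shape reported by `uname -r` and by the NVIDIA driver.
for kernel in ("5.15.0-105-generic", "5.4.0-42-generic", "6.8.0-1009-aws"):
    print(kernel, version_at_least(kernel, MIN_KERNEL))
for driver in ("550.54.15", "535.183.01"):
    print(driver, version_at_least(driver, MIN_DRIVER))
```

On a live host, the kernel release string can be obtained with Python's platform.release().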


Embedded MCP Server: Let AI Agents Investigate AI Training and Inference Issues

Ingero Agent includes an MCP server with 7 tools. Connect it to Claude, Cursor, Ollama + Qwen/MiniMax, or any other MCP-compatible assistant or model, local or remote, and ask questions about your GPU workloads, on a single node or cluster-wide via Ingero Fleet & Echo. The built-in /investigate prompt calls several of these MCP tools in sequence to analyze the collected runtime data and causal chains.

Engineer: "What caused the training slowdown?"

Ingero MCP → cudaStreamSync p99 spiked 29x (16ms → 472ms).

Root cause: 847 sched_switch events — logrotate preempted
training thread for 142ms cumulative off-CPU time.

Source: forward() at train.py:142

Full AI investigation session →
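For readers wiring up their own client, the sketch below shows what the opening of an MCP session looks like on the wire. MCP is JSON-RPC 2.0: a client sends initialize, confirms with a notifications/initialized notification, and can then ask the server for its tool list with tools/list. The code only builds those messages. It does not launch or contact Ingero, and the client name and the protocol version string are illustrative assumptions. The names of Ingero's 7 tools are not listed here, so none are invented.

```python
import json


def request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request (expects a response with the same id)."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method):
    """Build a JSON-RPC 2.0 notification (no id, no response expected)."""
    return {"jsonrpc": "2.0", "method": method}


handshake = [
    request(1, "initialize", {
        "protocolVersion": "2024-11-05",   # the server may negotiate another version
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
    }),
    # A real client sends this only after receiving the initialize response.
    notification("notifications/initialized"),
    request(2, "tools/list"),
]

# The stdio transport carries one JSON message per line.
wire = "\n".join(json.dumps(m) for m in handshake) + "\n"
print(wire, end="")
```

In practice an MCP-compatible assistant performs this handshake for you. The sketch is useful mainly for debugging a connection by hand.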


Ingero Agent’s GPU Observability Architecture

┌────────────────────────────────────────────────────────────────┐
│  User Space                                                    │
│                                                                │
│  ┌─────────┐    ┌─────────────┐  ┌───────┐    ┌─────────────┐  │
│  │  CUDA   │    │   ingero    │  │SQLite │    │MCP Server   │  │
│  │  App    │    │   agent     │─►│  DB   │◄───│(stdio/HTTPS)│  │
│  │(PyTorch)│    │             │  │       │    └─────────────┘  │
│  │         │    │             │  │       │   ┌───────────┐     │
│  │         │    │             │  │       │◄──│ Dashboard │     │
│  │         │    │             │  └───────┘   │  (HTTPS)  │     │
│  └──┬──┬───┘    │ ┌──────────┐│              └───────────┘     │
│     │  │        │ │ causal   ││   ┌───────────┐                │
│     │  │        │ │ engine   ││   │ OTLP /    │                │
│     │  │        │ └──────────┘│──►│ Prometheus│                │
│     │  │        └──┬──┬──┬────┘   └───────────┘                │
│     │  │           │  │  │ ▲                                   │
│     │  │           │  │  │ │ ring buffers                      │
│─────┼──┼───────────┼──┼──┼─┼───────────────────────────────────│
│     │  ▼           │  ▼  ▼ │                                   │
│     │ ┌─────────┐  │ ┌────────────────────┐                    │
│     │ │libcuda  │◄─┤ │  eBPF uprobes      │  (Driver API)      │
│     │ │  .so    │  │ │  cuLaunchKernel    │                    │
│     │ └─────────┘  │ │  cuMemcpy/Alloc    │                    │
│     ▼              │ └────────────────────┘                    │
│  ┌─────────┐       │ ┌────────────────────┐                    │
│  │libcudart│◄──────┘ │  eBPF uprobes      │  (Runtime API)     │
│  │  .so    │◄────────│  cudaLaunchKernel  │                    │
│  └─────────┘         │  cudaMalloc/Memcpy │                    │
│                      │  Graph: Capture,   │                    │
│                      │  Instantiate,Launch│                    │
│                      └────────────────────┘                    │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  eBPF tracepoints (sched_switch, mm_page_alloc, oom,    │   │
│  │  sched_process_exec/exit/fork)                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                │
│  Kernel Space        /proc → CPU%, Mem%, Load, Swap            │
└────────────────────────────────────────────────────────────────┘

< 2% production overhead. Storage is selective, not “store everything”: 100% accuracy on the live stream, ~1% of the volume written to disk. Local SQLite, size-bounded at 10 GB by default. No cloud backend required. For multi-node / cluster architecture, see Ingero Fleet.
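The idea of selective, bounded storage can be sketched in a few lines of Python. Every event updates in-memory counters, only events over a latency threshold are written to SQLite, and the oldest rows are pruned once a cap is reached. This is a simplified illustration under stated assumptions, not Ingero's storage layer: the class name, the table schema, the single-threshold rule, and the row cap (standing in for the byte-size bound) are all hypothetical.

```python
import sqlite3


class SelectiveStore:
    def __init__(self, conn, slow_ns, max_rows):
        self.conn = conn
        self.slow_ns = slow_ns
        self.max_rows = max_rows
        self.seen = 0       # every event is counted, stored or not
        self.total_ns = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, dur_ns INTEGER)"
        )

    def observe(self, name, dur_ns):
        """Account for the event; persist it only if it is slow. Returns True if stored."""
        self.seen += 1
        self.total_ns += dur_ns
        if dur_ns < self.slow_ns:
            return False
        self.conn.execute(
            "INSERT INTO events (name, dur_ns) VALUES (?, ?)", (name, dur_ns)
        )
        # Keep only the newest max_rows rows.
        self.conn.execute(
            "DELETE FROM events WHERE id <= (SELECT MAX(id) FROM events) - ?",
            (self.max_rows,),
        )
        self.conn.commit()
        return True


conn = sqlite3.connect(":memory:")
store = SelectiveStore(conn, slow_ns=100, max_rows=3)
kept = 0
for i in range(1000):
    dur = 500 + i if i % 100 == 0 else 10   # 1 in 100 events is slow
    kept += store.observe("cudaStreamSync", dur)
rows = [r[0] for r in conn.execute("SELECT dur_ns FROM events ORDER BY id")]
print(store.seen, kept, rows)
```

In this toy run all 1,000 events contribute to the counters, 10 of them (1%) pass the threshold, and the cap keeps only the 3 most recent on disk.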


Get Involved

Ingero Agent (single node) and Ingero Fleet (OTEL Collector for multi-node / cluster deployments) are free and open source, dual-licensed under Apache 2.0 (Go agent) and GPL-2.0 (eBPF kernel programs).

Contributions welcome!

GitHub

Source, issues, discussions → github.com/ingero-io/ingero and github.com/ingero-io/ingero-fleet

Docs

Setup, architecture, test matrix → github.com/ingero-io/ingero/docs
