facebookincubator
Telemetry daemon for performance monitoring and tracing of heterogeneous CPU-GPU systems
Top 79.3% on SourcePulse
Dynolog is a telemetry daemon for performance monitoring and tracing across heterogeneous CPU-GPU systems, primarily targeting large-scale AI training workloads. It provides a unified view of system performance by collecting metrics from the Linux kernel, CPUs, and GPUs (NVIDIA, via DCGM), and it integrates with PyTorch for on-demand distributed tracing, which simplifies bottleneck identification in complex AI environments.
How It Works
Dynolog operates as a daemon that continuously collects system-level metrics and can be remotely triggered for deep-dive profiling. It uses Linux perf_event for CPU micro-architectural counters and NVIDIA's DCGM for GPU metrics, and it integrates with the PyTorch profiler via an IPC monitor. This allows both always-on monitoring and granular, application-specific tracing, so hardware events can be correlated with application behavior.
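To make the always-on collection idea concrete, here is a minimal Python sketch of one kernel-level metric: CPU utilization derived from two samples of /proc/stat. It is an illustration only and is not Dynolog's implementation. The function names (parse_cpu_times, cpu_utilization, sample_utilization) are hypothetical, and the choice to count iowait as idle time is an assumption.

```python
import time
from typing import Dict

# Field order of the aggregate "cpu" line in /proc/stat (values are in jiffies).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_times(stat_text: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """Return the busy fraction (0.0 to 1.0) between two counter snapshots."""
    delta = {key: curr[key] - prev[key] for key in curr}
    total = sum(delta.values())
    if total <= 0:
        return 0.0
    # Assumption: treat both idle and iowait as non-busy time.
    idle = delta["idle"] + delta.get("iowait", 0)
    return (total - idle) / total


def sample_utilization(interval_s: float = 1.0) -> float:
    """Hypothetical helper: take two live samples on a Linux host."""
    with open("/proc/stat") as f:
        prev = parse_cpu_times(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        curr = parse_cpu_times(f.read())
    return cpu_utilization(prev, curr)
```

A daemon in this style would call something like sample_utilization() on a fixed interval and forward the result to a logger. The counters are cumulative, so each reading only becomes meaningful as a difference between two snapshots.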
Quick Start & Requirements
cmake, ninja, and cargo/rustup.
Highlighted Details
perf_event (cache, TLB, etc.).
Maintenance & Community
Actively maintained by Meta engineers. Community interaction and bug reporting via GitHub Issues.
Licensing & Compatibility
MIT License. Permissive for commercial use and integration with closed-source applications.
Limitations & Caveats
Currently supports only Linux platforms and NVIDIA GPUs. Intel Processor Trace and memory monitoring are under active development. Some userspace features may have limitations without root access.