facebookincubator
Telemetry daemon for performance monitoring and tracing of heterogeneous CPU-GPU systems
Top 78.1% on SourcePulse
Dynolog is a telemetry daemon for performance monitoring and tracing across heterogeneous CPU-GPU systems, primarily targeting large-scale AI training workloads. It provides a unified view of system performance by collecting metrics from the Linux kernel, CPUs, and GPUs (NVIDIA, via DCGM). It also integrates with PyTorch for on-demand distributed tracing, which makes bottlenecks in complex AI environments easier to identify.
How It Works
Dynolog runs as a daemon that continuously collects system-level metrics and can be triggered remotely for deep-dive profiling. It uses Linux perf_event for CPU micro-architectural counters and NVIDIA's DCGM for GPU metrics, and it integrates with the PyTorch profiler via an IPC monitor. This design supports both always-on monitoring and granular, application-specific tracing, so hardware events can be correlated with application behavior.
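The always-on side of this design can be illustrated with a minimal Python sketch. It is not Dynolog's code: it only shows the general idea of sampling cumulative kernel counters periodically and reporting the change between samples. The choice of /proc/stat as the data source, and the names CpuSample, parse_proc_stat, read_sample, and cpu_utilization, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSample:
    """Cumulative CPU time, in clock ticks, from the aggregate 'cpu' line."""
    busy: int
    total: int


def parse_proc_stat(text: str) -> CpuSample:
    """Parse the aggregate 'cpu' line from /proc/stat-formatted text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already counted in user and nice,
            # so only the first eight fields are summed.
            total = sum(values[:8])
            return CpuSample(busy=total - idle, total=total)
    raise ValueError("no aggregate 'cpu' line found")


def read_sample(path: str = "/proc/stat") -> CpuSample:
    """Take one sample from the kernel (Linux only)."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_proc_stat(f.read())


def cpu_utilization(prev: CpuSample, curr: CpuSample) -> float:
    """Fraction of CPU time spent busy between two samples (0.0 to 1.0)."""
    delta_total = curr.total - prev.total
    if delta_total <= 0:
        return 0.0
    return (curr.busy - prev.busy) / delta_total
```

A monitoring loop would call read_sample() at a fixed interval and pass each consecutive pair of samples to cpu_utilization(). Deep-dive profiling would be a separate, on-demand path and is not modeled here.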
Quick Start & Requirements
cmake, ninja, and cargo/rustup.
Highlighted Details
perf_event (cache, TLB, etc.).
Maintenance & Community
Actively maintained by Meta engineers. Community interaction and bug reporting via GitHub Issues.
Licensing & Compatibility
MIT License. Permissive for commercial use and integration with closed-source applications.
Limitations & Caveats
Currently supports only Linux platforms and NVIDIA GPUs. Intel Processor Trace and memory monitoring are under active development. Some userspace features may have limitations without root access.