NVIDIA-NeMo/Evaluator: Open-source library for scalable, reproducible AI model and benchmark evaluation
Summary
NVIDIA-NeMo/Evaluator is an open-source SDK for scalable, reproducible AI model and benchmark evaluation. It targets researchers and engineers needing to rigorously assess LLMs against numerous benchmarks, offering a unified CLI, pluggable architecture, and containerized execution for auditable results. The platform simplifies integrating public benchmarks and private datasets for efficient model comparison.
How It Works
The system uses two components: the nemo-evaluator core engine and the nemo-evaluator-launcher CLI. Evaluations run in open-source Docker containers, ensuring reproducibility by capturing configurations, seeds, and provenance. This containerized, pluggable architecture scales evaluations from local machines to Slurm or cloud backends (e.g., Lepton AI) without workflow changes, simplifying integration and ensuring auditable results.
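Evaluations are driven by YAML configuration files passed to the launcher. A minimal sketch of such a config is shown below; apart from execution.output_dir, which appears in the quick-start override, the field names here are illustrative assumptions, so consult the example configs in the repo for the authoritative schema.

```yaml
# Hypothetical launcher config sketch. Field names other than
# execution.output_dir are assumptions, not the authoritative schema.
execution:
  output_dir: ./results        # where run artifacts and provenance are written
  backend: local               # or a Slurm / cloud backend such as Lepton AI
evaluation:
  tasks:
    - name: mmlu               # a public benchmark to run
target:
  api_endpoint:
    url: http://localhost:8000/v1/chat/completions
    model_id: my-model         # the model under evaluation
```

Because execution details live in the config rather than the workflow, switching from a local run to a Slurm or cloud backend is a config change, not a code change.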
Quick Start & Requirements
1. Install the CLI: pip install nemo-evaluator-launcher
2. Set your NGC API key: export NGC_API_KEY=<YOUR_API_KEY>
3. Run an evaluation: nemo-evaluator-launcher run --config <path_to_config.yaml> -o execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
Example configs are available in the repo.
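The quick-start steps can be sketched as a short shell session. The install and authentication steps are shown as comments, and the final command is printed rather than executed so the sketch runs without the package installed; the config path and output directory are placeholder values.

```shell
#!/usr/bin/env sh
# Quick-start sketch for nemo-evaluator-launcher.

# 1. Install the CLI:
#    pip install nemo-evaluator-launcher

# 2. Authenticate so gated containers and models can be pulled:
#    export NGC_API_KEY=<YOUR_API_KEY>

# 3. Launch an evaluation. The -o flag overrides config values from the
#    command line; placeholder paths below are illustrative.
CONFIG=path/to/config.yaml
OUTPUT_DIR=./results
echo nemo-evaluator-launcher run --config "$CONFIG" -o execution.output_dir="$OUTPUT_DIR"
```

Results, along with the captured configuration and provenance, land under the chosen output directory, which makes runs easy to compare and audit.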
Maintenance & Community
Contributions are welcomed via the Contribution Guide. Discussions are on GitHub Discussions. Anonymous telemetry is collected for project improvement, with opt-out options.
Licensing & Compatibility
Licensed under the Apache License 2.0, permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The nel ls command may require manual Docker authentication, as it lacks support for macOS Keychain and GNOME Keyring credential helpers. A preview of v0.3.0 on the dev/0.3.0 branch indicates ongoing development.