Compressed language models via pruning/distillation
Minitron is a family of compressed small language models (SLMs) derived from larger models through pruning and knowledge distillation. It targets researchers and developers who need efficient, high-performance language models with reduced computational requirements, and the models offer state-of-the-art accuracy for their size.
How It Works
Minitron models are created by pruning a base model along its embedding dimension, attention heads, and MLP intermediate dimension, then continuing training with knowledge distillation from the original model. This approach cuts training costs substantially (up to 40x fewer training tokens than training a comparable model from scratch) and yields models that outperform other compression techniques while remaining competitive with larger, uncompressed models.
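As a rough illustration of the distillation step, the pruned student can be trained to match the teacher's output distribution. The sketch below is a generic logit-distillation loss in PyTorch, not Minitron's exact recipe; the `temperature` parameter and the KL-only objective are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay consistent across temperatures.
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2
```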
Quick Start & Requirements
- Models are available on Hugging Face (e.g., Mistral-NeMo-Minitron-8B-Base); usage instructions are in the model cards.
- NeMo checkpoints run inside the nvcr.io/nvidia/nemo:24.05 container, which requires mounting the TensorRT-Model-Optimizer and model directories.
- Dependencies: transformers, nvidia-modelopt, and TensorRT-LLM; a GPU with CUDA is recommended for inference and export.
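For a quick check with `transformers`, a minimal inference sketch is below. It assumes the Hugging Face model ID `nvidia/Mistral-NeMo-Minitron-8B-Base`; confirm the exact ID and recommended settings in the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model ID; see the model card for the canonical name.
model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory; an 8B model still needs a large GPU
    device_map="auto",           # places weights on available GPU(s)
)

prompt = "Pruning and distillation can"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```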
Highlighted Details

- Models ship in the .nemo checkpoint format for NeMo and can be exported to TensorRT-LLM.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats