Minimal system for running LLMs on consumer GPUs (research project)
MiniLLM provides a minimal, Python-centric system for running large language models (LLMs) on consumer-grade NVIDIA GPUs. It targets researchers and power users seeking an accessible platform for experimentation with LLMs, focusing on efficient inference and alignment research.
How It Works
MiniLLM leverages the GPTQ algorithm for model compression, enabling significant reductions in GPU memory usage. This allows models of up to 170B parameters to run on hardware typically found in consumer setups. The system supports multiple LLM architectures, including LLaMA, BLOOM, and OPT, with a codebase designed for simplicity and ease of use.
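As a rough illustration of where the memory savings come from, the sketch below applies plain round-to-nearest 4-bit quantization to a weight matrix in NumPy. It is a minimal sketch, not MiniLLM's implementation: GPTQ proper quantizes weights column by column and compensates the remaining weights using second-order (Hessian) information, and all names and the 4-bit setting here are illustrative assumptions.

```python
# Minimal sketch of 4-bit round-to-nearest quantization (illustrative only).
# GPTQ proper quantizes weights column by column and uses second-order
# (Hessian) information to compensate quantization error; this sketch shows
# only the basic affine quantize/dequantize step and the memory math.
import numpy as np

def quantize_rtn(W: np.ndarray, bits: int = 4):
    """Per-row affine quantization of W to `bits`-bit integer codes."""
    qmax = 2 ** bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / qmax, 1e-8)  # avoid div-by-zero
    zero = np.round(-w_min / scale)                   # integer zero-point
    q = np.clip(np.round(W / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return scale * (q.astype(np.float32) - zero)

W = np.random.randn(256, 256).astype(np.float32)
q, scale, zero = quantize_rtn(W)
W_hat = dequantize(q, scale, zero)
print("mean abs error:", np.abs(W - W_hat).mean())
# uint8 codes here; packing two 4-bit codes per byte halves this again.
print("fp16 bytes:", W.size * 2, "-> packed 4-bit bytes:", W.size // 2)
```

Packed 4-bit codes plus per-row scales cut weight memory roughly 4x versus fp16, which is the headroom that lets much larger models fit on a single consumer GPU.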
Quick Start & Requirements
Install the dependencies and the package:

```
pip install -r requirements.txt
python setup.py install
```

A conda environment is recommended. Model weights can then be fetched with the bundled CLI:

```
minillm download --model <model_name> --weights <weights_path>
```
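As a concrete example (the model identifier and weights path below are illustrative placeholders, not names verified against the CLI):

```
minillm download --model llama-7b-4bit --weights ./llama-7b-4bit.pt
```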
Highlighted Details
- GPTQ-based quantization for large reductions in GPU memory usage
- Support for the LLaMA, BLOOM, and OPT model families
- Runs models of up to 170B parameters on consumer NVIDIA GPUs
- Minimal, Python-centric codebase released under the MIT License
Maintenance & Community
This is a research project from Cornell Tech and Cornell University. Feedback can be sent to Volodymyr Kuleshov.
Licensing & Compatibility
The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Currently, only NVIDIA GPUs are supported. The project is experimental and under active development, with plans to add support for more LLMs, automated quantization, and fine-tuning capabilities. Output quality can vary, so some generations may require manually selecting the best of several samples.
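The loop below sketches that selection workflow: draw several samples, then pick one by hand. The `generate` function is a hypothetical stand-in for whatever sampling entry point the project exposes, not MiniLLM's actual API.

```python
# Hypothetical best-of-n workflow: draw several samples, then pick by hand.
# `generate` is a placeholder for the project's real sampling entry point.
import random

VOCAB = ["stone", "workers", "ramps", "engineers", "granite"]

def generate(prompt: str, seed: int, n_tokens: int = 5) -> str:
    """Placeholder sampler: returns the prompt plus random tokens."""
    rng = random.Random(seed)
    return prompt + " " + " ".join(rng.choice(VOCAB) for _ in range(n_tokens))

prompt = "The pyramids were built by"
candidates = [generate(prompt, seed) for seed in range(4)]
for i, text in enumerate(candidates):
    print(f"[{i}] {text}")
# A human (or a scoring heuristic) then keeps the best candidate by index.
```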