DaoyuanLi2816
Estimate and optimize LLM fine-tuning on consumer GPUs
Top 68.2% on SourcePulse
This project helps users with consumer NVIDIA GPUs determine if Hugging Face LLMs can be fine-tuned locally using LoRA/QLoRA. It prevents wasted disk space and time by estimating VRAM usage, recommending configurations, and generating runnable training scripts before full downloads.
How It Works
The tool models VRAM comprehensively, accounting for weights, trainable parameters, gradients, optimizer states (AdamW variants), and activations (sequence length, batch size, gradient checkpointing), plus a safety margin. It focuses on training feasibility, offering downsizing suggestions and runnable recipes. Optional bench and calibrate commands ground estimates in real machine measurements, bridging the gap between static predictions and actual VRAM usage.
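The accounting described above can be sketched in a few lines of Python. This is a rough illustration, not canifinetune's actual model: the `LoraJob` class, the `estimate_vram_gib` and `fits` helpers, the bf16 adapter assumption, the activation factors, and the 15% margin are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class LoraJob:
    """Hypothetical description of a LoRA/QLoRA run (illustrative only)."""
    base_params: float                       # base-model parameter count, e.g. 7e9
    weight_bits: int                         # 16 for LoRA on bf16 weights, 4 for QLoRA
    trainable_params: float                  # adapter (LoRA) parameter count
    hidden_size: int
    num_layers: int
    seq_len: int
    batch_size: int
    gradient_checkpointing: bool = True
    optimizer_bytes_per_param: float = 8.0   # assumed: two fp32 AdamW moments
    safety_margin: float = 0.15              # assumed headroom, not a measured value


def estimate_vram_gib(job: LoraJob) -> dict:
    """Return a per-component VRAM estimate in GiB under the stated assumptions."""
    weights = job.base_params * job.weight_bits / 8
    adapters = job.trainable_params * 2      # assumed bf16 adapter weights
    gradients = job.trainable_params * 2     # assumed bf16 gradients, adapters only
    optimizer = job.trainable_params * job.optimizer_bytes_per_param

    # Crude activation heuristic: bytes grow with tokens * hidden size * layers.
    # The factors 2 and 16 are placeholders; real usage depends on the
    # architecture, attention kernel, and framework overhead.
    factor = 2 if job.gradient_checkpointing else 16
    activations = (
        job.batch_size * job.seq_len * job.hidden_size * job.num_layers * 2 * factor
    )

    subtotal = weights + adapters + gradients + optimizer + activations
    total = subtotal * (1 + job.safety_margin)
    parts = {
        "weights": weights,
        "adapters": adapters,
        "gradients": gradients,
        "optimizer": optimizer,
        "activations": activations,
        "total": total,
    }
    return {name: size / GIB for name, size in parts.items()}


def fits(job: LoraJob, vram_gib: float) -> bool:
    """True if the estimated total (margin included) fits in the given VRAM."""
    return estimate_vram_gib(job)["total"] <= vram_gib


if __name__ == "__main__":
    qlora = LoraJob(base_params=7e9, weight_bits=4, trainable_params=40e6,
                    hidden_size=4096, num_layers=32, seq_len=2048, batch_size=1)
    for name, gib in estimate_vram_gib(qlora).items():
        print(f"{name:>12}: {gib:6.2f} GiB")
    print("fits in 16 GiB:", fits(qlora, 16.0))
```

A static estimate like this is only a starting point. Measuring a short real run on the target GPU and scaling the prediction accordingly is the kind of gap the bench and calibrate commands are described as closing.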
Quick Start & Requirements
Install: pip install canifinetune
Training/benchmarking: pip install canifinetune[train]
PyTorch CUDA 12.1 wheels (example): uv pip install torch --index-url https://download.pytorch.org/whl/cu121
Python 3.x
Dependencies: torch, transformers, peft, bitsandbytes, trl, datasets. Reporting: pandas, tabulate.
Troubleshooting: docs/troubleshooting.md
RTX 4080 baselines: docs/rtx4080_baselines.md
Highlighted Details
canifinetune bench/calibrate ground estimates with real machine measurements.
Maintenance & Community
No specific contributors or sponsorships are detailed. CI uses GitHub Actions. The roadmap outlines potential future directions like multi-GPU estimation and sequence-classification support, but these are uncommitted. Contributions are welcomed.
Licensing & Compatibility
Limitations & Caveats
Scope is limited to single consumer GPUs, single nodes, LoRA/QLoRA, causal LMs, and the Hugging Face stack. Because activation memory is hard to predict statically, each estimate carries a confidence level. Benchmark tables explicitly mark unmeasured configurations as "not run."