can-i-finetune-this by DaoyuanLi2816

Estimate and optimize LLM fine-tuning on consumer GPUs

Created 2 months ago

790 stars

Top 43.6% on SourcePulse

Project Summary

This project helps users with consumer NVIDIA GPUs determine whether a Hugging Face LLM can be fine-tuned locally with LoRA/QLoRA. It avoids wasted disk space and time by estimating VRAM usage, recommending configurations, and generating runnable training scripts before the full model is downloaded.

How It Works

The tool models VRAM comprehensively, accounting for weights, trainable parameters, gradients, optimizer states (AdamW variants), and activations (sequence length, batch size, gradient checkpointing), plus a safety margin. It focuses on training feasibility, offering downsizing suggestions and runnable recipes. Optional bench and calibrate commands ground estimates in real machine measurements, bridging the gap between static predictions and actual VRAM usage.
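As a rough illustration of the accounting described above, the sketch below adds up the same components in plain Python. It is not the tool's actual model: the `FinetuneConfig` fields, the `estimate_vram_gib` name, the per-parameter byte counts, and the activation formula are all assumptions chosen for readability.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class FinetuneConfig:
    """Hypothetical input bundle; field names are illustrative, not the tool's."""
    base_params: float           # parameters in the frozen base model
    trainable_params: float      # LoRA adapter parameters
    weight_bits: int = 4         # 4 for QLoRA-style quantization, 16 for bf16 LoRA
    batch_size: int = 1
    seq_len: int = 1024
    hidden_size: int = 4096
    num_layers: int = 32
    grad_checkpointing: bool = True
    safety_margin: float = 0.10  # fraction added on top of the subtotal


def estimate_vram_gib(cfg: FinetuneConfig) -> dict:
    """Sum assumed per-component byte counts and return each term in GiB."""
    weights = cfg.base_params * cfg.weight_bits / 8
    adapters = cfg.trainable_params * 2   # assumed bf16 adapter weights
    gradients = cfg.trainable_params * 2  # assumed bf16 gradients
    optimizer = cfg.trainable_params * 8  # AdamW: two fp32 moments per parameter
    # Crude activation model (assumption): a fixed count of bf16 tensors of shape
    # [batch, seq, hidden] kept per layer, far fewer with gradient checkpointing.
    tensors_per_layer = 2 if cfg.grad_checkpointing else 16
    activations = (cfg.batch_size * cfg.seq_len * cfg.hidden_size
                   * cfg.num_layers * tensors_per_layer * 2)
    parts = {
        "weights": weights,
        "adapters": adapters,
        "gradients": gradients,
        "optimizer": optimizer,
        "activations": activations,
    }
    subtotal = sum(parts.values())
    report = {name: size / GIB for name, size in parts.items()}
    report["total"] = subtotal * (1 + cfg.safety_margin) / GIB
    return report


if __name__ == "__main__":
    # Example: a 7B-parameter base model with 40M adapter parameters (made-up sizes).
    cfg = FinetuneConfig(base_params=7e9, trainable_params=40e6)
    for name, gib in estimate_vram_gib(cfg).items():
        print(f"{name:>12}: {gib:6.2f} GiB")
```

The activation term is the least reliable part of any such static estimate, which is why measured runs are useful for grounding it.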

Quick Start & Requirements

Install: Core: pip install canifinetune. Training/benchmarking: pip install canifinetune[train].
Prerequisites: Consumer NVIDIA GPU. A PyTorch build compatible with your CUDA version (e.g., uv pip install torch --index-url https://download.pytorch.org/whl/cu121). Python 3.x.
Dependencies: Training: torch, transformers, peft, bitsandbytes, trl, datasets. Reporting: pandas, tabulate.
Links: Quickstart examples in README. Troubleshooting: docs/troubleshooting.md. RTX 4080 baselines: docs/rtx4080_baselines.md.

Highlighted Details

Estimates VRAM for LoRA/QLoRA fine-tuning, including quantization, optimizers, and activations.
Provides concrete downsizing suggestions (batch size, seq length, LoRA rank, quantization) for infeasible configurations.
Generates ready-to-run Hugging Face + PEFT + TRL training scripts.
The canifinetune bench and calibrate commands ground estimates in real machine measurements.
Includes non-synthetic RTX 4080 baseline measurements.
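To make the downsizing idea concrete, here is a minimal sketch of a greedy search over the four knobs named above (batch size, sequence length, LoRA rank, quantization). The search order, the floor values, and the `toy_estimate_gib` stand-in with its coefficients are all hypothetical; the project's real heuristics may differ.

```python
def toy_estimate_gib(batch_size, seq_len, lora_rank, weight_bits):
    """Hypothetical stand-in for a real estimator; every coefficient is made up."""
    weights = 7e9 * weight_bits / 8        # pretend 7B-parameter base model
    adapter_params = lora_rank * 2.5e6     # assumed adapter size per unit of rank
    trainable = adapter_params * 12        # weights + gradients + AdamW moments
    activations = batch_size * seq_len * 1.0e6
    return (weights + trainable + activations) / 1024 ** 3


def suggest_downsizing(config, budget_gib):
    """Halve one knob at a time until the estimate fits the budget.

    Returns (new_config, steps), or (None, steps) if nothing fits.
    """
    cfg = dict(config)
    steps = []
    floors = {"batch_size": 1, "seq_len": 256, "lora_rank": 4}  # assumed minimums
    while toy_estimate_gib(**cfg) > budget_gib:
        for key in ("batch_size", "seq_len", "lora_rank"):
            if cfg[key] > floors[key]:
                cfg[key] = max(floors[key], cfg[key] // 2)
                steps.append((key, cfg[key]))
                break
        else:
            # Every knob is at its floor: fall back to 4-bit weights, then give up.
            if cfg["weight_bits"] > 4:
                cfg["weight_bits"] = 4
                steps.append(("weight_bits", 4))
            else:
                return None, steps
    return cfg, steps


if __name__ == "__main__":
    start = {"batch_size": 8, "seq_len": 2048, "lora_rank": 16, "weight_bits": 16}
    fitted, steps = suggest_downsizing(start, budget_gib=16.0)
    print(fitted)
    print(steps)
```

Reducing batch size first is a design choice in this sketch: it usually costs the least in model quality, whereas shortening sequences or lowering rank changes what the model can learn.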

Maintenance & Community

The project lists no named contributors or sponsors. CI runs on GitHub Actions. The roadmap outlines possible future directions such as multi-GPU estimation and sequence-classification support, though none of these are committed. Contributions are welcome.

Licensing & Compatibility

License: MIT.
Compatibility: Permissive for commercial use and closed-source integration.

Limitations & Caveats

Scope is limited to single consumer GPUs, single nodes, LoRA/QLoRA, causal LMs, and the Hugging Face stack. Estimates carry confidence levels because activation memory is hard to predict statically. Benchmark tables explicitly mark unmeasured configurations as "not run."
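One simple way to narrow the gap between a static prediction and measured usage is to scale predictions by a ratio observed on the local machine. The sketch below shows that idea only; the function names are hypothetical, the sample numbers are invented, and nothing here describes how the project's calibrate command works internally.

```python
from statistics import median


def fit_calibration_factor(samples):
    """Return the median measured/predicted ratio over (predicted, measured) pairs."""
    ratios = [measured / predicted for predicted, measured in samples if predicted > 0]
    if not ratios:
        raise ValueError("need at least one sample with a positive prediction")
    return median(ratios)


def apply_calibration(predicted_gib, factor):
    """Scale a static prediction by the fitted factor."""
    return predicted_gib * factor


if __name__ == "__main__":
    # Invented (predicted GiB, measured peak GiB) pairs, for illustration only.
    runs = [(10.0, 11.0), (8.0, 8.8), (12.0, 15.0)]
    factor = fit_calibration_factor(runs)
    print(f"factor = {factor:.3f}")
    print(f"calibrated 9.0 GiB prediction = {apply_calibration(9.0, factor):.2f} GiB")
```

The median is used here rather than the mean so that a single outlier run does not dominate the correction.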

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Star History

84 stars in the last 30 days