can-i-finetune-this  by DaoyuanLi2816

Estimate and optimize LLM fine-tuning on consumer GPUs

Created 3 weeks ago

432 stars

Top 68.2% on SourcePulse

Project Summary

This project helps users with consumer NVIDIA GPUs determine whether a Hugging Face LLM can be fine-tuned locally with LoRA/QLoRA. It avoids wasted disk space and time by estimating VRAM usage, recommending configurations, and generating runnable training scripts before the full model is downloaded.

How It Works

The tool's VRAM model accounts for weights, trainable parameters, gradients, optimizer states (AdamW variants), and activations (which depend on sequence length, batch size, and gradient checkpointing), plus a safety margin. It focuses on training feasibility: when a configuration does not fit, it offers downsizing suggestions and runnable recipes. The optional bench and calibrate commands ground the estimates in measurements taken on the user's own machine, narrowing the gap between static predictions and actual VRAM usage.
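To make the accounting concrete, here is a minimal sketch of a component-wise VRAM estimate. It is not canifinetune's actual model: the FinetuneConfig class, the estimate_vram_gib function, the byte sizes (bf16 adapters and gradients, two fp32 AdamW moments), the activation multiplier, and the 10% margin are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class FinetuneConfig:
    """Hypothetical inputs; not canifinetune's actual schema."""
    base_params: float            # parameters in the frozen base model
    trainable_params: float       # LoRA adapter parameters
    weight_bits: int = 4          # 4 for QLoRA-style weights, 16 for bf16 LoRA
    hidden_size: int = 4096
    num_layers: int = 32
    seq_len: int = 1024
    batch_size: int = 1
    grad_checkpointing: bool = True
    safety_margin: float = 0.10   # assumed 10% headroom


def estimate_vram_gib(cfg: FinetuneConfig) -> dict:
    """Return a rough per-component VRAM breakdown in GiB."""
    weights = cfg.base_params * cfg.weight_bits / 8
    adapters = cfg.trainable_params * 2     # assumed bf16 adapter weights
    gradients = cfg.trainable_params * 2    # assumed bf16 gradients
    optimizer = cfg.trainable_params * 8    # AdamW: two fp32 moments per parameter

    # One bf16 hidden-state tensor per layer, scaled by an assumed multiplier
    # for the intermediate tensors that are kept when checkpointing is off.
    hidden_bytes = cfg.batch_size * cfg.seq_len * cfg.hidden_size * 2
    multiplier = 1 if cfg.grad_checkpointing else 16
    activations = hidden_bytes * cfg.num_layers * multiplier

    parts = {
        "weights": weights,
        "adapters": adapters,
        "gradients": gradients,
        "optimizer": optimizer,
        "activations": activations,
    }
    subtotal = sum(parts.values())
    parts["safety_margin"] = subtotal * cfg.safety_margin
    parts["total"] = subtotal + parts["safety_margin"]
    return {name: size / GIB for name, size in parts.items()}


# Example: a 7B-parameter model with 4-bit weights and 40M adapter parameters.
qlora_7b = FinetuneConfig(base_params=7e9, trainable_params=40e6)
for name, gib in estimate_vram_gib(qlora_7b).items():
    print(f"{name:>14}: {gib:6.2f} GiB")
```

Returning a breakdown rather than a single number makes it easy to see which term dominates, which is what a downsizing suggestion needs to know.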

Quick Start & Requirements

  • Install: Core: pip install canifinetune. Training/benchmarking: pip install canifinetune[train].
  • Prerequisites: Consumer NVIDIA GPU. CUDA toolkit compatible with PyTorch (e.g., uv pip install torch --index-url https://download.pytorch.org/whl/cu121). Python 3.x.
  • Dependencies: Training: torch, transformers, peft, bitsandbytes, trl, datasets. Reporting: pandas, tabulate.
  • Links: Quickstart examples in README. Troubleshooting: docs/troubleshooting.md. RTX 4080 baselines: docs/rtx4080_baselines.md.
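
Before installing the training extra, it can help to check which of the listed training dependencies are already importable and how much VRAM the GPU reports. The sketch below uses only the standard library plus an optional PyTorch import; the helper names missing_packages and gpu_total_vram_gib are hypothetical and are not part of canifinetune.

```python
import importlib.util

# Training dependencies named in the list above.
TRAIN_DEPS = ["torch", "transformers", "peft", "bitsandbytes", "trl", "datasets"]


def missing_packages(names):
    """Return the packages from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gpu_total_vram_gib():
    """Return total VRAM of GPU 0 in GiB, or None if no CUDA GPU is usable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check also works without PyTorch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024 ** 3


print("missing training packages:", missing_packages(TRAIN_DEPS))
```

Calling gpu_total_vram_gib() afterwards gives the budget that any VRAM estimate has to fit into; it returns None on a machine without a working CUDA setup.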

Highlighted Details

  • Estimates VRAM for LoRA/QLoRA fine-tuning, including quantization, optimizers, and activations.
  • Provides concrete downsizing suggestions (batch size, seq length, LoRA rank, quantization) for infeasible configurations.
  • Generates ready-to-run Hugging Face + PEFT + TRL training scripts.
  • The canifinetune bench and calibrate commands ground estimates in real machine measurements.
  • Includes non-synthetic RTX 4080 baseline measurements.
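
The downsizing idea in the list above can be sketched as a greedy search: shrink one knob at a time until the estimate fits the VRAM budget. This is only an illustration under stated assumptions. The Config class, the order of the knobs, the lower bounds (sequence length 256, rank 8, 4-bit weights), and the toy_estimate_gib stand-in with its made-up constants are all hypothetical and do not reflect how canifinetune ranks its suggestions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    """Hypothetical training knobs."""
    batch_size: int = 4
    seq_len: int = 2048
    lora_rank: int = 64
    weight_bits: int = 16


def toy_estimate_gib(cfg, base_params=7e9):
    """Stand-in estimator with made-up constants, for demonstration only."""
    weights = base_params * cfg.weight_bits / 8
    adapters = cfg.lora_rank * 500_000 * 12           # assumed params per rank unit
    activations = cfg.batch_size * cfg.seq_len * 4096 * 2 * 32
    return (weights + adapters + activations) / 1024 ** 3


def next_step(cfg):
    """Return (description, smaller_config), or None if nothing is left to shrink."""
    if cfg.batch_size > 1:
        new = cfg.batch_size // 2
        return f"batch size {cfg.batch_size} -> {new}", replace(cfg, batch_size=new)
    if cfg.seq_len > 256:
        new = cfg.seq_len // 2
        return f"seq length {cfg.seq_len} -> {new}", replace(cfg, seq_len=new)
    if cfg.lora_rank > 8:
        new = cfg.lora_rank // 2
        return f"LoRA rank {cfg.lora_rank} -> {new}", replace(cfg, lora_rank=new)
    if cfg.weight_bits > 4:
        new = cfg.weight_bits // 2
        return f"weights {cfg.weight_bits}-bit -> {new}-bit", replace(cfg, weight_bits=new)
    return None


def suggest_downsizing(cfg, budget_gib, estimate=toy_estimate_gib):
    """Shrink cfg greedily until it fits; return (config or None, steps applied)."""
    applied = []
    while estimate(cfg) > budget_gib:
        step = next_step(cfg)
        if step is None:
            return None, applied  # still infeasible at the smallest setting
        description, cfg = step
        applied.append(description)
    return cfg, applied


fitted, steps = suggest_downsizing(Config(), budget_gib=12.0)
print(fitted)
for line in steps:
    print("  -", line)
```

A real tool would likely weigh the quality cost of each change rather than follow a fixed order; the fixed order here just keeps the sketch short.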

Maintenance & Community

No specific contributors or sponsors are listed. CI runs on GitHub Actions. The roadmap mentions possible future directions such as multi-GPU estimation and sequence-classification support, but none of them are committed. Contributions are welcome.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits commercial use and closed-source integration.

Limitations & Caveats

Scope is limited to a single consumer GPU on a single node, LoRA/QLoRA, causal LMs, and the Hugging Face stack. Because activation memory is hard to predict statically, each estimate carries a confidence level. Benchmark tables explicitly mark unmeasured configurations as "not run."
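
One simple way to narrow the gap between a static estimate and a measurement is to fit a single correction factor from a few measured runs. The sketch below shows that idea with a least-squares scale through the origin. It is an assumption about how calibration could work, not a description of the canifinetune calibrate command, and the numbers in the example are made up, not measurements.

```python
def fit_scale(predicted_gib, measured_gib):
    """Least-squares scale s that minimizes sum((s * p - m) ** 2)."""
    if not predicted_gib or len(predicted_gib) != len(measured_gib):
        raise ValueError("need equally sized, non-empty sequences")
    numerator = sum(p * m for p, m in zip(predicted_gib, measured_gib))
    denominator = sum(p * p for p in predicted_gib)
    if denominator == 0:
        raise ValueError("predictions must not all be zero")
    return numerator / denominator


def calibrate(prediction_gib, scale):
    """Apply a fitted scale factor to a new static prediction."""
    return prediction_gib * scale


# Made-up example values for illustration only (not real benchmark results).
predicted = [6.0, 9.5, 13.0]
measured = [6.6, 10.3, 14.5]
scale = fit_scale(predicted, measured)
print(f"scale factor: {scale:.3f}")
print(f"calibrated 11.0 GiB estimate: {calibrate(11.0, scale):.2f} GiB")
```

A single factor cannot capture errors that vary with sequence length or batch size, so treating it as a coarse correction is the safer reading.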

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 6
  • Issues (30d): 0
  • Star history: 432 stars in the last 27 days
