TreeLoRA by ZinYY

Efficient continual learning for LLMs

Created 5 months ago
298 stars

Top 89.0% on SourcePulse

Project Summary

TreeLoRA is an efficient continual learning method for Large Language Models (LLMs) that uses layer-wise LoRA adapters organized via a hierarchical gradient-similarity tree. It targets researchers and engineers who need to adapt LLMs to new tasks without catastrophic forgetting, offering a structured form of parameter-efficient fine-tuning. The key benefit is that an LLM can learn from a sequence of tasks while retaining performance on earlier ones, with an emphasis on computational efficiency.

How It Works

The core innovation is a hierarchical structure for managing LoRA adapters. Instead of applying LoRA uniformly across tasks, TreeLoRA organizes layer-wise adapters by gradient similarity, producing a tree in which related adaptations are grouped together. This grouping allows adapters to be selected and applied more efficiently during continual learning, and the gradient-guided organization balances learning new information against preserving old knowledge, improving both performance and efficiency in sequential task learning.
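
The sketch below illustrates the general idea under simplifying assumptions; it is not the repository's implementation. It groups tasks into a small binary tree by the cosine similarity of flattened per-task gradient sketches, so that related adapters end up in the same subtree. The names GradNode and build_tree are illustrative only.

```python
# Minimal sketch of gradient-similarity grouping (not TreeLoRA's actual code).
import torch
import torch.nn.functional as F

class GradNode:
    """A node holding task ids whose gradient sketches are mutually similar."""
    def __init__(self, task_ids, left=None, right=None):
        self.task_ids = task_ids
        self.left = left
        self.right = right

def build_tree(grads: torch.Tensor, task_ids, min_size: int = 1) -> GradNode:
    """Recursively split tasks into a binary tree by gradient cosine similarity.

    grads:    (num_tasks, dim) flattened gradient sketches, one row per task.
    task_ids: task identifiers aligned with the rows of `grads`.
    """
    if len(task_ids) <= min_size:
        return GradNode(task_ids)
    # Pairwise cosine similarity between task gradient sketches.
    sims = F.cosine_similarity(grads.unsqueeze(1), grads.unsqueeze(0), dim=-1)
    # Split around the task least similar to the rest (a crude 2-way partition).
    pivot = sims.mean(dim=1).argmin().item()
    close = sims[pivot] >= sims[pivot].median()
    left_idx = close.nonzero().squeeze(1).tolist()
    right_idx = (~close).nonzero().squeeze(1).tolist()
    if not left_idx or not right_idx:  # degenerate split -> make this a leaf
        return GradNode(task_ids)
    return GradNode(
        task_ids,
        left=build_tree(grads[left_idx], [task_ids[i] for i in left_idx], min_size),
        right=build_tree(grads[right_idx], [task_ids[i] for i in right_idx], min_size),
    )

# Example: 4 tasks with random 128-dimensional gradient sketches.
tree = build_tree(torch.randn(4, 128), task_ids=[0, 1, 2, 3])
```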

Quick Start & Requirements

  • Installation: Install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.4.1, Torchvision 0.19.1, Accelerate 1.0.1, Bitsandbytes 0.46.1, and DeepSpeed 0.15.3+cu124torch2.4; the DeepSpeed build implies a CUDA 12.4 GPU setup. Pre-trained LLM weights (e.g., Llama-3.2-1B-Instruct) must be downloaded into the ./PTM/ directory (see the sketch after this list), and the LLM-CL-Benchmark dataset extracted into data/LLM-CL-Benchmark.
  • Running Experiments: Training and evaluation scripts live in the scripts/ directory; scripts/lora_based_methods/Tree_LoRA.sh runs TreeLoRA training, and run_all_exps.sh executes all experiments.
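
For the model-download step, the following minimal sketch shows one way to fetch a base model into ./PTM/ and attach a plain PEFT LoRA adapter. It uses standard huggingface_hub, transformers, and peft calls rather than the repository's own scripts; the Hub repo id and the LoRA hyperparameters are illustrative assumptions, not the project's configuration.

```python
# Illustrative setup only; TreeLoRA's training is driven by the scripts/ entry points.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Fetch the base model into the directory layout the scripts expect (./PTM/<model>).
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B-Instruct",   # assumed repo id; gated, may need an HF token
    local_dir="./PTM/Llama-3.2-1B-Instruct",
)

# Attach a LoRA adapter to the attention projections, as a generic PEFT example of
# the kind of layer-wise adapters TreeLoRA organizes (hyperparameters are placeholders).
model = AutoModelForCausalLM.from_pretrained(local_dir)
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```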

Highlighted Details

  • Efficient continual learning through layer-wise LoRA adapters.
  • Hierarchical gradient-similarity tree for organizing adapters.
  • Support for multiple LLM architectures including Gemma, LLaMA, and Mistral.
  • DeepSpeed integration for enhanced training efficiency.
  • Includes a Flash attention implementation for performance gains.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), project roadmap, or notable sponsorships.

Licensing & Compatibility

The README does not specify the software license. Therefore, compatibility for commercial use or closed-source linking cannot be determined from the provided information.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or the project's development stage (e.g., alpha, beta). The dependency on a specific CUDA version (12.4) via DeepSpeed may present an adoption barrier for users with different GPU setups.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 281 stars in the last 30 days
