TreeLoRA by ZinYY

Efficient continual learning for LLMs

Created 5 months ago
298 stars

Top 89.0% on SourcePulse

Project Summary

TreeLoRA is an efficient continual learning method for Large Language Models (LLMs) that uses layer-wise LoRA adapters organized via a hierarchical gradient-similarity tree. It targets researchers and engineers who need to adapt LLMs to new tasks without catastrophic forgetting, offering a structured form of parameter-efficient fine-tuning. The key benefit is that an LLM can learn from a sequence of tasks while retaining performance on earlier ones, with an emphasis on computational efficiency.

How It Works

The core innovation is a hierarchical structure for managing LoRA adapters. Instead of applying LoRA uniformly across tasks, TreeLoRA organizes layer-wise adapters by gradient similarity, producing a tree in which related adaptations are grouped together. This grouping allows adapters to be selected and applied more efficiently during continual learning, and the gradient-guided organization balances learning new information against preserving old knowledge, improving both performance and efficiency in sequential task learning.
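
The sketch below illustrates the general idea under simplifying assumptions; it is not the repository's implementation. It groups tasks into a small binary tree by the cosine similarity of flattened per-task gradient sketches, so that related adapters end up in the same subtree. The names GradNode and build_tree are illustrative only.

```python
# Minimal sketch of gradient-similarity grouping (not TreeLoRA's actual code).
import torch
import torch.nn.functional as F

class GradNode:
    """A node holding task ids whose gradient sketches are mutually similar."""
    def __init__(self, task_ids, left=None, right=None):
        self.task_ids = task_ids
        self.left = left
        self.right = right

def build_tree(grads: torch.Tensor, task_ids, min_size: int = 1) -> GradNode:
    """Recursively split tasks into a binary tree by gradient cosine similarity.

    grads:    (num_tasks, dim) flattened gradient sketches, one row per task.
    task_ids: task identifiers aligned with the rows of `grads`.
    """
    if len(task_ids) <= min_size:
        return GradNode(task_ids)
    # Pairwise cosine similarity between task gradient sketches.
    sims = F.cosine_similarity(grads.unsqueeze(1), grads.unsqueeze(0), dim=-1)
    # Split around the task least similar to the rest (a crude 2-way partition).
    pivot = sims.mean(dim=1).argmin().item()
    close = sims[pivot] >= sims[pivot].median()
    left_idx = close.nonzero().squeeze(1).tolist()
    right_idx = (~close).nonzero().squeeze(1).tolist()
    if not left_idx or not right_idx:  # degenerate split -> make this a leaf
        return GradNode(task_ids)
    return GradNode(
        task_ids,
        left=build_tree(grads[left_idx], [task_ids[i] for i in left_idx], min_size),
        right=build_tree(grads[right_idx], [task_ids[i] for i in right_idx], min_size),
    )

# Example: 4 tasks with random 128-dimensional gradient sketches.
tree = build_tree(torch.randn(4, 128), task_ids=[0, 1, 2, 3])
```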

Quick Start & Requirements

  • Installation: Install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.4.1, Torchvision 0.19.1, Accelerate 1.0.1, Bitsandbytes 0.46.1, and DeepSpeed 0.15.3+cu124torch2.4; the DeepSpeed build implies a CUDA 12.4 GPU setup. Pre-trained LLM weights (e.g., Llama-3.2-1B-Instruct) must be downloaded into the ./PTM/ directory (see the sketch after this list), and the LLM-CL-Benchmark dataset extracted into data/LLM-CL-Benchmark.
  • Running Experiments: Training and evaluation scripts live in the scripts/ directory; scripts/lora_based_methods/Tree_LoRA.sh runs TreeLoRA training, and run_all_exps.sh executes all experiments.
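
For the model-download step, the following minimal sketch shows one way to fetch a base model into ./PTM/ and attach a plain PEFT LoRA adapter. It uses standard huggingface_hub, transformers, and peft calls rather than the repository's own scripts; the Hub repo id and the LoRA hyperparameters are illustrative assumptions, not the project's configuration.

```python
# Illustrative setup only; TreeLoRA's training is driven by the scripts/ entry points.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Fetch the base model into the directory layout the scripts expect (./PTM/<model>).
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B-Instruct",   # assumed repo id; gated, may need an HF token
    local_dir="./PTM/Llama-3.2-1B-Instruct",
)

# Attach a LoRA adapter to the attention projections, as a generic PEFT example of
# the kind of layer-wise adapters TreeLoRA organizes (hyperparameters are placeholders).
model = AutoModelForCausalLM.from_pretrained(local_dir)
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```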

Highlighted Details

  • Efficient continual learning through layer-wise LoRA adapters.
  • Hierarchical gradient-similarity tree for organizing adapters.
  • Support for multiple LLM architectures including Gemma, LLaMA, and Mistral.
  • DeepSpeed integration for enhanced training efficiency.
  • Includes a Flash attention implementation for performance gains.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), project roadmap, or notable sponsorships.

Licensing & Compatibility

The README does not specify the software license. Therefore, compatibility for commercial use or closed-source linking cannot be determined from the provided information.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or the project's development stage (e.g., alpha, beta). The dependency on a specific CUDA version (12.4) via DeepSpeed may present an adoption barrier for users with different GPU setups.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 281 stars in the last 30 days
