ZinYY/TreeLoRA: Efficient continual learning for LLMs
Top 89.0% on SourcePulse
TreeLoRA offers an efficient continual learning solution for Large Language Models (LLMs) by employing layer-wise LoRA adapters organized via a hierarchical gradient-similarity tree. It targets researchers and engineers who need to adapt LLMs to new tasks without catastrophic forgetting, providing a structured approach to parameter-efficient fine-tuning. The primary benefit is that an LLM can learn from a sequence of tasks while maintaining performance on previously learned ones, with an emphasis on computational efficiency.
How It Works
The core innovation is a hierarchical structure for managing LoRA adapters. Rather than applying LoRA uniformly across tasks, TreeLoRA organizes layer-wise adapters by their gradient similarity, building a tree-like structure in which related adaptations are grouped. This grouping allows adapters to be selected and applied more efficiently during continual learning, and the gradient-guided organization aims to balance learning new information against preserving old knowledge, improving both performance and efficiency in sequential task learning.
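As a rough illustration of the routing idea (not the repository's implementation), the sketch below groups per-task gradient directions for one layer by cosine similarity and routes each new task to the most similar group, which is where an existing LoRA adapter could be shared or warm-started. TreeLoRA's actual structure is a layer-wise hierarchical tree with an efficient adapter-selection strategy; this sketch flattens it to a single level, and names such as GradSimTree and the threshold value are invented for illustration.

```python
# Minimal, hypothetical sketch of gradient-similarity grouping for one layer.
# Not TreeLoRA's code: the real method builds a hierarchical tree across layers.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two gradient sketches.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

class GradSimTree:
    """Toy grouping of per-task gradient directions for a single layer."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.clusters = []  # each cluster: {"centroid": np.ndarray, "tasks": [task ids]}

    def insert(self, task_id, grad_vec):
        """Route a new task to the most similar branch, or open a new one."""
        best_idx, best_sim = None, -1.0
        for i, c in enumerate(self.clusters):
            sim = cosine(grad_vec, c["centroid"])
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None or best_sim < self.threshold:
            # No sufficiently similar branch: start a new group of adapters.
            self.clusters.append({"centroid": grad_vec.copy(), "tasks": [task_id]})
            return len(self.clusters) - 1, None
        # Similar branch found: update its centroid and suggest warm-starting
        # the new task's LoRA adapter from the branch's first task.
        c = self.clusters[best_idx]
        n = len(c["tasks"])
        c["centroid"] = (c["centroid"] * n + grad_vec) / (n + 1)
        c["tasks"].append(task_id)
        return best_idx, c["tasks"][0]

# Toy usage: three tasks whose layer-wise gradients are sketched as 64-d vectors.
rng = np.random.default_rng(0)
tree = GradSimTree(threshold=0.3)
base = rng.normal(size=64)
grads = [base + 0.1 * rng.normal(size=64),   # similar to the base direction
         base + 0.1 * rng.normal(size=64),
         rng.normal(size=64)]                # an unrelated task
for tid, g in enumerate(grads):
    branch, reuse_from = tree.insert(tid, g)
    print(f"task {tid}: branch {branch}, warm-start adapter from task {reuse_from}")
```

The threshold that decides when to open a new branch is likewise a placeholder; the actual grouping comes from the hierarchical gradient-similarity tree described above.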
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt, place the pre-trained models in the ./PTM/ directory, and extract the LLM-CL-Benchmark dataset into data/LLM-CL-Benchmark. Training scripts live in the scripts/ directory, with scripts/lora_based_methods/Tree_LoRA.sh demonstrating TreeLoRA training; the run_all_exps.sh script executes all experiments.
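Assuming that layout, a typical session might look like the following; the paths and script names are taken from the description above, while the exact invocation and script location may differ:

```bash
# Install Python dependencies
pip install -r requirements.txt

# Expected layout (assumed from the description above):
#   ./PTM/                     pre-trained model checkpoints
#   data/LLM-CL-Benchmark/     extracted benchmark dataset

# Train with TreeLoRA
bash scripts/lora_based_methods/Tree_LoRA.sh

# Reproduce all experiments
bash run_all_exps.sh
```

Highlighted Details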
Maintenance & Community
The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), project roadmap, or notable sponsorships.
Licensing & Compatibility
The README does not specify the software license. Therefore, compatibility for commercial use or closed-source linking cannot be determined from the provided information.
Limitations & Caveats
The README does not detail specific limitations, known bugs, or the project's development stage (e.g., alpha, beta). The dependency on a specific CUDA version (12.4) via DeepSpeed may present an adoption barrier for users with different GPU setups.