galilai-group
LLM fine-tuning and pretraining framework
Top 99.4% on SourcePulse
Summary
This repository offers tools for training and fine-tuning Large Language Models (LLMs) using the Joint Embedding Predictive Architecture (JEPA). It targets researchers and engineers aiming to optimize LLM training efficiency through novel techniques such as additive attention masks and JEPA-loss dropout, reducing computational costs and improving performance for faster iteration cycles.
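As a rough illustration of the additive attention masks mentioned in the summary, a common way to encode several segments (e.g., text and code) in one forward pass is to pack them into a single sequence and add a block-diagonal mask to the attention logits so segments cannot attend across each other. This is an interpretation of the README's `--additive_mask` idea, not the repository's actual implementation; the function name below is hypothetical.

```python
NEG_INF = float("-inf")  # added to attention logits to block cross-segment attention


def additive_block_mask(seg_lens):
    """Additive attention mask for segments packed into one sequence.

    Positions may attend only within their own segment; cross-segment
    logits get NEG_INF, so one forward pass can encode all segments.
    (A sketch of the --additive_mask idea, not the repo's code.)
    """
    total = sum(seg_lens)
    mask = [[NEG_INF] * total for _ in range(total)]
    start = 0
    for n in seg_lens:
        for i in range(start, start + n):
            for j in range(start, start + n):
                mask[i][j] = 0.0  # allow within-segment attention
        start += n
    return mask


# e.g., a 3-token text span and a 2-token code span packed together
for row in additive_block_mask([3, 2]):
    print(row)
```

Adding this mask to the pre-softmax attention scores zeroes out cross-segment attention weights, which is what lets both encodings share a single forward pass.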
How It Works
The core innovation lives in finetune.py and the pretraining scripts. The --additive_mask feature consolidates text and code encoding into a single forward pass, eliminating redundant computation. JEPA-loss dropout (--jepa_ratio) enables significant compute savings (e.g., a 1.25X compute saving at a 0.75 dropout rate) by selectively skipping JEPA-loss calculations. The project also supports Semantic Tube Prediction (STP) via stp.py for specialized fine-tuning.
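The JEPA-loss dropout mechanism can be sketched as skipping the JEPA loss term on a random fraction of training steps. The cost model below is an assumption chosen so that a 0.75 dropout rate reproduces the README's 1.25X figure; the constants and function names are hypothetical, not the repository's code.

```python
import random

# Hypothetical cost model (not the repo's numbers): one step costs
# BASE for the main objective plus JEPA_COST when the JEPA loss runs.
BASE, JEPA_COST = 1.0, 4 / 11


def training_step_cost(jepa_ratio, rng=random.Random(0)):
    """Cost of one step when the JEPA loss is dropped with probability jepa_ratio."""
    skip_jepa = rng.random() < jepa_ratio  # JEPA-loss dropout
    return BASE + (0.0 if skip_jepa else JEPA_COST)


def expected_cost(jepa_ratio):
    """Expected per-step cost: BASE + (1 - jepa_ratio) * JEPA_COST."""
    return BASE + (1 - jepa_ratio) * JEPA_COST


saving = expected_cost(0.0) / expected_cost(0.75)
print(f"relative compute saving at 0.75 dropout: {saving:.2f}X")  # 1.25X
```

The point of the sketch: because the JEPA loss is only one component of the step cost, a 0.75 dropout rate yields a sub-linear (here 1.25X) overall saving rather than a 4X one.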
Quick Start & Requirements
Setup requires manually interpreting the commands in setup.sh rather than executing it directly, and users must select environment-specific configurations. Datasets are required, including Spider (unzip spider_data.zip). Large memory is needed for models up to 8B parameters; training 8B+ models is supported on NVIDIA H200 GPUs via finetune8bh200.py and run8bh200.sh. No quick-start or demo links are provided.
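Since no quick-start is provided, the sketch below assembles a plausible finetune.py invocation from the flags the README does name (--additive_mask, --jepa_ratio, --lora, --lora_rank). The flag combination and default values are assumptions, not a verified recipe from the repository.

```python
import shlex


def build_finetune_cmd(jepa_ratio=0.75, lora_rank=16):
    """Assemble a finetune.py invocation from flags named in the README.

    The combination and values here are assumptions, not a verified recipe.
    """
    return [
        "python", "finetune.py",
        "--additive_mask",                # one forward pass for text/code
        "--jepa_ratio", str(jepa_ratio),  # JEPA-loss dropout rate
        "--lora", "--lora_rank", str(lora_rank),
    ]


print(shlex.join(build_finetune_cmd()))
# python finetune.py --additive_mask --jepa_ratio 0.75 --lora --lora_rank 16
```

Consulting setup.sh and run8bh200.sh for the actual argument set remains necessary before running anything.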
Highlighted Details
--additive_mask enables a single forward pass for text/code encoding.
JEPA-loss dropout (--jepa_ratio) offers substantial computational savings.
LoRA fine-tuning is supported via --lora and --lora_rank.
Maintenance & Community
The README provides no information on contributors, sponsorships, community channels (e.g., Discord, Slack), or roadmaps.
Licensing & Compatibility
The README omits software license details, preventing assessment of commercial use or integration compatibility.
Limitations & Caveats
Setup demands manual script interpretation. The --additive_mask feature may have compatibility issues with left-padding tokenizers. The absence of an explicit license is a significant barrier to adoption, especially for commercial applications.