llm-jepa by galilai-group

LLM fine-tuning and pretraining framework

Created 7 months ago
253 stars

Top 99.4% on SourcePulse

Project Summary

This repository offers tools for training and fine-tuning Large Language Models (LLMs) using the Joint Embedding Predictive Architecture (JEPA). It targets researchers and engineers who want to improve LLM training efficiency through techniques such as additive attention masks and JEPA-loss dropout, which reduce computational cost and improve performance, enabling faster experiment cycles.

How It Works The core training logic lives in finetune.py and the pretraining scripts. The --additive_mask feature consolidates text and code encoding into a single forward pass, eliminating redundant computation. JEPA-loss dropout (--jepa_ratio) yields significant compute savings (e.g., roughly 1.25X baseline compute at a 0.75 dropout rate) by selectively skipping JEPA-loss calculations. The project also supports Semantic Tube Prediction (STP) via stp.py for specialized fine-tuning.
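
The JEPA-loss dropout idea can be sketched as follows. This is a hypothetical illustration, not the repository's actual implementation: with probability `jepa_ratio`, the embedding-prediction loss is skipped for a step, so at a 0.75 ratio the JEPA term is computed on only 25% of steps. If that term costs roughly one extra forward pass, total compute is about 1 + 0.25 = 1.25X the plain fine-tuning baseline, matching the figure above. The function names and cosine-distance loss are assumptions for the sketch.

```python
import math
import random


def cosine_jepa_loss(pred, target):
    """1 - cosine similarity between predicted and target embedding vectors
    (plain lists of floats, for illustration)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_p * norm_t)


def jepa_loss_with_dropout(pred, target, jepa_ratio, rng=random):
    """JEPA-loss dropout sketch: with probability `jepa_ratio`, skip the
    JEPA loss term entirely, saving the compute it would have required."""
    if rng.random() < jepa_ratio:
        return 0.0  # loss term skipped this step
    return cosine_jepa_loss(pred, target)
```

In a real training loop, the skip decision would be made before the extra encoding pass, so the compute (not just the loss arithmetic) is saved.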

Quick Start & Requirements setup.sh is intended to be read and its commands run selectively for your environment, not executed directly. Required datasets include Spider (unzip spider_data.zip). Substantial memory is needed for models up to 8B parameters; training 8B+ models is supported on NVIDIA H200 GPUs via finetune8bh200.py and run8bh200.sh. No quick-start or demo links are provided.
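
A hypothetical command sequence under the setup described above; the model path and flag values are assumptions, and the exact steps depend on which setup.sh commands apply to your environment:

```shell
# 1. Read setup.sh and run only the commands relevant to your
#    environment (the script is not meant to be executed directly).
# 2. Unpack the required Spider dataset:
unzip spider_data.zip
# 3. Launch fine-tuning with the features described above
#    (flag values here are illustrative, not recommended defaults):
python finetune.py \
    --additive_mask \
    --jepa_ratio 0.75 \
    --lora --lora_rank 16
```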

Highlighted Details

  • Efficient Encoding: --additive_mask enables a single forward pass for text/code encoding.
  • Compute Reduction: JEPA-loss dropout (--jepa_ratio) offers substantial computational savings.
  • Scalability: Scripts support training models up to 8B parameters on NVIDIA H200 GPUs.
  • Parameter-Efficient Fine-tuning: Supports LoRA fine-tuning via --lora and --lora_rank.
  • Specialized Tasks: Includes functionality for Semantic Tube Prediction (STP) and JEPA-loss ablations.
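
The additive-mask idea in the first bullet can be sketched as building a block mask over a concatenated [text; code] sequence, so both segments are encoded in one forward pass while tokens attend only within their own segment. This is an illustrative sketch, not the repository's code; the function name and segment-id representation are assumptions.

```python
def additive_segment_mask(seg_ids):
    """Build an additive attention mask for a concatenated sequence.

    seg_ids: per-token segment labels, e.g. [0, 0, 1, 1] for two text
    tokens followed by two code tokens. Returns an n x n matrix of
    floats that is ADDED to the attention logits before softmax:
    0.0 allows attention, -inf blocks it across segments.
    """
    n = len(seg_ids)
    neg_inf = float("-inf")
    return [
        [0.0 if seg_ids[i] == seg_ids[j] else neg_inf for j in range(n)]
        for i in range(n)
    ]
```

Because blocking is expressed additively in the logits rather than by running two separate encoder passes, one forward pass over the concatenated sequence suffices.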

Maintenance & Community The README provides no information on contributors, sponsorships, community channels (e.g., Discord, Slack), or roadmaps.

Licensing & Compatibility The README omits software license details, preventing assessment of commercial use or integration compatibility.

Limitations & Caveats Setup demands manual script interpretation. The --additive_mask feature may have compatibility issues with left-padding tokenizers. The absence of explicit licensing is a significant barrier for adoption decisions, especially for commercial applications.

Health Check

Last Commit: 6 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 1
Star History: 27 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

420 stars
Lightweight training framework for model pre-training
Created 2 years ago
Updated 8 months ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

872 stars
Pretraining code for depth-recurrent language model research
Created 1 year ago
Updated 3 months ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

2k stars
Response generation model via large-scale pretraining
Created 6 years ago
Updated 3 years ago