megalodon by XuezheMax

Reference implementation for Megalodon 7B model

created 1 year ago
524 stars

Top 61.1% on sourcepulse

1 Expert Loves This Project
Project Summary

This repository provides the reference implementation for the Megalodon 7B model, an efficient Large Language Model (LLM) designed for pretraining and inference with unlimited context length. It is targeted at researchers and engineers working with LLMs who require advanced context handling capabilities.

How It Works

Megalodon builds on the Moving Average Equipped Gated Attention (MEGA) architecture introduced in the authors' ICLR 2023 paper, extending it with components such as a complex exponential moving average (CEMA) and chunk-wise attention. Because attention is computed within fixed-size chunks, compute and memory grow linearly with sequence length rather than quadratically, which is what allows contexts that are in principle unbounded. The implementation supports distributed training and inference, including model and data parallelism.
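To make the idea concrete, the following is a minimal, self-contained PyTorch sketch of the two ingredients described above: a simplified, real-valued moving-average smoothing step and attention restricted to fixed-size chunks, so that cost grows linearly with sequence length. The function names, chunk size, and shapes are illustrative assumptions, not the repository's actual API or the full Megalodon layer (which uses a gated, complex-valued EMA plus normalization).

```python
import torch
import torch.nn.functional as F


def ema_smooth(x: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    """Sequential exponential moving average over time. x: (batch, seq, dim).

    Written as a loop for clarity; the real model uses a gated, complex-valued
    EMA (CEMA) with a parallel formulation.
    """
    out = torch.zeros_like(x)
    state = torch.zeros(x.shape[0], x.shape[2], dtype=x.dtype, device=x.device)
    for t in range(x.shape[1]):
        state = alpha * state + (1.0 - alpha) * x[:, t]
        out[:, t] = state
    return out


def chunked_attention(x: torch.Tensor, chunk: int = 128) -> torch.Tensor:
    """Self-attention applied independently inside fixed-size chunks."""
    b, n, d = x.shape
    pad = (-n) % chunk
    if pad:
        x = F.pad(x, (0, 0, 0, pad))          # pad the sequence dimension
    xc = x.view(b, -1, chunk, d)              # (batch, n_chunks, chunk, dim)
    scores = xc @ xc.transpose(-1, -2) / d ** 0.5
    out = torch.softmax(scores, dim=-1) @ xc  # (batch, n_chunks, chunk, dim)
    return out.reshape(b, -1, d)[:, :n]


x = torch.randn(2, 1024, 64)
y = chunked_attention(ema_smooth(x))  # cost is linear in sequence length
print(y.shape)                        # torch.Size([2, 1024, 64])
```

The moving-average component carries information across chunk boundaries, which is why restricting attention to local chunks does not simply discard long-range context.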

Quick Start & Requirements

  • Installation: Requires PyTorch 2.0.1 with CUDA 11.7. Installation involves cloning and compiling the apex and fairscale libraries, followed by installing the megalodon package itself.
  • Prerequisites: PyTorch 2.0.1, CUDA 11.7, apex (specific version 23.08), fairscale (specific branch ngoyal_bf16_changes).
  • Evaluation: Launch evaluation using torchrun with specified checkpoint, tokenizer, and data paths.
  • Pretraining: The README includes a Python pseudo-code example showing the structure of a pretraining launch (distributed initialization, then model building); a hedged sketch of that pattern follows this list.
  • Documentation: https://github.com/XuezheMax/megalodon
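The pretraining bullet above describes the README's pseudo-code only at a high level, so here is a hedged, self-contained sketch of the general pattern such a launch follows under torchrun: initialize the distributed process group, build the model, and run the training loop. The toy model, random batches, and hyperparameters are stand-ins, not the repository's actual builders or data pipeline; consult the repo's README for the real entry points.

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    # torchrun provides RANK, WORLD_SIZE and LOCAL_RANK to every worker.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the repository's model construction, which also sets up
    # model parallelism; here it is just a tiny embedding plus projection.
    model = torch.nn.Sequential(
        torch.nn.Embedding(32000, 256),
        torch.nn.Linear(256, 32000),
    ).cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(10):  # stand-in for iterating a tokenized pretraining set
        tokens = torch.randint(0, 32000, (4, 128), device="cuda")
        logits = model(tokens)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), tokens.view(-1))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Under these assumptions, such a script would be launched with something like `torchrun --nproc_per_node=<num_gpus> pretrain_sketch.py`, mirroring how the repository's evaluation is launched with torchrun.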

Highlighted Details

  • Reference implementation for Megalodon 7B.
  • Supports unlimited context length via MEGA attention.
  • Includes code for both LLM pretraining and evaluation.
  • Designed for distributed training and inference.

Maintenance & Community

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. The underlying libraries (apex, fairscale) have their own licenses. Users should verify licensing for commercial use.

Limitations & Caveats

The installation process pins specific versions of PyTorch and CUDA and requires custom builds of apex and fairscale, which adds setup complexity and potential compatibility issues in other environments. The README focuses on core functionality and does not detail hardware requirements beyond CUDA-capable GPUs.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse
806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago
updated 2 weeks ago