megalodon by XuezheMax

Reference implementation for Megalodon 7B model

created 1 year ago
524 stars

Top 61.1% on sourcepulse

1 Expert Loves This Project
Project Summary

This repository provides the reference implementation for the Megalodon 7B model, an efficient Large Language Model (LLM) designed for pretraining and inference with unlimited context length. It is targeted at researchers and engineers working with LLMs who require advanced context handling capabilities.

How It Works

Megalodon builds on the Moving Average Equipped Gated Attention (MEGA) architecture introduced in the authors' ICLR 2023 paper, extending it with components such as a complex exponential moving average (CEMA) and chunk-wise attention. Because attention is computed within fixed-size chunks, compute and memory grow linearly with sequence length rather than quadratically, which is what allows contexts that are in principle unbounded. The implementation supports distributed training and inference, including model and data parallelism.
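To make the idea concrete, the following is a minimal, self-contained PyTorch sketch of the two ingredients described above: a simplified, real-valued moving-average smoothing step and attention restricted to fixed-size chunks, so that cost grows linearly with sequence length. The function names, chunk size, and shapes are illustrative assumptions, not the repository's actual API or the full Megalodon layer (which uses a gated, complex-valued EMA plus normalization).

```python
import torch
import torch.nn.functional as F


def ema_smooth(x: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    """Sequential exponential moving average over time. x: (batch, seq, dim).

    Written as a loop for clarity; the real model uses a gated, complex-valued
    EMA (CEMA) with a parallel formulation.
    """
    out = torch.zeros_like(x)
    state = torch.zeros(x.shape[0], x.shape[2], dtype=x.dtype, device=x.device)
    for t in range(x.shape[1]):
        state = alpha * state + (1.0 - alpha) * x[:, t]
        out[:, t] = state
    return out


def chunked_attention(x: torch.Tensor, chunk: int = 128) -> torch.Tensor:
    """Self-attention applied independently inside fixed-size chunks."""
    b, n, d = x.shape
    pad = (-n) % chunk
    if pad:
        x = F.pad(x, (0, 0, 0, pad))          # pad the sequence dimension
    xc = x.view(b, -1, chunk, d)              # (batch, n_chunks, chunk, dim)
    scores = xc @ xc.transpose(-1, -2) / d ** 0.5
    out = torch.softmax(scores, dim=-1) @ xc  # (batch, n_chunks, chunk, dim)
    return out.reshape(b, -1, d)[:, :n]


x = torch.randn(2, 1024, 64)
y = chunked_attention(ema_smooth(x))  # cost is linear in sequence length
print(y.shape)                        # torch.Size([2, 1024, 64])
```

The moving-average component carries information across chunk boundaries, which is why restricting attention to local chunks does not simply discard long-range context.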

Quick Start & Requirements

  • Installation: Requires PyTorch 2.0.1 with CUDA 11.7. Installation involves cloning and compiling the apex and fairscale libraries, followed by installing the megalodon package itself.
  • Prerequisites: PyTorch 2.0.1, CUDA 11.7, apex (specific version 23.08), fairscale (specific branch ngoyal_bf16_changes).
  • Evaluation: Launch evaluation using torchrun with specified checkpoint, tokenizer, and data paths.
  • Pretraining: The README includes a Python pseudo-code example showing the structure of a pretraining launch (distributed initialization, then model building); a hedged sketch of that pattern follows this list.
  • Documentation: https://github.com/XuezheMax/megalodon
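The pretraining bullet above describes the README's pseudo-code only at a high level, so here is a hedged, self-contained sketch of the general pattern such a launch follows under torchrun: initialize the distributed process group, build the model, and run the training loop. The toy model, random batches, and hyperparameters are stand-ins, not the repository's actual builders or data pipeline; consult the repo's README for the real entry points.

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    # torchrun provides RANK, WORLD_SIZE and LOCAL_RANK to every worker.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the repository's model construction, which also sets up
    # model parallelism; here it is just a tiny embedding plus projection.
    model = torch.nn.Sequential(
        torch.nn.Embedding(32000, 256),
        torch.nn.Linear(256, 32000),
    ).cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(10):  # stand-in for iterating a tokenized pretraining set
        tokens = torch.randint(0, 32000, (4, 128), device="cuda")
        logits = model(tokens)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), tokens.view(-1))
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Under these assumptions, such a script would be launched with something like `torchrun --nproc_per_node=<num_gpus> pretrain_sketch.py`, mirroring how the repository's evaluation is launched with torchrun.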

Highlighted Details

  • Reference implementation for Megalodon 7B.
  • Supports unlimited context length via MEGA attention.
  • Includes code for both LLM pretraining and evaluation.
  • Designed for distributed training and inference.

Maintenance & Community

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. The underlying libraries (apex, fairscale) have their own licenses. Users should verify licensing for commercial use.

Limitations & Caveats

The installation process pins specific versions of PyTorch and CUDA and requires custom builds of apex and fairscale, which adds setup complexity and potential compatibility issues in other environments. The README focuses on core functionality and does not detail hardware requirements beyond CUDA-capable GPUs.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse
806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago
updated 2 weeks ago