Reference implementation for Megalodon 7B model
This repository provides the reference implementation for the Megalodon 7B model, an efficient Large Language Model (LLM) designed for pretraining and inference with unlimited context length. It is targeted at researchers and engineers working with LLMs who require advanced context handling capabilities.
How It Works
Megalodon builds on the Moving Average Equipped Gated Attention (MEGA) mechanism introduced in the authors' ICLR 2023 paper. This approach improves efficiency and enables handling of significantly longer contexts than traditional attention mechanisms, avoiding their quadratic complexity. The implementation supports distributed training and inference, including model and data parallelism.
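As a rough illustration of the moving-average idea, the sketch below implements a single-dimensional, real-valued damped exponential moving average of the kind a MEGA-style layer feeds into gated attention. The class name DampedEMA, the sigmoid parameterization, and the explicit time loop are simplifications for readability, not the repository's actual code, which uses a richer multi-dimensional moving average combined with chunk-wise attention.

```python
import torch
import torch.nn as nn

class DampedEMA(nn.Module):
    """Simplified damped EMA: y_t = a * x_t + (1 - a * d) * y_{t-1}."""

    def __init__(self, dim: int):
        super().__init__()
        # Per-dimension decay (a) and damping (d) factors, kept in (0, 1) via sigmoid.
        self.alpha_logit = nn.Parameter(torch.zeros(dim))
        self.delta_logit = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        a = torch.sigmoid(self.alpha_logit)
        d = torch.sigmoid(self.delta_logit)
        y = torch.zeros_like(x[:, 0])
        outputs = []
        for t in range(x.size(1)):
            y = a * x[:, t] + (1.0 - a * d) * y
            outputs.append(y)
        # The smoothed sequence would then supply queries/gates for chunk-wise
        # attention, keeping cost linear in sequence length.
        return torch.stack(outputs, dim=1)

ema = DampedEMA(dim=16)
print(ema(torch.randn(2, 128, 16)).shape)  # torch.Size([2, 128, 16])
```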
Quick Start & Requirements
- Installation requires building the apex and fairscale libraries, followed by installing the megalodon package itself.
- Pinned dependencies: apex (version 23.08) and fairscale (branch ngoyal_bf16_changes); a minimal import-check sketch follows this list.
- Launch training and evaluation with torchrun, passing the specified checkpoint, tokenizer, and data paths.
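Because the pinned builds of apex and fairscale are easy to get wrong, a small, hypothetical pre-flight script (not part of the repository) can confirm the packages are importable and report the local PyTorch/CUDA build before launching torchrun; only the package names are assumed here.

```python
# Hypothetical pre-flight check: verify pinned dependencies are importable
# and report the local PyTorch/CUDA build before launching training.
import importlib.util
import torch

print(f"torch {torch.__version__} | CUDA build {torch.version.cuda} "
      f"| CUDA available: {torch.cuda.is_available()}")
for pkg in ("apex", "fairscale", "megalodon"):
    status = "found" if importlib.util.find_spec(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```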
Highlighted Details
Maintenance & Community
Licensing & Compatibility
The project's dependencies (apex, fairscale) have their own licenses; users should verify licensing terms before commercial use.
Limitations & Caveats
The installation process requires specific versions of PyTorch, CUDA, and custom builds of apex and fairscale, which may introduce complexity and potential compatibility issues in other environments. The README focuses on core functionality and does not detail hardware requirements beyond CUDA.