Samba by microsoft

Research paper and code for efficient, unlimited-context language modeling

created 1 year ago
900 stars

Top 41.2% on sourcepulse

View on GitHub
Project Summary

Samba is a novel language model architecture designed for efficient, unlimited context length language modeling. It targets researchers and practitioners seeking to improve performance on long-context tasks by combining the strengths of state space models (Mamba) with attention mechanisms. The primary benefit is achieving linear complexity with respect to sequence length while maintaining strong performance on standard benchmarks and long-context retrieval.

How It Works

Samba employs a hybrid architecture that interleaves Mamba blocks with sliding window attention and MLP layers. The combination aims to pair Mamba's efficient, linear-time processing of long sequences with attention's precise recall of recent context inside the window. The specific layer-level arrangement of Mamba, MLP, and sliding window attention is key to its performance and efficiency.

Quick Start & Requirements

  • Install: Clone the repository and follow the Dockerfile for environment setup.
  • Prerequisites: Python, PyTorch, lm-evaluation-harness. Requires significant disk space (893GB) for the SlimPajama dataset. GPU acceleration is essential for training and evaluation.
  • Setup: Data preparation involves cloning the SlimPajama dataset and running the provided scripts; a lighter-weight way to inspect the data is sketched after this list. Training requires a distributed setup (e.g., torchrun with multiple GPUs).
  • Links: SlimPajama Dataset
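As a lighter-weight alternative to cloning the full ~893 GB dump, SlimPajama can be sampled by streaming it through the Hugging Face datasets library. The snippet below is a hedged sketch for inspecting the data only; it is not the repository's preparation script, and the dataset ID and field name follow the public cerebras/SlimPajama-627B release.

```python
# Hedged sketch: stream a few SlimPajama documents without cloning ~893 GB.
# Not the Samba repo's data-prep script; dataset ID and the "text" field
# follow the public Hugging Face release of SlimPajama.
from itertools import islice
from datasets import load_dataset

stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

for doc in islice(stream, 3):               # peek at the first few documents
    print(doc["text"][:120].replace("\n", " "), "...")
```

Full pretraining still requires the complete local copy, the repository's own preparation scripts, and the distributed torchrun launch mentioned above.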

Highlighted Details

  • Samba-3.8B-instruct outperforms Phi-3-mini on MMLU, GSM8K, and HumanEval benchmarks.
  • Achieves perfect long-context retrieval ability with minimal instruction tuning.
  • Maintains linear complexity with respect to sequence length.
  • Training infrastructure is based on modified TinyLlama and LitGPT.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

  • The evaluation currently supports only non-generation-based tasks.
  • The largest model (Samba-3.8B) requires substantial computational resources for training and potentially for inference.
  • The README mentions a "preview" status for the Samba-3.8B-instruct model.
Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 37 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle
1.0% · 2k stars
Context window extension method for LLMs (research paper, models)
created 2 years ago · updated 1 year ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38
0.3% · 9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago · updated 1 year ago