Samba by microsoft

Language model research paper for efficient unlimited context

Created 1 year ago
909 stars

Top 40.1% on SourcePulse

View on GitHub
Project Summary

Samba is a novel language model architecture for efficient language modeling with unlimited context length. It targets researchers and practitioners seeking better performance on long-context tasks by combining the strengths of state space models (Mamba) with attention mechanisms. The primary benefit is linear complexity with respect to sequence length while maintaining strong performance on standard benchmarks and long-context retrieval.

How It Works

Samba employs a hybrid architecture that interleaves Mamba blocks with sliding window attention and MLP layers. The combination aims to leverage Mamba's efficient, linear-time processing for compressing long-range context while using sliding window attention for precise recall of recent tokens within the window. The specific layer-level arrangement of Mamba, MLP, and sliding window attention is key to its performance and efficiency.
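
The block below is a minimal, illustrative sketch of one such layer-level arrangement (Mamba, then MLP, then sliding window attention, then MLP, each with pre-norm and a residual connection), not the official implementation. The module internals, dimensions, and window size are placeholder assumptions; the real code builds on TinyLlama/LitGPT with the official Mamba kernels.

```python
# Minimal sketch (not the official implementation) of a Samba-style hybrid
# block interleaving Mamba, sliding window attention (SWA), and MLP sub-layers.
import torch
import torch.nn as nn


class SlidingWindowAttention(nn.Module):
    """Multi-head attention restricted to a causal local window (no KV cache)."""

    def __init__(self, d_model: int, n_heads: int, window: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        idx = torch.arange(T, device=x.device)
        delta = idx[:, None] - idx[None, :]
        # True = masked out: future tokens and tokens older than the window.
        mask = (delta < 0) | (delta >= self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


def mlp(d_model: int) -> nn.Module:
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(),
                         nn.Linear(4 * d_model, d_model))


class SambaBlock(nn.Module):
    """Hypothetical ordering: Mamba -> MLP -> SWA -> MLP, pre-norm residuals."""

    def __init__(self, d_model=1024, n_heads=8, window=2048, mamba_layer=None):
        super().__init__()
        # `mamba_layer` would be a real Mamba block (e.g. from the mamba_ssm
        # package); an identity stub keeps this sketch self-contained.
        self.mamba = mamba_layer or nn.Identity()
        self.swa = SlidingWindowAttention(d_model, n_heads, window)
        self.mlp1, self.mlp2 = mlp(d_model), mlp(d_model)
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mamba(self.norms[0](x))  # linear-time long-range mixing
        x = x + self.mlp1(self.norms[1](x))
        x = x + self.swa(self.norms[2](x))    # precise recall of recent tokens
        x = x + self.mlp2(self.norms[3](x))
        return x


if __name__ == "__main__":
    x = torch.randn(2, 128, 1024)
    print(SambaBlock()(x).shape)  # torch.Size([2, 128, 1024])
```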

Quick Start & Requirements

  • Install: Clone the repository and follow the Dockerfile for environment setup.
  • Prerequisites: Python, PyTorch, and lm-evaluation-harness (see the evaluation sketch after this list). The SlimPajama dataset requires significant disk space (893 GB), and GPU acceleration is essential for training and evaluation.
  • Setup: Data preparation involves cloning the SlimPajama dataset and running provided scripts. Training requires a distributed setup (e.g., torchrun with multiple GPUs).
  • Links: SlimPajama Dataset
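
Since the harness is used here only for non-generation (likelihood-style) tasks, an evaluation call might look like the sketch below. This is a hedged example against the current lm-evaluation-harness Python API (v0.4+): `simple_evaluate`, the "hf" model wrapper, the checkpoint path, and the task list are assumptions and may differ from the exact harness version and scripts used in this repository.

```python
# Hypothetical evaluation of a trained checkpoint on non-generation tasks via
# lm-evaluation-harness. The checkpoint path and task list are placeholders;
# the Samba repo may pin a different harness version with a different entry point.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # HuggingFace causal-LM wrapper
    model_args="pretrained=/path/to/samba-checkpoint,dtype=bfloat16",
    tasks=["wikitext", "lambada_openai", "hellaswag"],  # likelihood-based tasks
    batch_size=8,
)
print(results["results"])
```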

Highlighted Details

  • Samba-3.8B-instruct outperforms Phi3-mini on MMLU, GSM8K, and HumanEval benchmarks.
  • Achieves perfect long-context retrieval ability with minimal instruction tuning.
  • Maintains linear complexity with respect to sequence length.
  • Training infrastructure is based on modified TinyLlama and LitGPT.

Maintenance & Community

  • See the Health Check section below for recent commit activity, maintainer responsiveness, and 30-day pull request and issue counts.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

  • The evaluation pipeline currently supports only non-generation-based tasks.
  • The largest model (Samba-3.8B) requires substantial computational resources for training and potentially for inference.
  • The README mentions a "preview" status for the Samba-3.8B-instruct model.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 9 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Pawel Garbacki (cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs

  • Top 0.1% on SourcePulse
  • 3k stars
  • Created 2 years ago; updated 1 year ago