Samba by microsoft

Research paper and code for efficient, unlimited-context language modeling

created 1 year ago
900 stars

Top 41.2% on sourcepulse

View on GitHub
Project Summary

Samba is a novel language model architecture designed for efficient, unlimited context length language modeling. It targets researchers and practitioners seeking to improve performance on long-context tasks by combining the strengths of state space models (Mamba) with attention mechanisms. The primary benefit is achieving linear complexity with respect to sequence length while maintaining strong performance on standard benchmarks and long-context retrieval.

How It Works

Samba employs a hybrid architecture that interleaves Mamba blocks with sliding window attention and MLP layers. The combination aims to pair Mamba's efficient, linear-time processing of long sequences with attention's precise recall of recent context inside the window. The specific layer-level arrangement of Mamba, MLP, and sliding window attention is key to its performance and efficiency.

Quick Start & Requirements

  • Install: Clone the repository and follow the Dockerfile for environment setup.
  • Prerequisites: Python, PyTorch, lm-evaluation-harness. Requires significant disk space (893GB) for the SlimPajama dataset. GPU acceleration is essential for training and evaluation.
  • Setup: Data preparation involves cloning the SlimPajama dataset and running the provided scripts; a lighter-weight way to inspect the data is sketched after this list. Training requires a distributed setup (e.g., torchrun with multiple GPUs).
  • Links: SlimPajama Dataset
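As a lighter-weight alternative to cloning the full ~893 GB dump, SlimPajama can be sampled by streaming it through the Hugging Face datasets library. The snippet below is a hedged sketch for inspecting the data only; it is not the repository's preparation script, and the dataset ID and field name follow the public cerebras/SlimPajama-627B release.

```python
# Hedged sketch: stream a few SlimPajama documents without cloning ~893 GB.
# Not the Samba repo's data-prep script; dataset ID and the "text" field
# follow the public Hugging Face release of SlimPajama.
from itertools import islice
from datasets import load_dataset

stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

for doc in islice(stream, 3):               # peek at the first few documents
    print(doc["text"][:120].replace("\n", " "), "...")
```

Full pretraining still requires the complete local copy, the repository's own preparation scripts, and the distributed torchrun launch mentioned above.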

Highlighted Details

  • Samba-3.8B-instruct outperforms Phi-3-mini on MMLU, GSM8K, and HumanEval benchmarks.
  • Achieves perfect long-context retrieval ability with minimal instruction tuning.
  • Maintains linear complexity with respect to sequence length.
  • Training infrastructure is based on modified TinyLlama and LitGPT.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

  • The evaluation currently supports only non-generation-based tasks.
  • The largest model (Samba-3.8B) requires substantial computational resources for training and potentially for inference.
  • The README mentions a "preview" status for the Samba-3.8B-instruct model.
Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 37 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

yarn by jquesnelle
1.0% · 2k stars
Context window extension method for LLMs (research paper, models)
created 2 years ago · updated 1 year ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38
0.3% · 9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago · updated 1 year ago