matmulfreellm by ridgerchu

MatMul-free language models

Created 1 year ago · 3,032 stars · Top 15.8% on SourcePulse

Project Summary

This repository implements MatMul-Free LM, a novel language model architecture that replaces traditional matrix multiplication with more efficient operations. Targeting researchers and developers seeking to optimize LLM inference and training, it offers compatibility with the Hugging Face Transformers library and provides pre-trained models up to 2.7B parameters.

How It Works

The core innovation is replacing dense matrix multiplications with a custom architecture, HGRNBit, which combines fused kernels with ternary weights so that most weight "multiplications" reduce to additions, subtractions, and zeroing. This cuts computational cost and memory-bandwidth requirements, enabling more efficient scaling and inference. The architecture includes specialized projection layers (FusedBitLinear) and SiLU activations within its attention and MLP blocks.
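To make the idea concrete, here is a minimal, non-fused sketch of BitLinear-style ternary quantization in PyTorch. The repo's FusedBitLinear implements this with fused Triton kernels; the TernaryLinear class, the absmean scaling rule, and the initialization below are illustrative assumptions, not the repo's exact code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def ternary_quant(w: torch.Tensor):
        # Per-tensor absmean scaling, then round weights to {-1, 0, +1}
        # (BitNet b1.58-style quantization).
        scale = 1.0 / w.abs().mean().clamp(min=1e-5)
        w_q = (w * scale).round().clamp(-1, 1)
        return w_q, scale

    class TernaryLinear(nn.Module):
        # Illustrative stand-in for the repo's FusedBitLinear: with ternary
        # weights, the matmul degenerates into additions and sign flips.
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.weight = nn.Parameter(
                torch.empty(out_features, in_features).normal_(std=0.02)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            w_q, scale = ternary_quant(self.weight)
            # Straight-through estimator: quantized weights in the forward
            # pass, full-precision gradients in the backward pass.
            w = self.weight + (w_q / scale - self.weight).detach()
            return F.linear(x, w)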

Quick Start & Requirements

  • Install via pip: pip install -U git+https://github.com/ridgerchu/matmulfreellm
  • Requirements: PyTorch >= 2.0, Triton >= 2.2, einops.
  • Pre-trained models are available on Hugging Face: 370M, 1.3B, 2.7B.
  • Usage examples for model initialization and text generation are provided in the README; a sketch follows this list.
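A minimal generation example in the style of the README (the Hub identifier ridger/MMfreeLM-2.7B and the sampling settings are assumptions; check the README for the exact snippet):

    import mmfreelm  # registers the MatMul-free architecture with Transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "ridger/MMfreeLM-2.7B"  # assumed Hub identifier; 370M and 1.3B variants also exist
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).cuda().half()

    prompt = "In a shocking finding, scientists discovered "
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    outputs = model.generate(input_ids, max_length=64, do_sample=True, top_p=0.4, temperature=0.6)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])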

Highlighted Details

  • Implements MatMul-Free LM architecture compatible with Hugging Face Transformers.
  • Offers pre-trained models ranging from 370M to 2.7B parameters.
  • Scaling-law analysis in the paper shows the MatMul-free LM's loss descending more steeply with compute than Transformer++'s, suggesting it uses additional compute more efficiently and that the gap narrows at scale (see the note after this list).
  • Utilizes custom fused Triton kernels (e.g., FusedBitLinear) and ternary weights for optimization.
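One way to read the scaling-law bullet above (the power-law form here is an illustration, not the paper's exact fit): if loss scales as a power law in training compute C,

    L(C) ≈ a · C^(−α)

then a "steeper descent" means a larger exponent α for the MatMul-free LM, so its loss curve is projected to cross Transformer++'s at a sufficiently large compute budget.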

Maintenance & Community

  • The project is associated with an arXiv preprint: 2406.02528.
  • Primary contributor appears to be ridgerchu.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.

Limitations & Caveats

The project is presented as an implementation of research findings, and its stability, long-term maintenance, and production readiness are not yet established. The absence of a specified license creates legal uncertainty for commercial or closed-source use.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang
  566 stars · 0.2% · Decoding method for faster LLM generation
  Created 1 year ago · Updated 1 year ago
  Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Lewis Tunstall (Research Engineer at Hugging Face), and 4 more.

fastformers by microsoft
  707 stars · 0% · NLU optimization recipes for transformer models
  Created 5 years ago · Updated 6 months ago
  Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
  790 stars · 0% · Toolkit for easy model parallelization
  Created 4 years ago · Updated 2 years ago
  Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM
  1k stars · 0.3% · Transformer library for flexible model development
  Created 4 years ago · Updated 8 months ago
  Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA
  6k stars · 0.1% · Optimized transformer library for inference
  Created 4 years ago · Updated 1 year ago