matmulfreellm by ridgerchu

MatMul-free language models

Created 1 year ago · 3,032 stars · Top 15.8% on SourcePulse

Project Summary

This repository implements MatMul-Free LM, a novel language model architecture that replaces traditional matrix multiplication with more efficient operations. Targeting researchers and developers seeking to optimize LLM inference and training, it offers compatibility with the Hugging Face Transformers library and provides pre-trained models up to 2.7B parameters.

How It Works

The core innovation is replacing dense matrix multiplications with a custom architecture, HGRNBit, which combines fused kernels with ternary weights so that most weight "multiplications" reduce to additions, subtractions, and zeroing. This cuts computational cost and memory-bandwidth requirements, enabling more efficient scaling and inference. The architecture includes specialized projection layers (FusedBitLinear) and SiLU activations within its attention and MLP blocks.
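To make the idea concrete, here is a minimal, non-fused sketch of BitLinear-style ternary quantization in PyTorch. The repo's FusedBitLinear implements this with fused Triton kernels; the TernaryLinear class, the absmean scaling rule, and the initialization below are illustrative assumptions, not the repo's exact code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def ternary_quant(w: torch.Tensor):
        # Per-tensor absmean scaling, then round weights to {-1, 0, +1}
        # (BitNet b1.58-style quantization).
        scale = 1.0 / w.abs().mean().clamp(min=1e-5)
        w_q = (w * scale).round().clamp(-1, 1)
        return w_q, scale

    class TernaryLinear(nn.Module):
        # Illustrative stand-in for the repo's FusedBitLinear: with ternary
        # weights, the matmul degenerates into additions and sign flips.
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.weight = nn.Parameter(
                torch.empty(out_features, in_features).normal_(std=0.02)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            w_q, scale = ternary_quant(self.weight)
            # Straight-through estimator: quantized weights in the forward
            # pass, full-precision gradients in the backward pass.
            w = self.weight + (w_q / scale - self.weight).detach()
            return F.linear(x, w)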

Quick Start & Requirements

  • Install via pip: pip install -U git+https://github.com/ridgerchu/matmulfreellm
  • Requirements: PyTorch >= 2.0, Triton >= 2.2, einops.
  • Pre-trained models are available on Hugging Face: 370M, 1.3B, 2.7B.
  • Usage examples for model initialization and text generation are provided in the README; a sketch follows this list.
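A minimal generation example in the style of the README (the Hub identifier ridger/MMfreeLM-2.7B and the sampling settings are assumptions; check the README for the exact snippet):

    import mmfreelm  # registers the MatMul-free architecture with Transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "ridger/MMfreeLM-2.7B"  # assumed Hub identifier; 370M and 1.3B variants also exist
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).cuda().half()

    prompt = "In a shocking finding, scientists discovered "
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    outputs = model.generate(input_ids, max_length=64, do_sample=True, top_p=0.4, temperature=0.6)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])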

Highlighted Details

  • Implements MatMul-Free LM architecture compatible with Hugging Face Transformers.
  • Offers pre-trained models ranging from 370M to 2.7B parameters.
  • Scaling-law analysis in the paper shows the MatMul-free LM's loss descending more steeply with compute than Transformer++'s, suggesting it uses additional compute more efficiently and that the gap narrows at scale (see the note after this list).
  • Utilizes custom fused Triton kernels (e.g., FusedBitLinear) and ternary weights for optimization.
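One way to read the scaling-law bullet above (the power-law form here is an illustration, not the paper's exact fit): if loss scales as a power law in training compute C,

    L(C) ≈ a · C^(−α)

then a "steeper descent" means a larger exponent α for the MatMul-free LM, so its loss curve is projected to cross Transformer++'s at a sufficiently large compute budget.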

Maintenance & Community

  • The project is associated with an arXiv preprint: 2406.02528.
  • Primary contributor appears to be ridgerchu.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.

Limitations & Caveats

The project is presented as an implementation of research findings, and its stability, long-term maintenance, and production readiness are not yet established. The absence of a specified license creates legal uncertainty for commercial or closed-source use.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang
  566 stars · 0.2% · Decoding method for faster LLM generation
  Created 1 year ago · Updated 1 year ago
  Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Lewis Tunstall (Research Engineer at Hugging Face), and 4 more.

fastformers by microsoft
  707 stars · 0% · NLU optimization recipes for transformer models
  Created 5 years ago · Updated 6 months ago
  Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
  790 stars · 0% · Toolkit for easy model parallelization
  Created 4 years ago · Updated 2 years ago
  Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM
  1k stars · 0.3% · Transformer library for flexible model development
  Created 4 years ago · Updated 8 months ago
  Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA
  6k stars · 0.1% · Optimized transformer library for inference
  Created 4 years ago · Updated 1 year ago