math-lm by EleutherAI

Data and training code for the research paper on Llemma, an open language model for mathematics

Created 2 years ago

1,096 stars

Top 34.8% on SourcePulse

1 Expert Loves This Project

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

Llemma is an open-source language model for mathematics, aimed at researchers and developers working in AI and mathematics. It is specialized for understanding and generating mathematical content, which may help with work that depends on mathematical reasoning.

How It Works

Llemma models are transformer-based language models trained with EleutherAI's GPT-NeoX library. According to the Llemma paper, they are initialized from Code Llama and further pretrained on Proof-Pile-2, a curated mathematical dataset whose code portion is AlgebraicStack. Continued pretraining on this domain-specific data is what specializes the models for mathematical reasoning.

Quick Start & Requirements

Clone the repository together with its submodules using git clone --recurse-submodules, or run git submodule update --init --recursive after a plain clone. The Llemma 7b and 34b models are hosted on Hugging Face Hub and linked from the README. The repository also documents its data preprocessing, fine-tuning, and evaluation scripts.
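
Because the models are hosted on Hugging Face Hub, one plausible way to try them is through the transformers library. The following is a minimal sketch, not the repository's own code. The Hub IDs EleutherAI/llemma_7b and EleutherAI/llemma_34b are assumptions to check against the links in the README, and resolve_model_id and generate are hypothetical helper names introduced here for illustration.

```python
# Assumed Hugging Face Hub IDs; verify against the links in the repository README.
MODEL_IDS = {"7b": "EleutherAI/llemma_7b", "34b": "EleutherAI/llemma_34b"}


def resolve_model_id(size: str) -> str:
    """Map a size label such as '7b' or '34B' to its assumed Hub ID."""
    key = size.strip().lower()
    if key not in MODEL_IDS:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(MODEL_IDS)}")
    return MODEL_IDS[key]


def generate(prompt: str, size: str = "7b", max_new_tokens: int = 64) -> str:
    """Download the checkpoint (many GB) and greedily complete `prompt`."""
    # Imported lazily so resolve_model_id works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = resolve_model_id(size)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # do_sample=False selects greedy decoding, so repeated runs give the same text.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights, so it is left commented out):
# print(generate("Problem: Compute 12 * 13.\nSolution:"))
```

The 34b checkpoint needs far more memory than the 7b one, so it is reasonable to start with the smaller model when experimenting.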

Highlighted Details

Offers pre-trained models: Llemma 7b and Llemma 34b.
Includes datasets: Proof-Pile-2 and AlgebraicStack.
Provides code for fine-tuning and evaluation experiments.
Integrates with EleutherAI's LM Evaluation Harness for benchmarking.
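
As an illustration of the benchmarking integration, the sketch below calls the upstream lm_eval Python package. This is an assumption: the repository bundles its own evaluation setup, whose task names and entry points may differ. build_eval_kwargs and run_eval are hypothetical helpers, and the model ID and the gsm8k task are examples only.

```python
from typing import Any, Dict, List, Optional


def build_eval_kwargs(model_id: str, tasks: List[str], num_fewshot: int = 0,
                      limit: Optional[int] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for lm_eval.simple_evaluate (upstream API, assumed)."""
    return {
        "model": "hf",                            # Hugging Face transformers backend
        "model_args": f"pretrained={model_id}",   # which checkpoint to load
        "tasks": list(tasks),
        "num_fewshot": num_fewshot,
        "limit": limit,                           # cap examples per task for a quick smoke test
    }


def run_eval(model_id: str, tasks: List[str], **options: Any) -> Dict[str, Any]:
    """Run the harness and return its per-task results dictionary."""
    # Imported lazily so build_eval_kwargs can be used without lm_eval installed.
    import lm_eval

    output = lm_eval.simple_evaluate(**build_eval_kwargs(model_id, tasks, **options))
    return output["results"]


# Example call (downloads the model and the task data, so it is left commented out):
# print(run_eval("EleutherAI/llemma_7b", ["gsm8k"], limit=10))
```

Keeping the argument construction separate from the evaluation call makes it possible to inspect or log the exact configuration before committing to a long run.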

Maintenance & Community

The project is maintained by EleutherAI, a research collective focused on open-source AI. Project updates and community discussion are most likely to appear on EleutherAI's official channels.

Licensing & Compatibility

The provided README does not state a license for the Llemma models or the associated code. EleutherAI projects often use permissive licenses such as Apache 2.0 or MIT, but the actual terms should be confirmed before commercial use or closed-source integration.

Limitations & Caveats

The README does not detail specific performance benchmarks or limitations of the Llemma models. The project is presented as a repository for data and training code, with the models themselves hosted separately on Hugging Face Hub.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days