mistral by stanford-crfm

Framework for large language model training

created 4 years ago
575 stars

Top 56.9% on sourcepulse

View on GitHub
Project Summary

Mistral is a framework for transparent and accessible large-scale language model training, built with Hugging Face Transformers. It provides tools and scripts for incorporating new datasets, running distributed training (including on cloud providers), and evaluating trained models. The framework is aimed at researchers and engineers working with large language models.

How It Works

Mistral leverages Hugging Face Transformers for its core functionality, integrating DeepSpeed for efficient distributed training. It offers a structured approach to managing training configurations, data loading, and checkpointing, facilitating reproducible and scalable LLM development. The framework's design emphasizes transparency and accessibility, making it easier to understand and adapt for various training scenarios.
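Concretely, the workflow follows the familiar Hugging Face pattern of a Trainer whose arguments point at a DeepSpeed config. The sketch below shows that general pattern only; the dataset, output paths, and DeepSpeed config path are illustrative assumptions, not Mistral's actual entry points or defaults.

  # Illustrative sketch of the Transformers Trainer + DeepSpeed pattern that
  # Mistral builds on; paths and hyperparameters are assumptions, not the
  # framework's actual configuration.
  from datasets import load_dataset
  from transformers import (
      AutoModelForCausalLM,
      AutoTokenizer,
      DataCollatorForLanguageModeling,
      Trainer,
      TrainingArguments,
  )

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  # Small public corpus used purely for illustration; drop empty lines.
  dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
  dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)
  tokenized = dataset.map(
      lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
      batched=True,
      remove_columns=["text"],
  )

  args = TrainingArguments(
      output_dir="runs/gpt2-demo",
      per_device_train_batch_size=8,
      num_train_epochs=1,
      save_steps=1000,
      deepspeed="conf/ds_config.json",  # hypothetical DeepSpeed config path
  )

  trainer = Trainer(
      model=model,
      args=args,
      train_dataset=tokenized,
      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
  )
  trainer.train()

Launched under the deepspeed or torchrun launcher, the same script scales from a single GPU to multiple nodes, which is the distributed setup Mistral automates.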

Quick Start & Requirements

  • Installation: Requires Python 3.8.12, PyTorch 1.11.0 (compiled with CUDA 11.3), CUDA 11.3, NCCL 2.10, Transformers 4.17.0, and DeepSpeed 0.6.0. Environment setup is recommended via Conda (a version check is sketched after this list):
        conda create -n mistral python=3.8.12 pytorch=1.11.0 torchdata cudatoolkit=11.3 -c pytorch
        conda activate mistral
        pip install -r setup/pip-requirements.txt
    An environment export is available at environments/environment-gpu.yaml.
  • Documentation: Full documentation is available on Read the Docs.
  • Demo: A Google Colab notebook is provided for running a demo.
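Given the tightly pinned dependency versions, it is worth confirming that the environment resolved as expected before launching a run. A minimal sanity-check sketch (nothing here is Mistral-specific):

  # Check that the installed versions match the pinned requirements above.
  import torch
  import transformers
  import deepspeed

  print("PyTorch:", torch.__version__)               # expected 1.11.0
  print("CUDA (compile-time):", torch.version.cuda)  # expected 11.3
  print("CUDA available:", torch.cuda.is_available())
  print("Transformers:", transformers.__version__)   # expected 4.17.0
  print("DeepSpeed:", deepspeed.__version__)         # expected 0.6.0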

Highlighted Details

  • Supports single-node, single-GPU, and multi-node, multi-GPU training with DeepSpeed.
  • Models are stored in Hugging Face format, allowing direct use with 🤗 Transformers.
  • Provides pre-trained GPT-2 Small and Medium models trained on OpenWebText, with checkpoints available on the Hugging Face Hub (see the loading sketch after this list).
  • Detailed checkpointing schedule and access methods via git-lfs are documented.
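Because the checkpoints are stored in Hugging Face format, they load with the standard Transformers API. A minimal sketch, assuming a checkpoint published under the stanford-crfm organization; the model ID below is illustrative, so check the Hub for the actual released names:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Model ID is an assumption for illustration; browse the stanford-crfm
  # organization on the Hugging Face Hub for the released GPT-2 checkpoints.
  model_id = "stanford-crfm/alias-gpt2-small-x21"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)

  inputs = tokenizer("Language models trained on OpenWebText", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))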

Maintenance & Community

  • Issues, questions, and feature requests should be directed to the GitHub Issue Tracker.
  • Information on contributing is available.

Licensing & Compatibility

  • The README does not explicitly state a license. The project builds on Hugging Face Transformers, which is Apache 2.0 licensed, but the framework's own licensing would require explicit confirmation before commercial use or closed-source linking.

Limitations & Caveats

The project's specific license is not clearly stated in the README, which could impact commercial adoption. The setup requires a specific set of older dependency versions (e.g., PyTorch 1.11.0, CUDA 11.3), which may pose challenges for users with newer hardware or existing environments.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

  • Top 0.2% · 25k stars
  • SDK for reproducing DeepSeek-R1
  • created 6 months ago, updated 3 days ago