mistral by stanford-crfm

Framework for large language model training

created 4 years ago
575 stars

Top 56.9% on sourcepulse

View on GitHub
Project Summary

Mistral is a framework for transparent and accessible large-scale language model training, built with Hugging Face Transformers. It provides tools and scripts for incorporating new datasets, running distributed training (including on cloud providers), and evaluating trained models. The framework is aimed at researchers and engineers working with large language models.

How It Works

Mistral leverages Hugging Face Transformers for its core functionality, integrating DeepSpeed for efficient distributed training. It offers a structured approach to managing training configurations, data loading, and checkpointing, facilitating reproducible and scalable LLM development. The framework's design emphasizes transparency and accessibility, making it easier to understand and adapt for various training scenarios.
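Concretely, the workflow follows the familiar Hugging Face pattern of a Trainer whose arguments point at a DeepSpeed config. The sketch below shows that general pattern only; the dataset, output paths, and DeepSpeed config path are illustrative assumptions, not Mistral's actual entry points or defaults.

  # Illustrative sketch of the Transformers Trainer + DeepSpeed pattern that
  # Mistral builds on; paths and hyperparameters are assumptions, not the
  # framework's actual configuration.
  from datasets import load_dataset
  from transformers import (
      AutoModelForCausalLM,
      AutoTokenizer,
      DataCollatorForLanguageModeling,
      Trainer,
      TrainingArguments,
  )

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  # Small public corpus used purely for illustration; drop empty lines.
  dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
  dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)
  tokenized = dataset.map(
      lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
      batched=True,
      remove_columns=["text"],
  )

  args = TrainingArguments(
      output_dir="runs/gpt2-demo",
      per_device_train_batch_size=8,
      num_train_epochs=1,
      save_steps=1000,
      deepspeed="conf/ds_config.json",  # hypothetical DeepSpeed config path
  )

  trainer = Trainer(
      model=model,
      args=args,
      train_dataset=tokenized,
      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
  )
  trainer.train()

Launched under the deepspeed or torchrun launcher, the same script scales from a single GPU to multiple nodes, which is the distributed setup Mistral automates.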

Quick Start & Requirements

  • Installation: Requires Python 3.8.12, PyTorch 1.11.0 (compiled with CUDA 11.3), CUDA 11.3, NCCL 2.10, Transformers 4.17.0, and DeepSpeed 0.6.0. Environment setup is recommended via Conda (a version check is sketched after this list):
        conda create -n mistral python=3.8.12 pytorch=1.11.0 torchdata cudatoolkit=11.3 -c pytorch
        conda activate mistral
        pip install -r setup/pip-requirements.txt
    An environment export is available at environments/environment-gpu.yaml.
  • Documentation: Full documentation is available on Read the Docs.
  • Demo: A Google Colab notebook is provided for running a demo.
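Given the tightly pinned dependency versions, it is worth confirming that the environment resolved as expected before launching a run. A minimal sanity-check sketch (nothing here is Mistral-specific):

  # Check that the installed versions match the pinned requirements above.
  import torch
  import transformers
  import deepspeed

  print("PyTorch:", torch.__version__)               # expected 1.11.0
  print("CUDA (compile-time):", torch.version.cuda)  # expected 11.3
  print("CUDA available:", torch.cuda.is_available())
  print("Transformers:", transformers.__version__)   # expected 4.17.0
  print("DeepSpeed:", deepspeed.__version__)         # expected 0.6.0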

Highlighted Details

  • Supports single-node, single-GPU, and multi-node, multi-GPU training with DeepSpeed.
  • Models are stored in Hugging Face format, allowing direct use with 🤗 Transformers.
  • Provides pre-trained GPT-2 Small and Medium models trained on OpenWebText, with checkpoints available on the Hugging Face Hub (see the loading sketch after this list).
  • Detailed checkpointing schedule and access methods via git-lfs are documented.
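Because the checkpoints are stored in Hugging Face format, they load with the standard Transformers API. A minimal sketch, assuming a checkpoint published under the stanford-crfm organization; the model ID below is illustrative, so check the Hub for the actual released names:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Model ID is an assumption for illustration; browse the stanford-crfm
  # organization on the Hugging Face Hub for the released GPT-2 checkpoints.
  model_id = "stanford-crfm/alias-gpt2-small-x21"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)

  inputs = tokenizer("Language models trained on OpenWebText", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))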

Maintenance & Community

  • Issues, questions, and feature requests should be directed to the GitHub Issue Tracker.
  • Information on contributing is available.

Licensing & Compatibility

  • The README does not explicitly state a license. The project builds on Hugging Face Transformers, which is Apache 2.0 licensed, but the framework's own licensing would require explicit confirmation before commercial use or closed-source linking.

Limitations & Caveats

The project's specific license is not clearly stated in the README, which could impact commercial adoption. The setup requires a specific set of older dependency versions (e.g., PyTorch 1.11.0, CUDA 11.3), which may pose challenges for users with newer hardware or existing environments.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

  • Top 0.2% · 25k stars
  • SDK for reproducing DeepSeek-R1
  • created 6 months ago, updated 3 days ago