Moxin-LLM by moxin-org

Open-source LLM family for research, reproducibility, and transparency

created 8 months ago · 602 stars · Top 55.1% on sourcepulse
View on GitHub

Project Summary

Moxin-LLM provides a family of fully open-source and reproducible Large Language Models (LLMs) designed to address concerns around transparency and "openwashing" in generative AI. Targeting researchers and developers, it offers base, chat, instruct, and reasoning models, all adhering to the Model Openness Framework (MOF) for completeness and openness.

How It Works

Moxin-LLM is built upon the Llama architecture and trained using the ColossalAI framework for efficient distributed training. The models are further refined using the Tulu 3 dataset for supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to create chat and instruct variants. A reasoning model is developed using DeepScaleR and GRPO reinforcement learning techniques, leveraging datasets like OpenThoughts and OpenR1-Math-220k.
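
The chat/instruct recipe above (SFT on Tulu 3, then DPO) can be illustrated with standard tooling. The sketch below is a minimal, hedged example using trl's DPOTrainer; the dataset id, checkpoint id, and hyperparameters are assumptions, and the project's released training scripts (in scripts/) are the authoritative reference.

    # Minimal DPO sketch (assumptions: a recent trl release with DPOTrainer/DPOConfig,
    # the Tulu 3 preference-mixture dataset id, and illustrative hyperparameters).
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base = "moxin-org/moxin-7b"
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Preference pairs (prompt/chosen/rejected); the dataset id and split are assumptions.
    prefs = load_dataset("allenai/llama-3.1-tulu-3-8b-preference-mixture", split="train")

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="moxin-dpo", per_device_train_batch_size=1, beta=0.1),
        train_dataset=prefs,
        processing_class=tokenizer,  # older trl releases use tokenizer= instead
    )
    trainer.train()

DPO optimizes the model directly on preference pairs without training a separate reward model; the reasoning variant instead goes through GRPO-style reinforcement learning as noted above.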

Quick Start & Requirements

  • Inference: use the Hugging Face transformers library (a chat-variant sketch follows this list).
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

    model_name = "moxin-org/moxin-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    print(pipe("Explain regularization in machine learning.", max_new_tokens=128)[0]["generated_text"])

  • Prerequisites: PyTorch (v2.0.0 recommended), CUDA (v11.7 recommended), Flash Attention (v2.2.1).
  • GGUF Conversion: A convert_hf_to_gguf.py script is provided.
  • Training: Requires datasets, PyTorch, CUDA, Flash Attention, and ColossalAI. Training scripts are available in scripts/.
  • Documentation: Technical Report available.
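
For the chat and instruct variants mentioned in the summary, generation typically goes through the tokenizer's chat template, as in the sketch below. The checkpoint id moxin-org/moxin-chat-7b and the presence of a bundled chat template are assumptions; check the model card on the Hub for the exact name and prompt format.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    chat_name = "moxin-org/moxin-chat-7b"  # assumed checkpoint id, verify on the Hub
    tokenizer = AutoTokenizer.from_pretrained(chat_name)
    model = AutoModelForCausalLM.from_pretrained(chat_name, torch_dtype=torch.bfloat16, device_map="auto")

    messages = [{"role": "user", "content": "Summarize the Model Openness Framework in one sentence."}]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))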

Highlighted Details

  • Models are evaluated with lm-evaluation-harness and OLMES, showing competitive performance against models such as Mistral-7B and Llama 3.1-8B (see the evaluation sketch after this list).
  • The project emphasizes reproducibility by releasing datasets, training scripts, and trained models.
  • A dedicated reasoning model is fine-tuned using GRPO for improved Chain-of-Thought capabilities.
  • The base model is an enhanced version of hpcai-tech/Colossal-LLaMA-2-7b-base.
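
As a rough illustration of how such evaluations can be run, the sketch below uses the lm-evaluation-harness Python API. The task list, dtype, and batch size are illustrative assumptions, not the project's exact evaluation recipe.

    # Evaluation sketch (assumes the lm-evaluation-harness >= 0.4 Python API; tasks are illustrative).
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=moxin-org/moxin-7b,dtype=bfloat16",
        tasks=["arc_challenge", "hellaswag"],
        batch_size=8,
    )
    print(results["results"])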

Maintenance & Community

  • The project is associated with the moxin-org organization.
  • No specific community links (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The models and code are described as "fully open-source." Specific license details are not explicitly stated in the README, but the project aims to combat "openwashing" by adhering to open science principles.

Limitations & Caveats

  • The README mentions CUDA 11.7 and PyTorch 2.0.0 as tested versions, implying potential compatibility issues with other versions.
  • Flash Attention installation requires manual compilation.
  • Specific performance claims are benchmarked, but real-world performance may vary.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 515 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface: SDK for reproducing DeepSeek-R1
Top 0.2% on sourcepulse · 25k stars · created 6 months ago · updated 3 days ago