Moxin-LLM by moxin-org

Open-source LLM family for research, reproducibility, and transparency

created 8 months ago · 602 stars · Top 55.1% on sourcepulse
View on GitHub

Project Summary

Moxin-LLM provides a family of fully open-source and reproducible Large Language Models (LLMs) designed to address concerns around transparency and "openwashing" in generative AI. Targeting researchers and developers, it offers base, chat, instruct, and reasoning models, all adhering to the Model Openness Framework (MOF) for completeness and openness.

How It Works

Moxin-LLM is built upon the Llama architecture and trained using the ColossalAI framework for efficient distributed training. The models are further refined using the Tulu 3 dataset for supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to create chat and instruct variants. A reasoning model is developed using DeepScaleR and GRPO reinforcement learning techniques, leveraging datasets like OpenThoughts and OpenR1-Math-220k.
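
The chat/instruct recipe above (SFT on Tulu 3, then DPO) can be illustrated with standard tooling. The sketch below is a minimal, hedged example using trl's DPOTrainer; the dataset id, checkpoint id, and hyperparameters are assumptions, and the project's released training scripts (in scripts/) are the authoritative reference.

    # Minimal DPO sketch (assumptions: a recent trl release with DPOTrainer/DPOConfig,
    # the Tulu 3 preference-mixture dataset id, and illustrative hyperparameters).
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    base = "moxin-org/moxin-7b"
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Preference pairs (prompt/chosen/rejected); the dataset id and split are assumptions.
    prefs = load_dataset("allenai/llama-3.1-tulu-3-8b-preference-mixture", split="train")

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="moxin-dpo", per_device_train_batch_size=1, beta=0.1),
        train_dataset=prefs,
        processing_class=tokenizer,  # older trl releases use tokenizer= instead
    )
    trainer.train()

DPO optimizes the model directly on preference pairs without training a separate reward model; the reasoning variant instead goes through GRPO-style reinforcement learning as noted above.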

Quick Start & Requirements

  • Inference: use the Hugging Face transformers library (a chat-variant sketch follows this list).
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

    model_name = "moxin-org/moxin-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    print(pipe("Explain regularization in machine learning.", max_new_tokens=128)[0]["generated_text"])

  • Prerequisites: PyTorch (v2.0.0 recommended), CUDA (v11.7 recommended), Flash Attention (v2.2.1).
  • GGUF Conversion: A convert_hf_to_gguf.py script is provided.
  • Training: Requires datasets, PyTorch, CUDA, Flash Attention, and ColossalAI. Training scripts are available in scripts/.
  • Documentation: Technical Report available.
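
For the chat and instruct variants mentioned in the summary, generation typically goes through the tokenizer's chat template, as in the sketch below. The checkpoint id moxin-org/moxin-chat-7b and the presence of a bundled chat template are assumptions; check the model card on the Hub for the exact name and prompt format.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    chat_name = "moxin-org/moxin-chat-7b"  # assumed checkpoint id, verify on the Hub
    tokenizer = AutoTokenizer.from_pretrained(chat_name)
    model = AutoModelForCausalLM.from_pretrained(chat_name, torch_dtype=torch.bfloat16, device_map="auto")

    messages = [{"role": "user", "content": "Summarize the Model Openness Framework in one sentence."}]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    outputs = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))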

Highlighted Details

  • Models are evaluated with lm-evaluation-harness and OLMES, showing competitive performance against models such as Mistral-7B and Llama 3.1-8B (see the evaluation sketch after this list).
  • The project emphasizes reproducibility by releasing datasets, training scripts, and trained models.
  • A dedicated reasoning model is fine-tuned using GRPO for improved Chain-of-Thought capabilities.
  • The base model is an enhanced version of hpcai-tech/Colossal-LLaMA-2-7b-base.
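
As a rough illustration of how such evaluations can be run, the sketch below uses the lm-evaluation-harness Python API. The task list, dtype, and batch size are illustrative assumptions, not the project's exact evaluation recipe.

    # Evaluation sketch (assumes the lm-evaluation-harness >= 0.4 Python API; tasks are illustrative).
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=moxin-org/moxin-7b,dtype=bfloat16",
        tasks=["arc_challenge", "hellaswag"],
        batch_size=8,
    )
    print(results["results"])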

Maintenance & Community

  • The project is associated with the moxin-org organization.
  • No specific community links (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The models and code are described as "fully open-source." Specific license details are not explicitly stated in the README, but the project aims to combat "openwashing" by adhering to open science principles.

Limitations & Caveats

  • The README mentions CUDA 11.7 and PyTorch 2.0.0 as tested versions, implying potential compatibility issues with other versions.
  • Flash Attention installation requires manual compilation.
  • Specific performance claims are benchmarked, but real-world performance may vary.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 515 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 10 more.

open-r1 by huggingface: SDK for reproducing DeepSeek-R1
Top 0.2% on sourcepulse · 25k stars · created 6 months ago · updated 3 days ago