Open-source LLM family for research, reproducibility, and transparency
Moxin-LLM provides a family of fully open-source and reproducible Large Language Models (LLMs) designed to address concerns around transparency and "openwashing" in generative AI. Targeting researchers and developers, it offers base, chat, instruct, and reasoning models, all adhering to the Model Openness Framework (MOF) for completeness and openness.
How It Works
Moxin-LLM is built upon the Llama architecture and trained using the ColossalAI framework for efficient distributed training. The models are further refined using the Tulu 3 dataset for supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to create chat and instruct variants. A reasoning model is developed using DeepScaleR and GRPO reinforcement learning techniques, leveraging datasets like OpenThoughts and OpenR1-Math-220k.
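As a rough illustration of the SFT stage, the sketch below fine-tunes the base model on the Tulu 3 SFT mixture with Hugging Face TRL; the dataset id, trainer choice, and settings are assumptions for illustration, not the project's actual training recipe.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Tulu 3 SFT mixture (assumed dataset id); the project's real recipe may differ
dataset = load_dataset("allenai/tulu-3-sft-mixture", split="train")

trainer = SFTTrainer(
    model="moxin-org/moxin-7b",  # base model id taken from the quick-start example
    train_dataset=dataset,
    args=SFTConfig(output_dir="moxin-7b-sft", bf16=True),
)
trainer.train()
```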
Quick Start & Requirements
Inference requires the Hugging Face `transformers` library.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the Moxin base model and tokenizer from the Hugging Face Hub
model_name = 'moxin-org/moxin-7b'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)

# Wrap the loaded model in a text-generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```
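A quick sanity check of the loaded pipeline might look like the following; the prompt and sampling settings are illustrative, not taken from the project's documentation.

```python
# Generate a short continuation from the base model (settings are illustrative)
prompt = "Explain the difference between supervised fine-tuning and DPO in one paragraph."
outputs = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```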
For GGUF conversion (e.g., for use with llama.cpp), a `convert_hf_to_gguf.py` script is provided. Training requires `datasets`, PyTorch, CUDA, Flash Attention, and ColossalAI; training scripts are available in `scripts/`.
Highlighted Details
Evaluation uses `lm-evaluation-harness` and OLMES, showing competitive performance against models like Mistral-7B and Llama 3.1-8B; a minimal evaluation sketch follows below. The ColossalAI-based training setup references `hpcai-tech/Colossal-LLaMA-2-7b-base`.
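A minimal sketch of running the harness against the base model via the `lm_eval` Python API; the task list and batch size are illustrative choices, not taken from the project's evaluation setup.

```python
import lm_eval

# Evaluate the base model on a couple of common benchmarks (task list is illustrative)
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moxin-org/moxin-7b,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag"],
    batch_size=8,
)
print(results["results"])
```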
Maintenance & Community
The models and code are maintained under the `moxin-org` organization.
Licensing & Compatibility
Limitations & Caveats