JetMoE by myshell-ai

Open-sourced LLM reaching LLaMA2 performance with limited resources

created 1 year ago
989 stars

Top 38.2% on sourcepulse

Project Summary

JetMoE offers an open-source Mixture-of-Experts (MoE) large language model, JetMoE-8B, designed to achieve performance comparable to larger models like LLaMA2-7B with significantly reduced training costs (under $0.1M) and inference computation. It targets researchers and developers seeking efficient, high-performing LLMs that can be fine-tuned on consumer-grade hardware.

How It Works

JetMoE-8B utilizes a Mixture-of-Experts architecture, activating only 2.2 billion parameters during inference. This sparse activation drastically lowers computational requirements compared to dense models of similar capabilities, enabling faster inference and more accessible fine-tuning. The model is trained on publicly available datasets, making it suitable for academic and open-source applications.
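
As a rough illustration of the sparse-activation idea, the sketch below routes each token to the top-k of several expert MLPs, so only those experts' parameters do work for that token. This is a generic top-k MoE layer, not JetMoE's actual implementation; the module names, dimensions, and routing scheme are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE feed-forward layer (names and sizes are assumptions)."""
    def __init__(self, dim=512, num_experts=8, top_k=2, hidden=1024):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(dim, num_experts)
        # Each expert is an ordinary MLP; only top_k of them run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (num_tokens, dim)
        scores = self.router(x)                         # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top_k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize their mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 8 experts exist, but each token only pays the compute cost of 2 of them.
print(TopKMoE()(torch.randn(4, 512)).shape)             # torch.Size([4, 512])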

Quick Start & Requirements

  • Install from a local clone of the repository: pip install -e .
  • Load the model with Hugging Face transformers (a generation sketch follows this list):
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
from jetmoe import JetMoEForCausalLM, JetMoEConfig

# Register the JetMoE architecture with the transformers Auto classes
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
  • Requires the transformers library.
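
A minimal generation sketch, assuming the tokenizer and model loaded above and the standard transformers generate API; the prompt and decoding settings are illustrative, not taken from the JetMoE README.

# Minimal usage sketch: prompt and decoding settings are illustrative assumptions.
inputs = tokenizer("The capital of France is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))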

Highlighted Details

  • Outperforms LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B on Open LLM Leaderboard benchmarks.
  • Achieves a higher MT-Bench score (6.681) than Llama-2-13b-chat (6.650) and Llama-2-7b-chat (6.269).
  • Trained on 1.25T tokens.
  • Active parameters during inference: 2.2B.

Maintenance & Community

Licensing & Compatibility

  • The README states the model is "academia-friendly" and uses "public datasets." Specific license details are not explicitly stated in the provided text, but the open-source nature suggests permissive licensing.

Limitations & Caveats

  • Although JetMoE-8B is claimed to outperform LLaMA2-7B overall, the provided benchmark table shows it scoring lower on ARC (48.7 vs. 53.1) and WinoGrande (70.2 vs. 74.0). Further investigation into the specific benchmark methodologies is recommended.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days
