JetMoE by myshell-ai

Open-sourced LLM reaching LLaMA2 performance with limited resources

created 1 year ago
989 stars

Top 38.2% on sourcepulse

Project Summary

JetMoE offers an open-source Mixture-of-Experts (MoE) large language model, JetMoE-8B, designed to achieve performance comparable to larger models like LLaMA2-7B with significantly reduced training costs (under $0.1M) and inference computation. It targets researchers and developers seeking efficient, high-performing LLMs that can be fine-tuned on consumer-grade hardware.

How It Works

JetMoE-8B utilizes a Mixture-of-Experts architecture, activating only 2.2 billion parameters during inference. This sparse activation drastically lowers computational requirements compared to dense models of similar capabilities, enabling faster inference and more accessible fine-tuning. The model is trained on publicly available datasets, making it suitable for academic and open-source applications.
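
As a rough illustration of the sparse-activation idea, the sketch below routes each token to the top-k of several expert MLPs, so only those experts' parameters do work for that token. This is a generic top-k MoE layer, not JetMoE's actual implementation; the module names, dimensions, and routing scheme are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k MoE feed-forward layer (names and sizes are assumptions)."""
    def __init__(self, dim=512, num_experts=8, top_k=2, hidden=1024):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(dim, num_experts)
        # Each expert is an ordinary MLP; only top_k of them run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (num_tokens, dim)
        scores = self.router(x)                         # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top_k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize their mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 8 experts exist, but each token only pays the compute cost of 2 of them.
print(TopKMoE()(torch.randn(4, 512)).shape)             # torch.Size([4, 512])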

Quick Start & Requirements

  • Install from a local clone of the repository: pip install -e .
  • Load the model with Hugging Face transformers (a generation sketch follows this list):
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM
from jetmoe import JetMoEForCausalLM, JetMoEConfig

# Register the JetMoE architecture with the transformers Auto classes
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
  • Requires the transformers library.
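
A minimal generation sketch, assuming the tokenizer and model loaded above and the standard transformers generate API; the prompt and decoding settings are illustrative, not taken from the JetMoE README.

# Minimal usage sketch: prompt and decoding settings are illustrative assumptions.
inputs = tokenizer("The capital of France is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))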

Highlighted Details

  • Outperforms LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B on Open LLM Leaderboard benchmarks.
  • Achieves a higher MT-Bench score (6.681) than Llama-2-13b-chat (6.650) and Llama-2-7b-chat (6.269).
  • Trained on 1.25T tokens.
  • Active parameters during inference: 2.2B.

Maintenance & Community

Licensing & Compatibility

  • The README states the model is "academia-friendly" and uses "public datasets." Specific license details are not explicitly stated in the provided text, but the open-source nature suggests permissive licensing.

Limitations & Caveats

  • Although JetMoE-8B is claimed to outperform LLaMA2-7B overall, the provided benchmark table shows it scoring lower on ARC (48.7 vs. 53.1) and WinoGrande (70.2 vs. 74.0). Further investigation into the specific benchmark methodologies is recommended.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days
