snowflake-arctic by Snowflake-Labs

AI research project for efficient LLM training and inference

created 1 year ago
546 stars

Top 59.3% on sourcepulse

View on GitHub
Project Summary

This repository provides artifacts for efficient training and inference of large language models (LLMs), with optimizations targeted at Meta's Llama 3.1 405B. It addresses enterprise AI use cases such as SQL and code copilots, offering improved inference latency and throughput and enabling parameter-efficient fine-tuning on a single node.

How It Works

The project uses a Dense-MoE (Mixture-of-Experts) hybrid transformer architecture: a 10B dense transformer combined with a residual MoE MLP, giving roughly 480B total parameters of which about 17B are active per token. This design aims for efficient intelligence, excelling at enterprise tasks such as SQL generation and coding while using significantly less training compute than comparable models.
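The sketch below illustrates the Dense-MoE hybrid idea: a standard dense transformer block with a sparse, top-k routed MoE MLP added as a parallel residual path, so only a small fraction of the total parameters runs per token. Class names, expert counts, and dimensions are illustrative assumptions, not the repository's actual implementation.

```python
# Conceptual sketch of a Dense-MoE hybrid layer (not the repo's code).
# Expert count, dimensions, and top-k routing here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities per expert
        weights, idx = gate.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only the routed experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

class DenseMoEHybridBlock(nn.Module):
    """Dense attention + MLP path, with a sparse MoE MLP added as a parallel residual."""
    def __init__(self, d_model=1024, n_heads=16, d_ff=4096):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dense_mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.moe_mlp = MoEMLP(d_model, d_ff)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):                          # x: (batch, seq, d_model)
        h = x + self.attn(self.norm1(x), self.norm1(x), self.norm1(x), need_weights=False)[0]
        z = self.norm2(h)
        # Dense MLP and sparse MoE MLP both contribute residually; the MoE path
        # adds capacity while keeping the number of active parameters small.
        b, s, d = z.shape
        return h + self.dense_mlp(z) + self.moe_mlp(z.reshape(-1, d)).reshape(b, s, d)
```

Per token, only the routed experts execute, which is why total parameters (480B) and active parameters (17B) can differ so widely.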

Quick Start & Requirements

  • Inference: Tutorials cover a basic Hugging Face setup and vLLM deployment. Model weights are hosted on Hugging Face (e.g., Snowflake/snowflake-arctic-instruct); a minimal loading sketch follows this list.
  • Fine-tuning: Supports single- and multi-node training using parameter-efficient techniques, FP8 quantization, and ZeRO-3-inspired sharding; a generic fine-tuning sketch also appears below.
  • Prerequisites: The README does not list explicit hardware requirements (GPU count, CUDA version, memory), but GPU hardware is implied for running and fine-tuning models of this size.
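For the basic Hugging Face path, a minimal loading sketch might look like the following, assuming standard transformers APIs, sufficient GPU memory, and that the checkpoint ships custom modeling code (hence trust_remote_code=True); the repo's own tutorials may differ in details such as quantization or DeepSpeed configuration.

```python
# Minimal Hugging Face inference sketch (assumptions: enough GPU memory is
# available and the checkpoint ships custom modeling code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Snowflake/snowflake-arctic-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a SQL query that counts orders per customer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```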
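For parameter-efficient fine-tuning, the sketch below uses Hugging Face peft's LoRA purely as a stand-in illustration; it is not the repo's DeepSpeed-based recipe (which adds FP8 quantization and ZeRO-3-inspired sharding), and the target module names and hyperparameters are assumptions to verify against the actual model.

```python
# Generic LoRA fine-tuning sketch with Hugging Face peft (illustrative only;
# the repo's recipes rely on DeepSpeed-based sharding and FP8 instead).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "Snowflake/snowflake-arctic-instruct"  # assumes the weights fit the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # hypothetical module names; inspect the model to confirm
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapter weights are trainable
```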

Highlighted Details

  • Optimized inference for Llama 3.1 405B with up to 3x lower latency and 1.4x higher throughput.
  • Supports a 128K context window for inference.
  • Achieves comparable or better enterprise intelligence metrics (SQL, code, instruction following) than Llama 3 8B and Llama 2 70B using less training compute.
  • Dense-MoE hybrid architecture with 480B total parameters (17B active).

Maintenance & Community

  • Developed by the Snowflake AI Research team.
  • Collaborations mentioned with DeepSpeed, Hugging Face, and vLLM.
  • Ongoing cookbook releases are planned to share deeper insights into training and data.

Licensing & Compatibility

  • Released under the Apache 2.0 license, allowing ungated access to weights and code.
  • Compatible with commercial use and closed-source linking due to the permissive license.

Limitations & Caveats

The README focuses on Llama 3.1 405B optimization; support for other models or architectures is not detailed. Specific hardware requirements for optimal performance are not explicitly listed.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Thomas Wolf (Cofounder of Hugging Face), and 1 more.

JetMoE by myshell-ai

0.6%
989
Open-sourced LLM reaching LLaMA2 performance with limited resources
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0%
19k
LoRA fine-tuning for LLaMA
created 2 years ago
updated 1 year ago