snowflake-arctic by Snowflake-Labs

AI research project for efficient LLM training and inference

created 1 year ago
546 stars

Top 59.3% on sourcepulse

View on GitHub
Project Summary

This repository provides artifacts for efficient training and inference of large language models (LLMs), with optimizations targeted at Meta's Llama 3.1 405B. It addresses enterprise AI use cases such as SQL and code copilots, offering improved inference latency and throughput and enabling parameter-efficient fine-tuning on a single node.

How It Works

The project uses a Dense-MoE (Mixture-of-Experts) hybrid transformer architecture: a 10B dense transformer combined with a residual MoE MLP, giving roughly 480B total parameters of which about 17B are active per token. This design aims for efficient intelligence, excelling at enterprise tasks such as SQL generation and coding while using significantly less training compute than comparable models.
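The sketch below illustrates the Dense-MoE hybrid idea: a standard dense transformer block with a sparse, top-k routed MoE MLP added as a parallel residual path, so only a small fraction of the total parameters runs per token. Class names, expert counts, and dimensions are illustrative assumptions, not the repository's actual implementation.

```python
# Conceptual sketch of a Dense-MoE hybrid layer (not the repo's code).
# Expert count, dimensions, and top-k routing here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities per expert
        weights, idx = gate.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only the routed experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

class DenseMoEHybridBlock(nn.Module):
    """Dense attention + MLP path, with a sparse MoE MLP added as a parallel residual."""
    def __init__(self, d_model=1024, n_heads=16, d_ff=4096):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dense_mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.moe_mlp = MoEMLP(d_model, d_ff)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):                          # x: (batch, seq, d_model)
        h = x + self.attn(self.norm1(x), self.norm1(x), self.norm1(x), need_weights=False)[0]
        z = self.norm2(h)
        # Dense MLP and sparse MoE MLP both contribute residually; the MoE path
        # adds capacity while keeping the number of active parameters small.
        b, s, d = z.shape
        return h + self.dense_mlp(z) + self.moe_mlp(z.reshape(-1, d)).reshape(b, s, d)
```

Per token, only the routed experts execute, which is why total parameters (480B) and active parameters (17B) can differ so widely.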

Quick Start & Requirements

  • Inference: Tutorials cover a basic Hugging Face setup and vLLM deployment. Model weights are hosted on Hugging Face (e.g., Snowflake/snowflake-arctic-instruct); a minimal loading sketch follows this list.
  • Fine-tuning: Supports single- and multi-node training using parameter-efficient techniques, FP8 quantization, and ZeRO-3-inspired sharding; a generic fine-tuning sketch also appears below.
  • Prerequisites: The README does not list explicit hardware requirements (GPU count, CUDA version, memory), but GPU hardware is implied for running and fine-tuning models of this size.
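For the basic Hugging Face path, a minimal loading sketch might look like the following, assuming standard transformers APIs, sufficient GPU memory, and that the checkpoint ships custom modeling code (hence trust_remote_code=True); the repo's own tutorials may differ in details such as quantization or DeepSpeed configuration.

```python
# Minimal Hugging Face inference sketch (assumptions: enough GPU memory is
# available and the checkpoint ships custom modeling code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Snowflake/snowflake-arctic-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a SQL query that counts orders per customer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```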
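For parameter-efficient fine-tuning, the sketch below uses Hugging Face peft's LoRA purely as a stand-in illustration; it is not the repo's DeepSpeed-based recipe (which adds FP8 quantization and ZeRO-3-inspired sharding), and the target module names and hyperparameters are assumptions to verify against the actual model.

```python
# Generic LoRA fine-tuning sketch with Hugging Face peft (illustrative only;
# the repo's recipes rely on DeepSpeed-based sharding and FP8 instead).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "Snowflake/snowflake-arctic-instruct"  # assumes the weights fit the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # hypothetical module names; inspect the model to confirm
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small LoRA adapter weights are trainable
```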

Highlighted Details

  • Optimized inference for Llama 3.1 405B with up to 3x lower latency and 1.4x higher throughput.
  • Supports a 128K context window for inference.
  • Achieves comparable or better enterprise intelligence metrics (SQL, code, instruction following) than Llama 3 8B and Llama 2 70B using less training compute.
  • Dense-MoE hybrid architecture with 480B total parameters (17B active).

Maintenance & Community

  • Developed by the Snowflake AI Research team.
  • Collaborations mentioned with DeepSpeed, Hugging Face, and vLLM.
  • Ongoing cookbook releases are planned to share deeper insights into training and data.

Licensing & Compatibility

  • Released under the Apache 2.0 license, allowing ungated access to weights and code.
  • Compatible with commercial use and closed-source linking due to the permissive license.

Limitations & Caveats

The README focuses on Llama 3.1 405B optimization; support for other models or architectures is not detailed. Specific hardware requirements for optimal performance are not explicitly listed.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Thomas Wolf (Cofounder of Hugging Face), and 1 more.

JetMoE by myshell-ai

0.6%
989
Open-sourced LLM reaching LLaMA2 performance with limited resources
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0%
19k
LoRA fine-tuning for LLaMA
created 2 years ago
updated 1 year ago