AI research project for efficient LLM training and inference
This repository provides artifacts for efficient training and inference of large language models (LLMs), with specific optimizations for Meta's Llama 3.1 405B. It targets enterprise AI use cases such as SQL and code copilots, improving inference latency and throughput and enabling parameter-efficient fine-tuning on a single node.
How It Works
The project leverages a Dense-Mixture-of-Experts (MoE) hybrid transformer architecture, combining a 10B-parameter dense transformer with an MoE MLP totaling 480B parameters, of which only about 17B are active for any given token. This design aims for efficient intelligence: strong performance on enterprise tasks such as SQL generation and coding while using significantly less training compute than comparably capable models.
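As a rough illustration of the hybrid layout (a minimal sketch; the layer sizes, gating scheme, and the way dense and expert outputs are combined are assumptions for exposition, not the repository's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMoEHybridMLP(nn.Module):
    """Toy dense + MoE hybrid: a small dense FFN runs alongside a
    top-2-gated mixture of experts, and their outputs are summed.
    Dimensions are tiny placeholders, not the real 10B/480B scales."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.dense = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        dense_out = self.dense(x)
        # Route each token to its top-k experts; only those experts run,
        # which is why active parameters stay far below total parameters.
        # (Production routers often renormalize the top-k gate weights.)
        gates = F.softmax(self.router(x), dim=-1)      # (tokens, n_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)  # (tokens, top_k)
        moe_out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    moe_out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return dense_out + moe_out
```

Per token, only the routed experts execute, so the active parameter count (~17B here) stays far below the 480B total; that gap is the source of the claimed compute savings.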
Quick Start & Requirements
Model weights for the instruct-tuned variant are available on Hugging Face (Snowflake/snowflake-arctic-instruct).
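A minimal loading sketch, assuming the standard Hugging Face transformers interface; the dtype, device, and trust_remote_code settings below are assumptions, so consult the repository README for exact requirements (a model of this size needs substantial multi-GPU memory):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Snowflake/snowflake-arctic-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # checkpoint ships custom modeling code
    torch_dtype=torch.bfloat16,  # assumption: bf16 to reduce memory footprint
    device_map="auto",           # shard across available GPUs
)

inputs = tokenizer(
    "Write a SQL query that counts rows per day.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```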
Highlighted Details
Maintenance & Community
The last update was 11 months ago; the project is currently inactive.
Licensing & Compatibility
Limitations & Caveats
The README focuses on Llama 3.1 405B optimization; support for other models or architectures is not detailed. Specific hardware requirements for optimal performance are not explicitly listed.