StableLM by Stability-AI

Language models by Stability AI

created 2 years ago
15,825 stars

Top 3.1% on sourcepulse

View on GitHub
Project Summary

This repository provides Stability AI's StableLM series of language models, offering a range of sizes and fine-tuned variants for various applications. It targets researchers and developers looking for open-source LLMs, with models such as StableLM-3B-4E1T and StableLM-Alpha v2 aiming for performance competitive with established models.

How It Works

StableLM models are decoder-only transformers, largely based on the LLaMA architecture. Key modifications include Rotary Position Embeddings applied to the first 25% of head embedding dimensions for improved throughput and LayerNorm with learned bias terms instead of RMSNorm. The models are trained on large, filtered datasets including Falcon RefinedWeb, RedPajama-Data, The Pile, and StarCoder, with specific versions trained on up to 4 trillion tokens across multiple epochs to study the impact of repeated data.
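The partial-rotary scheme described above can be sketched in NumPy. The `rotary_pct=0.25` default mirrors the 25% figure from the summary; the rotate-half pairing follows the GPT-NeoX convention and is illustrative, not the repo's exact code.

```python
import numpy as np

def partial_rotary(x, positions, rotary_pct=0.25, base=10000.0):
    """Apply rotary position embeddings to the first `rotary_pct` of head dims.

    x: (seq_len, head_dim) slice of query or key vectors for one head.
    positions: (seq_len,) integer token positions.
    """
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * rotary_pct)           # e.g. 16 of 64 dims rotated
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    half = rot_dim // 2
    inv_freq = 1.0 / (base ** (np.arange(half) * 2.0 / rot_dim))
    freqs = np.outer(positions, inv_freq)          # (seq_len, half)
    cos, sin = np.cos(freqs), np.sin(freqs)

    # Rotate-half pairing: dim i is paired with dim i + rot_dim // 2.
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    # Remaining dims pass through unrotated, which is what keeps throughput high.
    return np.concatenate([rotated, x_pass], axis=-1)
```

At position 0 the rotation is the identity, and the untouched 75% of dimensions are returned exactly as given.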

Quick Start & Requirements

  • Install/Run: Use the Hugging Face transformers library.
  • Prerequisites: Python, PyTorch, and transformers; a GPU is recommended for inference.
  • Demo: A Hugging Face Spaces demo is available for the 7B model.
  • Docs: Technical reports and configuration files are linked for detailed information.
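As a concrete starting point, the tuned Alpha models delimit conversation turns with special tokens. A minimal sketch of building such a prompt, assuming the `<|SYSTEM|>`, `<|USER|>`, and `<|ASSISTANT|>` markers used in the repo's chat examples (actual generation would then go through the transformers library, e.g. `AutoModelForCausalLM.from_pretrained` and `generate`):

```python
def build_prompt(user_message: str, system_prompt: str = "") -> str:
    # StableLM-Tuned-Alpha delimits turns with special tokens; the model
    # generates its reply after the trailing <|ASSISTANT|> marker.
    return f"<|SYSTEM|>{system_prompt}<|USER|>{user_message}<|ASSISTANT|>"

prompt = build_prompt("What is your favorite food?",
                      system_prompt="You are a helpful assistant.")
```

The string produced here would be tokenized and passed to the model as-is; base (non-tuned) checkpoints are plain completion models and do not use these markers.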

Highlighted Details

  • StableLM-3B-4E1T achieves state-of-the-art performance at the 3B parameter scale and is competitive with many 7B models.
  • StableLM-Alpha v2 models incorporate architectural improvements like SwiGLU and use higher-quality data sources, extending context length to 4096 tokens.
  • StableVicuna-13B is an RLHF fine-tune of Vicuna-13B, aiming to be an open-source RLHF-trained LLM chatbot.
  • Examples demonstrate capabilities in chit-chat, formal writing, creative writing (rap battles, stories), and humor.
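SwiGLU, mentioned in the Alpha v2 bullet above, replaces the feed-forward activation by gating one linear projection with a Swish-activated second projection. A minimal NumPy sketch; weight shapes are illustrative and bias terms are omitted:

```python
import numpy as np

def swiglu(x, W, V):
    """SwiGLU feed-forward gate: Swish(x @ W) * (x @ V).

    x: (batch, d_model); W, V: (d_model, d_ff) gate and value projections.
    """
    gate = x @ W
    swish = gate / (1.0 + np.exp(-gate))   # Swish(z) = z * sigmoid(z)
    return swish * (x @ V)
```

A following down-projection (not shown) would map the `d_ff` output back to `d_model`; the gating is what distinguishes SwiGLU from a plain GELU/ReLU MLP.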

Maintenance & Community

  • The project is actively updated with new checkpoints.
  • Community involvement is encouraged via Discord for contributions and ideas.

Licensing & Compatibility

  • Base models (StableLM-Base-Alpha) are released under CC BY-SA-4.0.
  • Fine-tuned models (StableLM-Tuned-Alpha, StableVicuna) are released under CC BY-NC-SA-4.0, which prohibits commercial use.
  • All code is released under the Apache License 2.0.

Limitations & Caveats

  • Fine-tuned models are explicitly licensed for non-commercial use.
  • As with any pre-trained LLM, responses may vary in quality and could include offensive content, though this is expected to improve with scale and feedback.
  • StableVicuna-13B delta weights require combining with the original LLaMA model due to its license.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 45 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

  0.3% · 9k stars
  Tiny pretraining project for a 1.1B Llama model
  created 1 year ago · updated 1 year ago
Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Woosuk Kwon (author of vLLM), and 11 more.

WizardLM by nlpxucan

  0.1% · 9k stars
  LLMs built using Evol-Instruct for complex instruction following
  created 2 years ago · updated 1 month ago
Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.

alpaca-lora by tloen

  0.0% · 19k stars
  LoRA fine-tuning for LLaMA
  created 2 years ago · updated 1 year ago