StableLM by Stability-AI

Language models by Stability AI

Created 2 years ago
15,809 stars

Top 3.0% on SourcePulse

Project Summary

This repository hosts Stability AI's StableLM series of language models, offered in a range of sizes with fine-tuned variants for various applications. It targets researchers and developers looking for open-source LLMs, with models such as StableLM-3B-4E1T and StableLM-Alpha v2 aiming for performance competitive with established models.

How It Works

StableLM models are decoder-only transformers, largely based on the LLaMA architecture. Key modifications include Rotary Position Embeddings applied to the first 25% of head embedding dimensions for improved throughput and LayerNorm with learned bias terms instead of RMSNorm. The models are trained on large, filtered datasets including Falcon RefinedWeb, RedPajama-Data, The Pile, and StarCoder, with specific versions trained on up to 4 trillion tokens across multiple epochs to study the impact of repeated data.
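The partial-rotary scheme described above can be sketched in isolation. A minimal NumPy sketch, assuming the common interleaved-pair RoPE formulation and the conventional base frequency of 10000; the repo's actual implementation may differ in layout:

```python
import numpy as np

def partial_rotary(x, rot_frac=0.25, base=10000.0):
    """Apply rotary position embeddings to the first `rot_frac` of each
    head's embedding dimensions; pass the remaining dimensions through.

    x: array of shape (seq_len, head_dim) for a single attention head.
    """
    seq_len, head_dim = x.shape
    rot_dim = int(head_dim * rot_frac)          # e.g. 16 of 64 dims
    x_rot, x_pass = x[:, :rot_dim], x[:, rot_dim:]

    # Standard RoPE frequencies, computed over the rotated slice only.
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2) / rot_dim))
    pos = np.arange(seq_len)[:, None]           # (seq_len, 1)
    angles = pos * inv_freq[None, :]            # (seq_len, rot_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)

    # Rotate interleaved (even, odd) dimension pairs by position-dependent angles.
    x1, x2 = x_rot[:, 0::2], x_rot[:, 1::2]
    rotated = np.empty_like(x_rot)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos

    return np.concatenate([rotated, x_pass], axis=-1)
```

Rotating only a quarter of the dimensions keeps positional information while reducing the per-token trigonometric work, which is the throughput motivation cited above.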

Quick Start & Requirements

  • Install/Run: Use Hugging Face transformers library.
  • Prerequisites: Python, PyTorch, transformers. GPU recommended for inference.
  • Demo: A Hugging Face Space is available for the 7B model.
  • Docs: Technical reports and configuration files are linked for detailed information.
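The steps above can be sketched with the Hugging Face transformers API. The checkpoint name and the `<|SYSTEM|>`/`<|USER|>`/`<|ASSISTANT|>` prompt format follow the tuned-alpha conventions but should be verified against the repo before use; sampling parameters here are illustrative:

```python
def chat(user_prompt, model_name="stabilityai/stablelm-tuned-alpha-7b"):
    """Generate one reply from a StableLM tuned checkpoint.

    Downloads the checkpoint from the Hugging Face Hub on first call;
    a GPU is recommended for the 7B model. Imports are kept inside the
    function so the sketch is cheap to define (requires
    `pip install torch transformers`).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
    model = model.to("cuda" if torch.cuda.is_available() else "cpu")

    # Tuned-alpha checkpoints expect the special role tokens below.
    prompt = f"<|SYSTEM|>You are a helpful assistant.<|USER|>{user_prompt}<|ASSISTANT|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    tokens = model.generate(**inputs, max_new_tokens=128,
                            temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, skipping the role markers.
    return tokenizer.decode(tokens[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Usage: `print(chat("Write a haiku about open models."))`.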

Highlighted Details

  • StableLM-3B-4E1T achieves state-of-the-art performance at the 3B parameter scale and is competitive with many 7B models.
  • StableLM-Alpha v2 models incorporate architectural improvements like SwiGLU and use higher-quality data sources, extending context length to 4096 tokens.
  • StableVicuna-13B is an RLHF fine-tune of Vicuna-13B, aiming to be an open-source RLHF LLM Chatbot.
  • Examples demonstrate capabilities in chit-chat, formal writing, creative writing (rap battles, stories), and humor.

Maintenance & Community

  • The project is actively updated with new checkpoints.
  • Community involvement is encouraged via Discord for contributions and ideas.

Licensing & Compatibility

  • Base models (StableLM-Base-Alpha) are released under CC BY-SA-4.0.
  • Fine-tuned models (StableLM-Tuned-Alpha, StableVicuna) are released under CC BY-NC-SA-4.0, which restricts commercial use.
  • All code in the repository is under the Apache License 2.0.

Limitations & Caveats

  • Fine-tuned models are explicitly licensed for non-commercial use.
  • As with any pre-trained LLM, responses may vary in quality and could include offensive content, though this is expected to improve with scale and feedback.
  • StableVicuna-13B is released as delta weights that must be combined with the original LLaMA weights, due to LLaMA's license.
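The delta-weight recombination mentioned above amounts to an elementwise sum over matching parameter tensors. A minimal sketch of the arithmetic only; for real checkpoints, use the conversion tooling referenced by the project, which also handles tokenizer and config files:

```python
import numpy as np

def apply_delta(base_weights, delta_weights):
    """Recover released weights as base + delta, tensor by tensor.

    base_weights:  original LLaMA parameter tensors, keyed by name
    delta_weights: published delta tensors with matching keys
    """
    if base_weights.keys() != delta_weights.keys():
        raise ValueError("base and delta checkpoints must share parameter names")
    return {name: base_weights[name] + delta_weights[name] for name in base_weights}

# Toy example with a single 2x2 "parameter tensor".
base = {"w": np.array([[1.0, 2.0], [3.0, 4.0]])}
delta = {"w": np.array([[0.5, -0.5], [0.0, 1.0]])}
merged = apply_delta(base, delta)
```

Publishing only the deltas lets the fine-tune be shared without redistributing the LLaMA weights themselves.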
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Binyuan Hui (Research Scientist at Alibaba Qwen), and 3 more.

xgen by salesforce

  • Top 0.1% on SourcePulse
  • 723 stars
  • LLM research release with 8k sequence length
  • Created 2 years ago
  • Updated 7 months ago