Seed-OSS by ByteDance-Seed

Large language models for long context and reasoning

Created 1 month ago
797 stars

Top 44.2% on SourcePulse

Project Summary

Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, offering powerful long-context, reasoning, agent, and general capabilities. Optimized for international use cases, it provides flexible control over the "thinking budget" for dynamic reasoning length adjustment, enhanced reasoning and agentic intelligence, and native support for up to 512K context length. The models are released under the Apache-2.0 license, making them suitable for both research and development.

How It Works

Seed-OSS models utilize a causal language model architecture incorporating RoPE, GQA attention, RMSNorm, and SwiGLU activation. A key feature is the "thinking budget," allowing users to control the model's reasoning depth. The model can dynamically adjust its chain-of-thought (CoT) process, reflecting on token usage and remaining budget, which can improve efficiency and performance on complex tasks.
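The budget-tracking behavior described above can be sketched abstractly: the model periodically reflects on how many thinking tokens it has consumed versus how many remain, and wraps up its reasoning once the budget is exhausted. Below is a minimal toy sketch of such a control loop; all names and the reflection format are hypothetical illustrations, not the actual Seed-OSS implementation.

```python
def run_with_thinking_budget(reasoning_steps, thinking_budget):
    """Toy sketch of a thinking-budget control loop (hypothetical, not Seed-OSS code).

    reasoning_steps: list of (step_text, token_cost) pairs the model would emit.
    thinking_budget: max tokens allowed for chain-of-thought; the README
    recommends integer multiples of 512.
    """
    used = 0
    trace = []
    for text, cost in reasoning_steps:
        if used + cost > thinking_budget:
            # Budget exhausted: stop reasoning and move to the final answer.
            trace.append(f"<reflect>budget exhausted ({used}/{thinking_budget}); answering now</reflect>")
            break
        used += cost
        trace.append(text)
        # Periodic self-reflection on remaining budget, as described above.
        trace.append(f"<reflect>used {used} of {thinking_budget} thinking tokens</reflect>")
    return trace, used

steps = [("analyze the question", 200), ("explore approach A", 300), ("verify result", 150)]
trace, used = run_with_thinking_budget(steps, thinking_budget=512)
# With a 512-token budget, only the first two steps fit (200 + 300 = 500 tokens).
```

Setting the budget to a sentinel value (e.g. unlimited) would recover ordinary unconstrained chain-of-thought; smaller budgets trade reasoning depth for latency.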

Quick Start & Requirements

  • Installation: Install via pip install -r requirements.txt and pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss.
  • Prerequisites: Python, transformers library. GPU with sufficient VRAM is recommended for optimal performance.
  • Inference: Use the provided generate.py script or vLLM for faster inference. Quantization options (4-bit, 8-bit) are available to reduce memory usage.
  • Docs: Links to MODEL_CARD and inference scripts are available in the repository.
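The installation and inference steps above can be strung together roughly as follows; the repository URL, script flags, and model name are assumptions based on the commands quoted from the README, so check the repo's docs before running.

```shell
# Clone the repository (URL assumed from the project name)
git clone https://github.com/ByteDance-Seed/seed-oss.git
cd seed-oss

# Install dependencies and the Seed-OSS-enabled transformers fork (per the README)
pip install -r requirements.txt
pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss

# Run the provided inference script; flag names are illustrative
python3 generate.py --model_path ByteDance-Seed/Seed-OSS-36B-Instruct
```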

Highlighted Details

  • Achieves strong benchmark performance despite being trained on only 12T tokens.
  • Offers both base models (with and without synthetic data) and an instruct-tuned version.
  • Supports native long context up to 512K tokens.
  • Features a novel "thinking budget" mechanism for controllable reasoning.

Maintenance & Community

The project is developed by the ByteDance Seed Team, founded in 2023 with a focus on advanced AI foundation models. Further community engagement details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatibility: The Apache-2.0 license generally permits commercial use and linking with closed-source projects.

Limitations & Caveats

The README advises using thinking budget values that are integer multiples of 512 for optimal performance. Benchmark results are a mix of reproduced and reported numbers, and evaluation configurations are detailed only for certain tasks.

Health Check
Last Commit

3 days ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
17
Star History
805 stars in the last 30 days
