Seed-Thinking-v1.5 is a Mixture-of-Experts (MoE) large language model designed to strengthen reasoning across STEM and coding domains. It improves answer quality through a "think before responding" mechanism, reasoning through a problem internally before generating the final response, which makes it suitable for researchers and developers seeking advanced reasoning models.
How It Works
Seed-Thinking-v1.5 employs a Mixture-of-Experts (MoE) architecture with 20 billion activated parameters out of 200 billion total. Because only a subset of experts runs for each token, the model can scale total capacity while keeping per-token compute manageable, and individual experts can specialize, which contributes to its strong performance on complex reasoning tasks. The model's core innovation is its reinforcement learning-driven reasoning process, which lets it "think" through a problem before generating a response.
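To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer in PyTorch. The expert count, hidden size, and top-k value are illustrative assumptions; the README does not disclose Seed-Thinking-v1.5's actual routing configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative only).

    All hyperparameters are assumptions for demonstration; they are not
    Seed-Thinking-v1.5's published configuration.
    """
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        gate_probs = F.softmax(self.router(x), dim=-1)
        weights, idx = torch.topk(gate_probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        # Only the selected experts run per token, which is why activated
        # parameters (20B) are a fraction of total parameters (200B).
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out
```

In a full transformer, a layer like this replaces the dense feed-forward block; the router decides per token which experts contribute, trading total parameter count against per-token compute.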
Quick Start & Requirements
- Installation: Not explicitly detailed in the README (a hypothetical loading sketch follows this list).
- Prerequisites: Requires significant computational resources due to its large parameter count (200B total, 20B activated). Specific hardware (e.g., GPUs with substantial VRAM) and software dependencies (e.g., PyTorch, CUDA) are implied but not listed.
- Resources: Setup and inference will likely demand high-end GPU hardware and considerable memory.
- Documentation: A technical report is referenced for full details.
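Absent official instructions, a typical loading pattern for a large checkpoint via Hugging Face transformers might look like the sketch below. The model identifier is a placeholder assumption, not a published id, and the dtype/sharding choices reflect common practice for models of this size rather than documented requirements.

```python
# Hypothetical loading sketch -- the README provides no install steps,
# and the model id below is an assumption; verify before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/Seed-Thinking-v1.5"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # large checkpoints are typically run in bf16
    device_map="auto",           # shard across available GPUs
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```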
Highlighted Details
- Achieves 86.7% on AIME 2024 and 55.0% pass@8 on Codeforces (a sketch of the pass@k metric follows this list).
- Outperforms DeepSeek R1 by 8% in win rate on non-reasoning tasks.
- Demonstrates strong performance on benchmarks like GPQA (77.3%) and MMLU-PRO (87.0%).
- Introduces two internal benchmarks for assessing generalized reasoning: BeyondAIME and a Codeforces evaluation set.
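Since the Codeforces number is reported as pass@8, a short sketch of the standard unbiased pass@k estimator (Chen et al., 2021) may be useful; the README does not state which estimator was used, so applying this formula here is an assumption.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples drawn per problem
    c: number of samples that passed
    k: the k in pass@k
    Returns the probability that at least one of k random samples passes.
    """
    if n - c < k:
        return 1.0  # too few failing samples to fill k draws without a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per problem, 5 correct, evaluated at pass@8.
print(round(pass_at_k(16, 5, 8), 4))  # -> 0.9872
```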
Maintenance & Community
- Developed by ByteDance.
- No specific community channels (Discord, Slack) or roadmap links are provided in the README.
Licensing & Compatibility
- The license is not specified in the provided README text.
Limitations & Caveats
- The README indicates that internal sandbox results may differ from reported benchmarks due to testing environment inconsistencies.
- Specific setup instructions, dependencies, and licensing information are not readily available, potentially hindering adoption.