Seed-Coder by ByteDance-Seed

Code LLM for code generation, completion, and reasoning tasks

created 3 months ago
537 stars

Top 59.9% on sourcepulse

Project Summary

Seed-Coder is a family of 8B-parameter code LLMs (Base, Instruct, Reasoning) from ByteDance Seed, designed to enhance coding capabilities by using LLMs to curate their own training data, minimizing human effort. It targets developers and researchers seeking powerful, lightweight, open-source code intelligence solutions.

How It Works

Seed-Coder employs a "model-centric" data pipeline, leveraging LLMs for data filtering and curation from sources like GitHub, commits, and web data. This approach aims to reduce manual effort in pretraining data construction while achieving state-of-the-art performance for its size.

Quick Start & Requirements

  • Install/Run: Deployable via Hugging Face transformers or vLLM.
  • Prerequisites: torch (bfloat16 recommended) and transformers; vLLM for advanced deployment.
  • Resources: 8B-parameter models; vLLM supports multi-GPU tensor parallelism for long contexts (up to 32K tokens for the Base and Instruct models).
  • Links: Homepage, Hugging Face
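The quick start above can be sketched with Hugging Face transformers. A minimal sketch, assuming the repo is named `ByteDance-Seed/Seed-Coder-8B-Instruct` and that the model ships a chat template; verify both on the model card before use:

```python
import os

# Assumed Hugging Face repo name; confirm on the model card before running.
MODEL_ID = "ByteDance-Seed/Seed-Coder-8B-Instruct"


def build_messages(user_prompt: str) -> list:
    """Wrap a prompt in the chat-message format that apply_chat_template expects."""
    return [{"role": "user", "content": user_prompt}]


def main() -> None:
    # Heavy deps imported lazily so the sketch can be read without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # bfloat16, as recommended above
        device_map="auto",           # place layers across available GPUs
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages("Write a quicksort function in Python."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__" and os.environ.get("RUN_SEED_CODER_DEMO"):
    main()  # gated behind an env var: downloads ~16 GB of weights and needs a GPU
```

The generation call is gated so the sketch can be imported and inspected without pulling the weights; set `RUN_SEED_CODER_DEMO=1` to actually run it.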

Highlighted Details

  • State-of-the-art performance among open-source models at the 8B scale across various coding tasks.
  • Models include Base, Instruct (for user intent alignment), and Reasoning (RL-trained).
  • Supports long context windows up to 64K tokens for the Reasoning model.
  • Fully compatible with vLLM for efficient inference and distributed serving.
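For the vLLM path, a serving sketch: the repo name, GPU count, and context length below are assumptions (the 32K value matches the Base/Instruct limit noted above), not values taken from the project docs.

```shell
# Assumed model repo name; verify on Hugging Face before use.
# --tensor-parallel-size shards the model across GPUs;
# --max-model-len sets the context window served to clients.
vllm serve ByteDance-Seed/Seed-Coder-8B-Instruct \
    --tensor-parallel-size 2 \
    --max-model-len 32768
```

This exposes an OpenAI-compatible API on port 8000 by default, so standard chat-completions clients can query the model without extra glue code.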

Maintenance & Community

  • Developed by the ByteDance Seed Team, founded in 2023.
  • Models are publicly available on Hugging Face.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • Evaluation results for BigCodeBench were updated due to an inconsistent setting; users should refer to the latest reported benchmarks.
Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
Star History

  • 540 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

  • Top 0.4% · 6k stars
  • Open-source code language model comparable to GPT4-Turbo
  • Created 1 year ago; updated 10 months ago