Discover and explore top open-source AI tools and projects—updated daily.
Ying Sheng
Coauthor of SGLang
Starred Projects (131)
mini-sglang
by
sgl-project
1.2%
4k
Lightweight LLM inference framework with advanced optimizations
Starred by
Created 5 months ago
Updated 17 hours ago
TileRT
by
tile-ai
2.6%
655
Ultra-low-latency LLM inference runtime
Created 3 months ago
Updated 1 week ago
miles
by
radixark
2.6%
907
Enterprise RL for large-scale MoE models
Starred by
+4
Created 4 months ago
Updated 18 hours ago
SpecForge
by
sgl-project
0.7%
701
Train speculative decoding models for faster inference
Starred by
Created 8 months ago
Updated 22 hours ago
genai-bench
by
sgl-project
0.7%
271
LLM serving performance benchmarking
Starred by
Created 8 months ago
Updated 5 days ago
ome
by
sgl-project
0.8%
379
Kubernetes operator for LLM serving
Starred by
Created 9 months ago
Updated 18 hours ago
ChatLearn
by
alibaba
0%
450
Training framework for large-scale alignment tasks
Created 2 years ago
Updated 4 months ago
OpenRLHF
by
OpenRLHF
0.3%
9k
RLHF framework for scalable training of large language models
Starred by
+9
Created 2 years ago
Updated 4 days ago
verl
by
verl-project
0.5%
19k
RL training library for LLMs
Starred by
+14
Created 1 year ago
Updated 23 hours ago
how-to-optim-algorithm-in-cuda
by
BBuf
0.1%
3k
CUDA optimization guide for common algorithms
Starred by
Created 7 years ago
Updated 1 week ago
sgl-learning-materials
by
sgl-project
0.9%
753
Learning materials for SGLang, an efficient LLM serving engine
Starred by
Created 1 year ago
Updated 1 month ago
glake
by
antgroup
0.2%
498
GPU optimization library for memory management and IO
Created 2 years ago
Updated 11 months ago
xgrammar
by
mlc-ai
0.5%
2k
Library for efficient structured generation
Starred by
+4
Created 1 year ago
Updated 1 week ago
NeMo
by
NVIDIA-NeMo
0.2%
17k
Scalable generative AI framework for LLMs, multimodal, and speech AI research
Starred by
+15
Created 6 years ago
Updated 23 hours ago
Nanoflow
by
efeslab
0.2%
946
LLM serving framework for high throughput
Starred by
Created 1 year ago
Updated 3 months ago
InfiniteBench
by
OpenBMB
0.3%
377
Benchmark for evaluating language models on super-long contexts (100k+ tokens)
Starred by
Created 2 years ago
Updated 1 year ago
simple-evals
by
openai
0.1%
4k
Lightweight library for evaluating language models
Starred by
+14
Created 1 year ago
Updated 6 months ago
ScaleLLM
by
vectorch-ai
0%
492
LLM inference system for production environments
Created 2 years ago
Updated 2 months ago
mergekit
by
arcee-ai
0.3%
7k
CLI tool for merging pretrained language models, combining strengths without retraining
Starred by
+15
Created 2 years ago
Updated 1 month ago
RouteLLM
by
lm-sys
0.6%
5k
Framework for LLM routing and cost reduction (research paper)
Starred by
+7
Created 1 year ago
Updated 1 year ago
GPTQModel
by
ModelCloud
0.8%
1k
LLM compression toolkit for accelerated CPU/GPU inference
Starred by
Created 1 year ago
Updated 4 days ago
Mooncake
by
kvcache-ai
0.5%
5k
Research paper on a disaggregated architecture for LLM serving
Starred by
+2
Created 1 year ago
Updated 17 hours ago
SWE-bench
by
SWE-bench
0.9%
4k
Benchmark for evaluating LLMs on real-world GitHub issues
Starred by
+12
Created 2 years ago
Updated 6 days ago
inspect_ai
by
UKGovernmentBEIS
1.5%
2k
Framework for large language model evaluations
Starred by
+7
Created 2 years ago
Updated 1 day ago
Quest
by
mit-han-lab
0.3%
373
Inference framework for efficient long-context LLM inference
Created 1 year ago
Updated 7 months ago
dspy
by
stanfordnlp
0.4%
32k
Framework for programming language models, not prompting
Starred by
+49
Created 3 years ago
Updated 1 day ago
DoRA
by
NVlabs
0.3%
936
PyTorch code for weight-decomposed low-rank adaptation (DoRA)
Starred by
Created 1 year ago
Updated 1 year ago
RULER
by
NVIDIA
0.3%
1k
Evaluation suite for long-context language models (research paper)
Starred by
Created 1 year ago
Updated 3 months ago
mirage
by
mirage-project
0.4%
2k
Tool for fast GPU kernel generation via superoptimization
Starred by
+1
Created 1 year ago
Updated 2 days ago
beir
by
beir-cellar
0.5%
2k
IR benchmark for evaluating NLP retrieval models
Starred by
+1
Created 5 years ago
Updated 4 months ago
storm
by
stanford-oval
0.1%
28k
LLM system for automated knowledge curation and article generation
Starred by
+5
Created 1 year ago
Updated 4 months ago
llm.c
by
karpathy
0.1%
29k
LLM training in pure C/CUDA, no PyTorch needed
Starred by
+26
Created 1 year ago
Updated 8 months ago
LeetCUDA
by
xlite-dev
0.4%
10k
CUDA learning notes for beginners using PyTorch
Created 3 years ago
Updated 1 week ago
candle
by
huggingface
0.3%
19k
Minimalist ML framework for Rust, emphasizing performance and ease of use
Starred by
+23
Created 2 years ago
Updated 6 days ago
Open-Sora-Plan
by
PKU-YuanGroup
0.0%
12k
Open-source project aiming to reproduce Sora-like T2V model
Starred by
+2
Created 2 years ago
Updated 3 months ago
MiniGPT4-video
by
Vision-CAIR
0%
642
Video-language model for short and long video understanding
Created 1 year ago
Updated 1 year ago
adapters
by
adapter-hub
0%
3k
Unified library for parameter-efficient transfer learning in NLP
Starred by
+7
Created 5 years ago
Updated 4 months ago
lmdeploy
by
InternLM
0.2%
8k
Toolkit for LLM compression, deployment, and serving
Starred by
+8
Created 2 years ago
Updated 1 day ago
cutlass
by
NVIDIA
0.3%
9k
CUDA C++ and Python DSLs for high-performance linear algebra
Starred by
+21
Created 8 years ago
Updated 20 hours ago
guidance
by
guidance-ai
0.1%
21k
Guidance is a programming paradigm for steering LLMs
Starred by
+38
Created 3 years ago
Updated 1 week ago
trl
by
huggingface
0.3%
17k
Library for transformer RL
Starred by
+28
Created 6 years ago
Updated 17 hours ago
sglang
by
sgl-project
1.0%
24k
Fast serving framework for LLMs and vision language models
Starred by
+34
Created 2 years ago
Updated 17 hours ago
MultiPL-E
by
nuprl
0%
298
Benchmark for evaluating code generation LLMs across multiple programming languages
Created 3 years ago
Updated 4 weeks ago
rags
by
run-llama
0.1%
7k
Streamlit app for building RAG pipelines via natural language
Starred by
Created 2 years ago
Updated 1 year ago
llm-reasoners
by
maitrix-org
0.1%
2k
Library for advanced LLM reasoning with search algorithms
Starred by
+1
Created 2 years ago
Updated 8 months ago
ToolBench
by
OpenBMB
0.1%
6k
Open platform for LLM tool learning (ICLR'24 spotlight)
Starred by
+6
Created 2 years ago
Updated 9 months ago
autogen
by
microsoft
0.4%
55k
Agentic framework for multi-agent AI applications
Starred by
+19
Created 2 years ago
Updated 1 month ago
webarena
by
web-arena-x
0.5%
1k
Web environment for autonomous agent development
Starred by
Created 2 years ago
Updated 3 months ago
WebGLM
by
THUDM
0%
2k
Web-enhanced question answering system using a 10B GLM
Created 2 years ago
Updated 11 months ago
DeepSpeed-MII
by
deepspeedai
0%
2k
Python library for high-throughput, low-latency, and cost-effective model inference
Starred by
+5
Created 3 years ago
Updated 8 months ago
megablocks
by
databricks
0%
2k
Lightweight library for mixture-of-experts (MoE) training
Starred by
+15
Created 3 years ago
Updated 8 months ago
EAGLE
by
SafeAILab
0.4%
2k
Speculative decoding research paper for faster LLM inference
Starred by
+5
Created 2 years ago
Updated 5 days ago
gpt-fast
by
meta-pytorch
0.1%
6k
PyTorch text generation for efficient transformer inference
Starred by
+20
Created 2 years ago
Updated 6 months ago
VLM_survey
by
jingyi0000
0.1%
3k
VLM survey paper with links to models/methods for vision tasks
Created 2 years ago
Updated 4 months ago
dify
by
langgenius
0.4%
130k
Open-source LLM app development platform
Starred by
+17
Created 2 years ago
Updated 17 hours ago
LookaheadDecoding
by
hao-ai-lab
0%
1k
Parallel decoding algorithm for faster LLM inference
Starred by
Created 2 years ago
Updated 11 months ago
ARES
by
stanford-futuredata
0.1%
693
RAG evaluation framework
Starred by
Created 2 years ago
Updated 11 months ago
llm-analysis
by
cli99
0%
477
CLI tool for LLM latency/memory analysis during training/inference
Starred by
Created 2 years ago
Updated 10 months ago
Awesome-LLM-Inference
by
xlite-dev
0.3%
5k
Curated list of LLM/VLM inference research papers with code
Created 2 years ago
Updated 17 hours ago
gpt_paper_assistant
by
tatsu-lab
0%
541
ArXiv scanner using GPT-4 for personalized paper recommendations
Starred by
Created 2 years ago
Updated 1 year ago
MergeLM
by
yule-BUAA
0%
863
Codebase for merging language models via parameter averaging
Starred by
Created 2 years ago
Updated 1 year ago
ChatRTX
by
NVIDIA
0%
3k
Demo app for local RAG chatbot on Windows
Starred by
Created 2 years ago
Updated 1 month ago
DeepSeek-Coder
by
deepseek-ai
0.2%
23k
Code LLM for code completion and generation
Starred by
+4
Created 2 years ago
Updated 3 months ago
llm-decontaminator
by
lm-sys
0%
316
LLM contamination detector for quantifying rephrased samples
Starred by
Created 2 years ago
Updated 2 years ago
S-LoRA
by
S-LoRA
0%
2k
System for scalable LoRA adapter serving
Starred by
+1
Created 2 years ago
Updated 2 years ago
flashinfer
by
flashinfer-ai
0.8%
5k
Kernel library for LLM serving
Starred by
+12
Created 2 years ago
Updated 22 hours ago
skypilot
by
skypilot-org
0.3%
9k
Framework for cloud AI/batch jobs, unifying execution across diverse infrastructure
Starred by
+24
Created 4 years ago
Updated 17 hours ago
gpu_poor
by
RahulSChand
0.2%
1k
CLI tool for LLM memory and throughput estimation
Created 2 years ago
Updated 1 year ago
ggml
by
ggml-org
1.3%
14k
Tensor library for machine learning
Starred by
+16
Created 3 years ago
Updated 17 hours ago
tensorrtllm_backend
by
triton-inference-server
0.4%
924
Triton backend for serving TensorRT-LLM models
Starred by
Created 2 years ago
Updated 5 days ago
TensorRT-LLM
by
NVIDIA
0.3%
13k
LLM inference optimization SDK for NVIDIA GPUs
Starred by
+18
Created 2 years ago
Updated 17 hours ago
leptonai
by
leptonai
0%
3k
Python framework for simplifying AI service building
Starred by
+1
Created 2 years ago
Updated 3 weeks ago
punica
by
punica-ai
0.3%
1k
LoRA serving system (research paper) for multi-tenant LLM inference
Starred by
+3
Created 2 years ago
Updated 1 year ago
modular
by
modular
0.2%
26k
AI toolchain unifying fragmented AI deployment workflows
Starred by
+10
Created 2 years ago
Updated 1 day ago
llm-numbers
by
ray-project
0.1%
4k
LLM developer's reference for key numbers
Starred by
+8
Created 2 years ago
Updated 2 years ago
LLaMA2-Accessory
by
Alpha-VLLM
0.0%
3k
Open-source toolkit for LLM development, pretraining, finetuning, and deployment
Starred by
+2
Created 2 years ago
Updated 1 year ago
rerope
by
bojone
0%
389
Position embeddings research paper
Starred by
Created 2 years ago
Updated 1 year ago
LightLLM
by
ModelTC
0.3%
4k
Python framework for LLM inference and serving
Starred by
+6
Created 2 years ago
Updated 5 days ago
GPTCache
by
zilliztech
0.1%
8k
Semantic cache for LLM queries, integrated with LangChain and LlamaIndex
Starred by
+8
Created 2 years ago
Updated 7 months ago
llama
by
meta-llama
0.1%
59k
Inference code for Llama 2 models (deprecated)
Starred by
+38
Created 3 years ago
Updated 1 year ago
LLM-Training-Puzzles
by
srush
0.1%
1k
Hands-on puzzles for large language model training
Starred by
+8
Created 2 years ago
Updated 2 years ago
ringattention
by
haoliuhl
0.1%
769
JAX implementation of RingAttention for large context models (research paper)
Starred by
Created 2 years ago
Updated 4 months ago
metal-flash-attention
by
philipturner
1.0%
586
Metal port of FlashAttention for Apple silicon
Starred by
+2
Created 2 years ago
Updated 1 year ago
companion-app
by
a16z-infra
0.1%
6k
AI companion stack for personalized chatbots
Starred by
+6
Created 2 years ago
Updated 1 year ago
bitsandbytes
by
bitsandbytes-foundation
0.3%
8k
PyTorch library for k-bit quantization, enabling accessible LLMs
Starred by
+27
Created 4 years ago
Updated 1 day ago
long_llama
by
CStanKonrad
0%
1k
LLM for long context handling, fine-tuned with Focused Transformer
Starred by
Created 2 years ago
Updated 2 years ago
scalene
by
plasma-umass
0.1%
13k
Python profiler with AI-powered optimization proposals
Starred by
+14
Created 6 years ago
Updated 2 days ago
LLMSurvey
by
RUCAIBox
0.0%
12k
Survey paper for large language models
Starred by
+2
Created 2 years ago
Updated 11 months ago
Awesome-LLM-Compression
by
HuangOwen
0.3%
2k
LLM compression papers and tools for efficient training/inference
Created 2 years ago
Updated 2 days ago
fastllm
by
ztxz16
0.1%
4k
High-performance C++ LLM inference library
Starred by
Created 2 years ago
Updated 17 hours ago
H2O
by
FMInference
0%
504
KV cache eviction research paper for efficient LLM inference
Starred by
Created 2 years ago
Updated 1 year ago
LongChat
by
DachengLi1
0%
534
Long-context LLM chatbot training and evaluation framework
Starred by
+2
Created 2 years ago
Updated 1 year ago
xgen
by
salesforce
0.1%
725
LLM research release with 8k sequence length
Starred by
Created 2 years ago
Updated 1 year ago
flexflow-train
by
flexflow
0.1%
2k
Accelerating distributed deep learning training
Starred by
+8
Created 7 years ago
Updated 20 hours ago
peft
by
huggingface
0.1%
21k
Parameter-efficient fine-tuning (PEFT) library
Starred by
+16
Created 3 years ago
Updated 1 day ago
AutoGPT
by
Significant-Gravitas
0.1%
182k
AI agent platform for building, deploying, and running autonomous workflows
Starred by
+56
Created 2 years ago
Updated 17 hours ago
CTranslate2
by
OpenNMT
0.3%
4k
Fast inference engine for Transformer models
Starred by
+6
Created 6 years ago
Updated 3 weeks ago
text-generation-inference
by
huggingface
0.1%
11k
Rust/Python/gRPC server for fast LLM text generation
Starred by
+35
Created 3 years ago
Updated 1 month ago
mlc-llm
by
mlc-ai
0.1%
22k
Universal LLM deployment engine with ML compilation
Starred by
+21
Created 2 years ago
Updated 2 days ago
awesome-mixture-of-experts
by
XueFuzhao
0%
1k
Curated list of resources for mixture-of-experts (MoE) research
Created 3 years ago
Updated 1 year ago
InternLM-techreport
by
InternLM
0%
902
Research paper on a multilingual LLM with 104B parameters
Starred by
Created 2 years ago
Updated 2 years ago
awesome-chatgpt-dataset
by
voidful
0.1%
761
Dataset repo for LLM training
Starred by
Created 2 years ago
Updated 4 months ago
RWKV-LM
by
BlinkDL
0.2%
14k
RNN-based LLM with transformer-level performance and parallelizable training
Starred by
+29
Created 4 years ago
Updated 4 days ago
qdrant
by
qdrant
0.7%
29k
Vector database for similarity search in AI applications
Starred by
+22
Created 5 years ago
Updated 17 hours ago
alpaca-lora
by
tloen
0.0%
19k
LoRA fine-tuning for LLaMA
Starred by
+22
Created 3 years ago
Updated 1 year ago
trlx
by
CarperAI
0.0%
5k
Distributed RLHF for LLMs
Starred by
+16
Created 3 years ago
Updated 2 years ago
MOSS
by
OpenMOSS
0.0%
12k
Open-source tool-augmented conversational language model
Starred by
+2
Created 2 years ago
Updated 1 year ago
CodeGeeX
by
zai-org
0.1%
9k
Code generation model for multilingual programming
Created 3 years ago
Updated 1 year ago
baize-chatbot
by
project-baize
0%
3k
Chat model trained via LoRA, using ChatGPT-generated dialogs
Starred by
+3
Created 2 years ago
Updated 1 year ago
ChatGDB
by
pgosar
0%
937
CLI tool for debugging with natural language via LLM
Starred by
Created 2 years ago
Updated 1 year ago
ByteTransformer
by
bytedance
0.2%
477
High-performance BERT transformer inference on NVIDIA GPUs
Created 3 years ago
Updated 1 year ago
FastChat
by
lm-sys
0.0%
39k
Open platform for training, serving, and evaluating LLM-based chatbots
Starred by
+36
Created 2 years ago
Updated 8 months ago
fastertransformer_backend
by
triton-inference-server
0%
413
Triton backend for optimized transformer inference
Created 4 years ago
Updated 2 years ago
LMFlow
by
OptimalScale
0.0%
8k
Toolkit for finetuning and inference of large foundation models
Starred by
+9
Created 2 years ago
Updated 1 week ago
langchain
by
langchain-ai
0.4%
127k
Framework for building LLM-powered applications
Starred by
+83
Created 3 years ago
Updated 17 hours ago
llama_index
by
run-llama
0.3%
47k
Data framework for building LLM-powered agents
Starred by
+44
Created 3 years ago
Updated 1 day ago
FlexLLMGen
by
FMInference
0.0%
9k
High-throughput generation engine for LLMs with limited GPU memory
Starred by
+20
Created 3 years ago
Updated 1 year ago
PiPPy
by
pytorch
0%
785
PyTorch tool for pipeline parallelism
Starred by
+3
Created 4 years ago
Updated 1 year ago
best_AI_papers_2022
by
louisfb01
0%
3k
AI paper list (2022) with video explanations and code
Starred by
Created 4 years ago
Updated 2 years ago
metaseq
by
facebookresearch
0%
7k
Codebase for large-scale transformer model development and deployment
Starred by
+11
Created 3 years ago
Updated 1 year ago
dpm-solver
by
LuChengTHU
0.1%
2k
Fast ODE solver for diffusion probabilistic model sampling
Starred by
Created 3 years ago
Updated 2 years ago
FasterTransformer
by
NVIDIA
0.0%
6k
Optimized transformer library for inference
Starred by
+12
Created 4 years ago
Updated 1 year ago
CodeGen
by
salesforce
0%
5k
Open-source model family for program synthesis
Starred by
+8
Created 3 years ago
Updated 4 months ago
GLM-130B
by
zai-org
0%
8k
Bilingual model for research and evaluation
Starred by
+6
Created 3 years ago
Updated 2 years ago
Megatron-LM
by
NVIDIA
0.3%
15k
Framework for training transformer models at scale
Starred by
+20
Created 7 years ago
Updated 17 hours ago
llm-seminar
by
craffel
0%
314
Course reading list for large language models
Starred by
Created 3 years ago
Updated 3 years ago
paper-reading
by
mli
0.1%
33k
Deep learning paper readings
Starred by
+2
Created 4 years ago
Updated 11 months ago
CodeT5
by
salesforce
0.1%
3k
Code LLMs for code understanding and generation research
Starred by
+3
Created 4 years ago
Updated 2 years ago
awesome-tensor-compilers
by
merrymercy
0%
3k
Curated list of tensor compiler projects and papers
Starred by
+10
Created 5 years ago
Updated 1 year ago
Dive-into-DL-PyTorch
by
ShusenTang
0.0%
19k
PyTorch rewrite of "Dive into Deep Learning" book
Created 7 years ago
Updated 4 years ago
alpa
by
alpa-projects
0.1%
3k
Auto-parallelization framework for large-scale neural network training and serving
Starred by
+17
Created 5 years ago
Updated 2 years ago