Discover and explore top open-source AI tools and projects—updated daily.
Luca Soldaini
Research Scientist at Ai2
Starred Projects (95)
verifiers
by
willccbb
55.3%
3k
RL for LLMs in verifiable environments
Starred by
+9
Created 7 months ago
Updated 1 day ago
ProX
by
GAIR-NLP
0.4%
259
Data refinement framework for improving pre-training data quality
Created 11 months ago
Updated 1 month ago
system-prompts-and-models-of-ai-tools
by
x1xhlol
1.9%
80k
AI tool system prompts and models
Starred by
+7
Created 5 months ago
Updated 21 hours ago
cua
by
trycua
0.8%
9k
AI agent framework for computer OS control in virtual containers
Starred by
+5
Created 7 months ago
Updated 1 day ago
OLMo-core
by
allenai
1.1%
277
PyTorch building blocks for large language model training and inference
Starred by
Created 1 year ago
Updated 1 day ago
llm-datasets
by
mlabonne
0.5%
3k
Curated datasets/tools for LLM post-training
Starred by
+1
Created 1 year ago
Updated 1 month ago
webdataset
by
webdataset
0.4%
3k
High-performance I/O system for large deep learning problems, strong PyTorch support
Starred by
+13
Created 6 years ago
Updated 2 months ago
olmocr
by
allenai
0.5%
14k
Toolkit for linearizing PDFs for LLM datasets/training
Starred by
Created 11 months ago
Updated 1 day ago
LLM.swift
by
eastriverlee
1.0%
711
Swift SDK for local LLM interaction on Apple platforms
Starred by
Created 1 year ago
Updated 2 weeks ago
SpeziLLM
by
StanfordSpezi
0.8%
258
LLM integration for Swift applications
Starred by
Created 2 years ago
Updated 2 days ago
awesome-model-based-RL
by
opendilab
0.2%
1k
Curated list of model-based RL resources
Created 3 years ago
Updated 1 day ago
semchunk
by
isaacus-dev
0.8%
362
Python library for splitting text into semantically meaningful chunks
Created 1 year ago
Updated 2 weeks ago
torchtitan
by
pytorch
0.9%
4k
PyTorch platform for generative AI model training research
Starred by
+9
Created 1 year ago
Updated 22 hours ago
chat_templates
by
chujiezheng
0.4%
694
Chat templates for HuggingFace LLMs
Starred by
Created 1 year ago
Updated 8 months ago
DataDreamer
by
datadreamer-dev
0.4%
1k
Python library for synthetic data generation and training workflows
Starred by
+1
Created 2 years ago
Updated 6 months ago
MAP-NEO
by
multimodal-art-projection
0%
960
Open-source LLM with pretraining data, pipeline, scripts, and alignment code
Starred by
Created 1 year ago
Updated 6 months ago
sglang
by
sgl-project
1.6%
17k
Fast serving framework for LLMs and vision language models
Starred by
+32
Created 1 year ago
Updated 22 hours ago
cosmopedia
by
huggingface
0%
536
Synthetic dataset for LLM training
Starred by
Created 1 year ago
Updated 9 months ago
distilabel
by
argilla-io
0.4%
3k
Framework for synthetic data and AI feedback pipelines
Starred by
+11
Created 1 year ago
Updated 4 days ago
wtpsplit
by
segment-any-text
0.4%
1k
Text segmentation toolkit for robust sentence splitting
Starred by
Created 5 years ago
Updated 2 months ago
outlines
by
dottxt-ai
0.4%
12k
SDK for structured LLM text generation
Starred by
+34
Created 2 years ago
Updated 1 day ago
domains
by
tb0hdan
0.3%
801
Internet domains dataset for battling phishing attacks and research
Created 5 years ago
Updated 1 month ago
marker
by
datalab-to
0.7%
28k
CLI tool for converting PDFs and other documents to Markdown, JSON, and HTML
Starred by
+14
Created 1 year ago
Updated 1 day ago
OLMo-Eval
by
allenai
0%
359
Evaluation suite for LLMs
Created 1 year ago
Updated 1 month ago
datatrove
by
huggingface
0.3%
3k
Data processing library for large-scale text data
Starred by
+8
Created 2 years ago
Updated 3 days ago
reward-bench
by
allenai
0%
628
Reward model evaluation tool
Starred by
Created 1 year ago
Updated 2 months ago
mlx
by
ml-explore
0.3%
22k
Array framework for machine learning on Apple silicon
Starred by
+35
Created 1 year ago
Updated 1 day ago
InternLM
by
InternLM
0.1%
7k
LLM series (InternLM, InternLM2, InternLM2.5, InternLM3) official release
Starred by
+4
Created 2 years ago
Updated 1 month ago
gpt_academic
by
binary-husky
0.1%
69k
LLM tool for paper reading/polishing/writing, optimized UI
Starred by
+2
Created 2 years ago
Updated 5 days ago
gpt_paper_assistant
by
tatsu-lab
0.6%
536
ArXiv scanner using GPT-4 for personalized paper recommendations
Starred by
Created 1 year ago
Updated 1 year ago
dolma
by
allenai
0.1%
1k
Toolkit for curating datasets for language model pre-training
Starred by
+1
Created 2 years ago
Updated 1 week ago
MiniChain
by
srush
0%
1k
Tiny library for coding with large language models
Starred by
+7
Created 2 years ago
Updated 1 year ago
falcontune
by
rmihaylov
0%
466
CLI tool for finetuning Falcon LLMs
Starred by
Created 2 years ago
Updated 2 years ago
NeMo
by
NVIDIA-NeMo
0.5%
16k
Scalable generative AI framework for LLMs, multimodal, and speech AI research
Starred by
+13
Created 6 years ago
Updated 1 day ago
guidance
by
guidance-ai
0.1%
21k
Guidance is a programming paradigm for steering LLMs
Starred by
+38
Created 2 years ago
Updated 3 days ago
hh-rlhf
by
anthropics
0.2%
2k
RLHF dataset for training safe AI assistants
Starred by
+4
Created 3 years ago
Updated 2 months ago
self-instruct
by
yizhongw
0.1%
4k
Self-Instruct: Research paper for aligning language models with self-generated instructions
Starred by
+3
Created 2 years ago
Updated 2 years ago
gpt4all
by
nomic-ai
0.1%
77k
Desktop app for local LLM inference, no GPU/API needed
Starred by
+29
Created 2 years ago
Updated 3 months ago
garak
by
NVIDIA
1.9%
5k
LLM vulnerability scanner for red-teaming and security assessments
Starred by
+3
Created 2 years ago
Updated 2 days ago
awesome-instruction-learning
by
RenzeLou
0%
499
Curated list of instruction tuning/following papers and datasets
Starred by
Created 2 years ago
Updated 1 year ago
docquery
by
impira
0.1%
2k
Document query engine for extracting information from documents
Starred by
Created 3 years ago
Updated 2 years ago
pyllms
by
kagisearch
0%
791
Python SDK for LLM access and benchmarking
Starred by
+2
Created 2 years ago
Updated 2 weeks ago
dspy
by
stanfordnlp
0.7%
28k
Framework for programming language models, not prompting
Starred by
+49
Created 2 years ago
Updated 23 hours ago
OLMo
by
allenai
0.3%
6k
Open language model code for training, evaluation, and inference
Starred by
+4
Created 2 years ago
Updated 2 days ago
instruction-datasets
by
raunak-agarwal
0%
255
Dataset list for instruction tuning of LLMs
Starred by
Created 2 years ago
Updated 1 year ago
GPTQ-for-LLaMa
by
qwopqwop200
0%
3k
4-bit quantization for LLaMA models using GPTQ
Starred by
+2
Created 2 years ago
Updated 1 year ago
openai-cookbook
by
openai
0.3%
68k
Examples for using the OpenAI API
Starred by
+20
Created 3 years ago
Updated 1 day ago
transformers-bloom-inference
by
huggingface
0%
564
Inference solutions for BLOOM models
Starred by
Created 3 years ago
Updated 10 months ago
llama
by
meta-llama
0.1%
59k
Inference code for Llama 2 models (deprecated)
Starred by
+36
Created 2 years ago
Updated 7 months ago
composer
by
mosaicml
0.1%
5k
DL framework for training at scale, optimized for large-scale clusters
Starred by
+16
Created 3 years ago
Updated 2 weeks ago
llama-hub
by
run-llama
0.0%
3k
Data loaders for LLMs (deprecated, now in LlamaIndex core)
Starred by
+3
Created 2 years ago
Updated 1 year ago
Instruction-Tuning-Papers
by
SinclairCoder
0%
769
Reading list for instruction tuning papers
Starred by
Created 2 years ago
Updated 2 years ago
parallelformers
by
tunib-ai
0%
790
Toolkit for easy model parallelization
Starred by
+1
Created 4 years ago
Updated 2 years ago
alpa
by
alpa-projects
0.0%
3k
Auto-parallelization framework for large-scale neural network training and serving
Starred by
+16
Created 4 years ago
Updated 1 year ago
GLM-130B
by
zai-org
0.0%
8k
Bilingual model for research and evaluation
Starred by
+6
Created 3 years ago
Updated 2 years ago
FasterTransformer
by
NVIDIA
0.1%
6k
Optimized transformer library for inference
Starred by
+11
Created 4 years ago
Updated 1 year ago
orama
by
oramasearch
0.1%
10k
Browser-based search engine and RAG pipeline
Starred by
+2
Created 3 years ago
Updated 1 day ago
bitsandbytes
by
bitsandbytes-foundation
0.3%
8k
PyTorch library for k-bit quantization, enabling accessible LLMs
Starred by
+24
Created 4 years ago
Updated 4 days ago
examples
by
mosaicml
0%
463
Reference benchmarks for training and deploying ML models at scale
Starred by
Created 3 years ago
Updated 2 months ago
tevatron
by
texttron
0.4%
691
Unified toolkit for document retrieval across modalities, languages, and scale
Starred by
Created 4 years ago
Updated 2 weeks ago
trlx
by
CarperAI
0.3%
5k
Distributed RLHF for LLMs
Starred by
+16
Created 2 years ago
Updated 1 year ago
tiktoken
by
openai
0.4%
16k
Fast BPE tokenizer for OpenAI models
Starred by
+26
Created 2 years ago
Updated 1 day ago
whisper
by
openai
0.4%
87k
Speech recognition model for multilingual transcription/translation
Starred by
+38
Created 2 years ago
Updated 1 week ago
speechbrain
by
speechbrain
0.3%
10k
PyTorch toolkit for speech and text processing research
Starred by
+4
Created 5 years ago
Updated 2 weeks ago
galai
by
paperswithcode
0.0%
3k
Scientific language model API
Starred by
+5
Created 2 years ago
Updated 2 years ago
faiss
by
facebookresearch
0.3%
37k
Similarity search library for dense vectors
Starred by
+51
Created 8 years ago
Updated 1 day ago
tinygrad
by
tinygrad
0.2%
30k
Minimalist deep learning framework for education and exploration
Starred by
+26
Created 4 years ago
Updated 1 day ago
pytorch-lightning
by
Lightning-AI
0.1%
30k
Deep learning framework for pretraining, finetuning, and deploying AI models
Starred by
+28
Created 6 years ago
Updated 22 hours ago
RL4LMs
by
allenai
0.3%
2k
RL library to fine-tune language models to human preferences
Starred by
+3
Created 3 years ago
Updated 1 year ago
t-few
by
r-three
0%
456
Code for parameter-efficient fine-tuning research paper
Created 3 years ago
Updated 2 years ago
manifest
by
HazyResearch
0%
444
SDK for prompt programming with foundation models
Starred by
+2
Created 3 years ago
Updated 1 year ago
lm-evaluation-harness
by
EleutherAI
0.6%
10k
Framework for few-shot language model evaluation
Starred by
+16
Created 5 years ago
Updated 2 days ago
s2orc
by
allenai
0.1%
964
Corpus for NLP/text mining research on scientific papers
Starred by
Created 5 years ago
Updated 1 year ago
stable-diffusion
by
CompVis
0.1%
71k
Latent text-to-image diffusion model
Starred by
+53
Created 3 years ago
Updated 1 year ago
primeqa
by
primeqa
0%
736
Open-source repo for multilingual question answering research
Starred by
+3
Created 3 years ago
Updated 7 months ago
flax
by
google
0.1%
7k
NN library for JAX, designed for flexibility in neural network research
Starred by
+17
Created 5 years ago
Updated 1 day ago
optimum
by
huggingface
0.7%
3k
Hardware optimization tools for Transformers, Diffusers, etc
Starred by
+10
Created 4 years ago
Updated 4 days ago
sentence-transformers
by
UKPLab
0.3%
17k
Framework for text embeddings, retrieval, and reranking
Starred by
+20
Created 6 years ago
Updated 2 days ago
annotated_deep_learning_paper_implementations
by
labmlai
0.3%
63k
PyTorch implementations/tutorials of deep learning papers with side-by-side notes
Starred by
+4
Created 5 years ago
Updated 1 week ago
datasets
by
huggingface
0.2%
21k
Access and process large AI datasets efficiently
Starred by
+23
Created 5 years ago
Updated 3 days ago
lightning-transformers
by
Lightning-Universe
0%
610
Archived library for training Transformers with PyTorch Lightning
Starred by
Created 4 years ago
Updated 2 years ago
netron
by
lutzroeder
0.2%
31k
Model visualizer for neural networks, deep learning, and ML
Starred by
+23
Created 14 years ago
Updated 1 day ago
unilm
by
microsoft
0.1%
22k
Foundation models for language, vision, speech, and multimodal tasks
Starred by
+19
Created 6 years ago
Updated 1 month ago
DeBERTa
by
microsoft
0.1%
2k
BERT enhancement via disentangled attention, enhanced mask decoder
Starred by
+1
Created 5 years ago
Updated 1 year ago
nlp-recipes
by
microsoft
0.0%
6k
NLP examples and best practices as Jupyter notebooks
Starred by
Created 6 years ago
Updated 3 years ago
fairseq
by
facebookresearch
0.1%
32k
Sequence modeling toolkit for translation, language modeling, and text generation research
Starred by
+40
Created 8 years ago
Updated 2 months ago
tokenizers
by
huggingface
0.2%
10k
Fast tokenizer library optimized for research and production
Starred by
+21
Created 5 years ago
Updated 21 hours ago
transformers
by
huggingface
0.2%
149k
ML library for pretrained model inference and training
Starred by
+92
Created 6 years ago
Updated 22 hours ago
BlingFire
by
microsoft
0.1%
2k
Fast text tokenization library
Starred by
+1
Created 6 years ago
Updated 8 months ago
anserini
by
castorini
0%
1k
Lucene toolkit for reproducible information retrieval research
Starred by
Created 10 years ago
Updated 1 day ago
awesome-information-retrieval
by
harpribot
0%
1k
Curated list of information retrieval resources
Starred by
Created 8 years ago
Updated 2 years ago
bert
by
google-research
0.1%
39k
TensorFlow code and pre-trained models for BERT
Starred by
+24
Created 6 years ago
Updated 1 year ago
tsv-utils
by
eBay
0.1%
1k
CLI tools for large tabular data files: filtering, statistics, sampling, joins, and more
Starred by
Created 9 years ago
Updated 3 years ago
tensorflow
by
tensorflow
0.1%
191k
Open-source ML framework
Starred by
+91
Created 9 years ago
Updated 21 hours ago
spaCy
by
explosion
0.5%
32k
NLP library for production applications
Starred by
+38
Created 11 years ago
Updated 3 months ago