Discover and explore top open-source AI tools and projects—updated daily.
Luca Soldaini
Research Scientist at Ai2
Starred Projects (100)
verifiers
by
PrimeIntellect-ai
1.5%
4k
RL for LLMs in verifiable environments
Starred by
+11
Created 10 months ago
Updated 1 day ago
ProX
by
GAIR-NLP
0%
263
Data refinement framework for improving pre-training data quality
Created 1 year ago
Updated 4 months ago
system-prompts-and-models-of-ai-tools
by
x1xhlol
1.5%
98k
AI tool system prompts and models
Starred by
+9
Created 9 months ago
Updated 1 day ago
cua
by
trycua
0.6%
11k
AI agent framework for computer OS control in virtual containers
Starred by
+7
Created 10 months ago
Updated 1 day ago
OLMo-core
by
allenai
20.7%
468
PyTorch building blocks for large language model training and inference
Starred by
Created 1 year ago
Updated 22 hours ago
llm-datasets
by
mlabonne
0.7%
4k
Curated datasets/tools for LLM post-training
Starred by
+1
Created 1 year ago
Updated 2 weeks ago
webdataset
by
webdataset
0.3%
3k
High-performance I/O system for large deep learning problems, with strong PyTorch support
Starred by
+13
Created 6 years ago
Updated 5 months ago
olmocr
by
allenai
0.4%
16k
Toolkit for linearizing PDFs for LLM datasets/training
Starred by
Created 1 year ago
Updated 5 days ago
LLM.swift
by
eastriverlee
0.4%
779
Swift SDK for local LLM interaction on Apple platforms
Starred by
Created 2 years ago
Updated 1 month ago
SpeziLLM
by
StanfordSpezi
0%
276
LLM integration for Swift applications
Starred by
Created 2 years ago
Updated 2 days ago
awesome-model-based-RL
by
opendilab
0.2%
1k
Curated list of model-based RL resources
Created 3 years ago
Updated 2 months ago
semchunk
by
isaacus-dev
0.7%
431
Python library for splitting text into semantically meaningful chunks
Created 2 years ago
Updated 1 month ago
torchtitan
by
pytorch
0.5%
5k
PyTorch platform for generative AI model training research
Starred by
+11
Created 1 year ago
Updated 1 day ago
chat_templates
by
chujiezheng
0.1%
706
Chat templates for HuggingFace LLMs
Starred by
Created 2 years ago
Updated 11 months ago
DataDreamer
by
datadreamer-dev
0%
1k
Python library for synthetic data generation and training workflows
Starred by
+1
Created 2 years ago
Updated 10 months ago
MAP-NEO
by
multimodal-art-projection
0%
968
Open-source LLM with pretraining data, pipeline, scripts, and alignment code
Starred by
Created 1 year ago
Updated 9 months ago
sglang
by
sgl-project
0.9%
20k
Fast serving framework for LLMs and vision language models
Starred by
+34
Created 1 year ago
Updated 22 hours ago
cosmopedia
by
huggingface
0%
555
Synthetic dataset for LLM training
Starred by
Created 1 year ago
Updated 1 year ago
distilabel
by
argilla-io
0.6%
3k
Framework for synthetic data and AI feedback pipelines
Starred by
+12
Created 2 years ago
Updated 6 days ago
wtpsplit
by
segment-any-text
0.3%
1k
Text segmentation toolkit for robust sentence splitting
Starred by
Created 5 years ago
Updated 1 week ago
outlines
by
dottxt-ai
0.6%
13k
SDK for structured LLM text generation
Starred by
+34
Created 2 years ago
Updated 2 days ago
domains
by
tb0hdan
0.1%
821
Internet domains dataset for battling phishing attacks and research
Created 5 years ago
Updated 2 months ago
marker
by
datalab-to
0.4%
30k
CLI tool for converting PDFs and other documents to Markdown, JSON, and HTML
Starred by
+14
Created 2 years ago
Updated 1 week ago
OLMo-Eval
by
allenai
0.8%
370
Evaluation suite for LLMs
Created 2 years ago
Updated 4 months ago
datatrove
by
huggingface
0.2%
3k
Data processing library for large-scale text data
Starred by
+9
Created 2 years ago
Updated 5 days ago
reward-bench
by
allenai
0.6%
661
Reward model evaluation tool
Starred by
Created 1 year ago
Updated 5 months ago
mlx
by
ml-explore
0.3%
23k
Array framework for machine learning on Apple silicon
Starred by
+36
Created 2 years ago
Updated 4 days ago
InternLM
by
InternLM
0.0%
7k
LLM series (InternLM, InternLM2, InternLM2.5, InternLM3) official release
Starred by
+4
Created 2 years ago
Updated 1 month ago
gpt_academic
by
binary-husky
0.1%
70k
LLM tool for paper reading, polishing, and writing, with an optimized UI
Starred by
+2
Created 2 years ago
Updated 2 months ago
gpt_paper_assistant
by
tatsu-lab
0.2%
541
ArXiv scanner using GPT-4 for personalized paper recommendations
Starred by
Created 2 years ago
Updated 1 year ago
dolma
by
allenai
0.4%
1k
Toolkit for curating datasets for language model pre-training
Starred by
+1
Created 2 years ago
Updated 3 weeks ago
MiniChain
by
srush
0%
1k
Tiny library for coding with large language models
Starred by
+7
Created 2 years ago
Updated 1 year ago
falcontune
by
rmihaylov
0%
465
CLI tool for finetuning Falcon LLMs
Starred by
Created 2 years ago
Updated 2 years ago
NeMo
by
NVIDIA-NeMo
0.3%
16k
Scalable generative AI framework for LLMs, multimodal, and speech AI research
Starred by
+15
Created 6 years ago
Updated 2 days ago
guidance
by
guidance-ai
0.1%
21k
Programming paradigm for steering LLMs
Starred by
+38
Created 3 years ago
Updated 1 week ago
hh-rlhf
by
anthropics
0.1%
2k
RLHF dataset for training safe AI assistants
Starred by
+4
Created 3 years ago
Updated 5 months ago
self-instruct
by
yizhongw
0.1%
5k
Self-Instruct: Research paper for aligning language models with self-generated instructions
Starred by
+3
Created 2 years ago
Updated 2 years ago
gpt4all
by
nomic-ai
0.0%
77k
Desktop app for local LLM inference, no GPU/API needed
Starred by
+29
Created 2 years ago
Updated 6 months ago
garak
by
NVIDIA
0.6%
6k
LLM vulnerability scanner for red-teaming and security assessments
Starred by
+4
Created 2 years ago
Updated 4 days ago
awesome-instruction-learning
by
RenzeLou
0.2%
504
Curated list of instruction tuning/following papers and datasets
Starred by
Created 2 years ago
Updated 1 year ago
docquery
by
impira
0%
2k
Document query engine for extracting information from documents
Starred by
Created 3 years ago
Updated 2 years ago
pyllms
by
kagisearch
0.3%
806
Python SDK for LLM access and benchmarking
Starred by
+2
Created 2 years ago
Updated 3 months ago
dspy
by
stanfordnlp
0.5%
30k
Framework for programming, rather than prompting, language models
Starred by
+49
Created 2 years ago
Updated 4 days ago
OLMo
by
allenai
0.6%
6k
Open language model code for training, evaluation, and inference
Starred by
+4
Created 2 years ago
Updated 6 days ago
instruction-datasets
by
raunak-agarwal
0.4%
259
Dataset list for instruction tuning of LLMs
Starred by
Created 2 years ago
Updated 2 years ago
GPTQ-for-LLaMa
by
qwopqwop200
0.0%
3k
4-bit quantization for LLaMA models using GPTQ
Starred by
+2
Created 2 years ago
Updated 1 year ago
openai-cookbook
by
openai
0.2%
69k
Examples for using the OpenAI API
Starred by
+22
Created 3 years ago
Updated 4 days ago
transformers-bloom-inference
by
huggingface
0%
564
Inference solutions for BLOOM models
Starred by
Created 3 years ago
Updated 1 year ago
llama
by
meta-llama
0.0%
59k
Inference code for Llama 2 models (deprecated)
Starred by
+38
Created 2 years ago
Updated 10 months ago
composer
by
mosaicml
0.1%
5k
DL framework for training at scale, optimized for large-scale clusters
Starred by
+17
Created 4 years ago
Updated 2 weeks ago
llama-hub
by
run-llama
0%
3k
Data loaders for LLMs (deprecated, now in LlamaIndex core)
Starred by
+4
Created 2 years ago
Updated 1 year ago
Instruction-Tuning-Papers
by
SinclairCoder
0%
768
Reading list for instruction tuning papers
Starred by
Created 3 years ago
Updated 2 years ago
parallelformers
by
tunib-ai
0%
791
Toolkit for easy model parallelization
Starred by
+1
Created 4 years ago
Updated 2 years ago
alpa
by
alpa-projects
0.1%
3k
Auto-parallelization framework for large-scale neural network training and serving
Starred by
+17
Created 4 years ago
Updated 2 years ago
GLM-130B
by
zai-org
0.0%
8k
Bilingual model for research and evaluation
Starred by
+6
Created 3 years ago
Updated 2 years ago
FasterTransformer
by
NVIDIA
0.1%
6k
Optimized transformer library for inference
Starred by
+12
Created 4 years ago
Updated 1 year ago
orama
by
oramasearch
0.2%
10k
Browser-based search engine and RAG pipeline
Starred by
+2
Created 3 years ago
Updated 1 week ago
bitsandbytes
by
bitsandbytes-foundation
0.3%
8k
PyTorch library for k-bit quantization, enabling accessible LLMs
Starred by
+26
Created 4 years ago
Updated 4 days ago
examples
by
mosaicml
0%
463
Reference benchmarks for training and deploying ML models at scale
Starred by
Created 3 years ago
Updated 5 months ago
metaseq
by
facebookresearch
0%
7k
Codebase for large-scale transformer model development and deployment
Starred by
+11
Created 3 years ago
Updated 1 year ago
tevatron
by
texttron
0.3%
708
Unified toolkit for document retrieval across modalities, languages, and scale
Starred by
Created 4 years ago
Updated 1 month ago
trlx
by
CarperAI
0%
5k
Distributed RLHF for LLMs
Starred by
+16
Created 3 years ago
Updated 1 year ago
tiktoken
by
openai
0.3%
17k
Fast BPE tokenizer for OpenAI models
Starred by
+28
Created 3 years ago
Updated 1 month ago
whisper
by
openai
0.3%
91k
Speech recognition model for multilingual transcription/translation
Starred by
+40
Created 3 years ago
Updated 2 months ago
speechbrain
by
speechbrain
0.3%
11k
PyTorch toolkit for speech and text processing research
Starred by
+5
Created 5 years ago
Updated 1 day ago
galai
by
paperswithcode
0.1%
3k
Scientific language model API
Starred by
+5
Created 3 years ago
Updated 2 years ago
faiss
by
facebookresearch
0.3%
38k
Similarity search library for dense vectors
Starred by
+52
Created 8 years ago
Updated 6 days ago
tinygrad
by
tinygrad
0.2%
31k
Minimalist deep learning framework for education and exploration
Starred by
+29
Created 5 years ago
Updated 1 day ago
pytorch-lightning
by
Lightning-AI
0.1%
31k
Deep learning framework for pretraining, finetuning, and deploying AI models
Starred by
+31
Created 6 years ago
Updated 2 days ago
RL4LMs
by
allenai
0.1%
2k
RL library to fine-tune language models to human preferences
Starred by
+3
Created 3 years ago
Updated 1 year ago
t-few
by
r-three
0%
456
Code for parameter-efficient fine-tuning research paper
Created 3 years ago
Updated 2 years ago
manifest
by
HazyResearch
0%
444
SDK for prompt programming with foundation models
Starred by
+2
Created 3 years ago
Updated 1 year ago
AITemplate
by
facebookincubator
0.0%
5k
Generate high-performance inference engines
Starred by
+19
Created 3 years ago
Updated 1 month ago
lm-evaluation-harness
by
EleutherAI
0.6%
11k
Framework for few-shot language model evaluation
Starred by
+18
Created 5 years ago
Updated 3 days ago
s2orc
by
allenai
0.1%
985
Corpus for NLP/text mining research on scientific papers
Starred by
Created 6 years ago
Updated 1 year ago
stable-diffusion
by
CompVis
0.1%
72k
Latent text-to-image diffusion model
Starred by
+54
Created 3 years ago
Updated 1 year ago
primeqa
by
primeqa
0%
740
Open-source repo for multilingual question answering research
Starred by
+3
Created 3 years ago
Updated 2 months ago
flax
by
google
0.2%
7k
NN library for JAX, designed for flexibility in neural network research
Starred by
+19
Created 5 years ago
Updated 3 days ago
optimum
by
huggingface
0.3%
3k
Hardware optimization tools for Transformers, Diffusers, etc.
Starred by
+10
Created 4 years ago
Updated 2 weeks ago
sentence-transformers
by
huggingface
0.2%
18k
Framework for text embeddings, retrieval, and reranking
Starred by
+22
Created 6 years ago
Updated 1 week ago
annotated_deep_learning_paper_implementations
by
labmlai
0.2%
65k
PyTorch implementations/tutorials of deep learning papers with side-by-side notes
Starred by
+4
Created 5 years ago
Updated 2 weeks ago
datasets
by
huggingface
0.1%
21k
Access and process large AI datasets efficiently
Starred by
+23
Created 5 years ago
Updated 3 days ago
lightning-transformers
by
Lightning-Universe
0.2%
613
Archived library for training Transformers with PyTorch Lightning
Starred by
Created 5 years ago
Updated 3 years ago
netron
by
lutzroeder
0.2%
32k
Model visualizer for neural networks, deep learning, and ML
Starred by
+23
Created 15 years ago
Updated 1 day ago
unilm
by
microsoft
0.0%
22k
Foundation models for language, vision, speech, and multimodal tasks
Starred by
+19
Created 6 years ago
Updated 5 months ago
pyterrier
by
terrier-org
0.4%
486
Python framework for information retrieval and RAG
Created 5 years ago
Updated 2 days ago
text-to-text-transfer-transformer
by
google-research
0.1%
6k
Unified text-to-text transformer for NLP research
Starred by
+13
Created 6 years ago
Updated 3 weeks ago
DeBERTa
by
microsoft
0.3%
2k
BERT enhancement via disentangled attention and an enhanced mask decoder
Starred by
+1
Created 5 years ago
Updated 2 years ago
nlp-recipes
by
microsoft
0.1%
6k
NLP examples and best practices as Jupyter notebooks
Starred by
Created 6 years ago
Updated 3 years ago
oie-resources
by
gkiril
0%
500
Extensive resources for Open Information Extraction (OIE) research
Created 7 years ago
Updated 3 years ago
fairseq
by
facebookresearch
0.1%
32k
Sequence modeling toolkit for translation, language modeling, and text generation research
Starred by
+42
Created 8 years ago
Updated 2 months ago
tokenizers
by
huggingface
0.2%
10k
Fast tokenizer library optimized for research and production
Starred by
+22
Created 6 years ago
Updated 2 days ago
transformers
by
huggingface
0.2%
153k
ML library for pretrained model inference and training
Starred by
+96
Created 7 years ago
Updated 1 day ago
BlingFire
by
microsoft
0.3%
2k
Fast text tokenization library
Starred by
+1
Created 6 years ago
Updated 11 months ago
anserini
by
castorini
0.2%
1k
Lucene toolkit for reproducible information retrieval research
Starred by
Created 10 years ago
Updated 1 day ago
awesome-information-retrieval
by
harpribot
0.2%
1k
Curated list of information retrieval resources
Starred by
Created 9 years ago
Updated 2 years ago
bert
by
google-research
0.1%
40k
TensorFlow code and pre-trained models for BERT
Starred by
+26
Created 7 years ago
Updated 1 year ago
tsv-utils
by
eBay
0%
1k
CLI tools for large tabular data files: filtering, statistics, sampling, joins, and more
Starred by
Created 9 years ago
Updated 3 years ago
tensorflow
by
tensorflow
0.1%
193k
Open-source ML framework
Starred by
+97
Created 10 years ago
Updated 1 day ago
spaCy
by
explosion
0.1%
33k
NLP library for production applications
Starred by
+40
Created 11 years ago
Updated 3 days ago