Luca Soldaini
Research Scientist at Ai2
Starred Projects (99)
verifiers
by
PrimeIntellect-ai
1.4%
3k
RL for LLMs in verifiable environments
Starred by
+11
Created 8 months ago
Updated 2 days ago
ProX
by
GAIR-NLP
0%
263
Data refinement framework for improving pre-training data quality
Created 1 year ago
Updated 3 months ago
system-prompts-and-models-of-ai-tools
by
x1xhlol
1.2%
91k
AI tool system prompts and models
Starred by
+9
Created 7 months ago
Updated 3 days ago
cua
by
trycua
7.2%
11k
AI agent framework for computer OS control in virtual containers
Starred by
+6
Created 8 months ago
Updated 2 days ago
OLMo-core
by
allenai
0%
305
PyTorch building blocks for large language model training and inference
Starred by
Created 1 year ago
Updated 1 day ago
llm-datasets
by
mlabonne
0.4%
4k
Curated datasets/tools for LLM post-training
Starred by
+1
Created 1 year ago
Updated 2 months ago
webdataset
by
webdataset
0.2%
3k
High-performance I/O system for large deep learning problems, with strong PyTorch support
Starred by
+13
Created 6 years ago
Updated 3 months ago
olmocr
by
allenai
0.2%
14k
Toolkit for linearizing PDFs for LLM datasets/training
Starred by
Created 1 year ago
Updated 1 day ago
LLM.swift
by
eastriverlee
2.1%
756
Swift SDK for local LLM interaction on Apple platforms
Starred by
Created 1 year ago
Updated 1 week ago
SpeziLLM
by
StanfordSpezi
0.7%
269
LLM integration for Swift applications
Starred by
Created 2 years ago
Updated 2 weeks ago
awesome-model-based-RL
by
opendilab
0.2%
1k
Curated list of model-based RL resources
Created 3 years ago
Updated 1 month ago
semchunk
by
isaacus-dev
2.1%
384
Python library for splitting text into semantically meaningful chunks
Created 1 year ago
Updated 2 months ago
torchtitan
by
pytorch
0.6%
5k
PyTorch platform for generative AI model training research
Starred by
+11
Created 1 year ago
Updated 21 hours ago
chat_templates
by
chujiezheng
0.3%
704
Chat templates for HuggingFace LLMs
Starred by
Created 1 year ago
Updated 10 months ago
DataDreamer
by
datadreamer-dev
0.6%
1k
Python library for synthetic data generation and training workflows
Starred by
+1
Created 2 years ago
Updated 8 months ago
MAP-NEO
by
multimodal-art-projection
0.2%
964
Open-source LLM with pretraining data, pipeline, scripts, and alignment code
Starred by
Created 1 year ago
Updated 8 months ago
sglang
by
sgl-project
0.9%
19k
Fast serving framework for LLMs and vision language models
Starred by
+32
Created 1 year ago
Updated 20 hours ago
cosmopedia
by
huggingface
0.2%
543
Synthetic dataset for LLM training
Starred by
Created 1 year ago
Updated 10 months ago
distilabel
by
argilla-io
0.2%
3k
Framework for synthetic data and AI feedback pipelines
Starred by
+12
Created 2 years ago
Updated 1 day ago
wtpsplit
by
segment-any-text
0.3%
1k
Text segmentation toolkit for robust sentence splitting
Starred by
Created 5 years ago
Updated 6 days ago
outlines
by
dottxt-ai
0.3%
13k
SDK for structured LLM text generation
Starred by
+34
Created 2 years ago
Updated 4 days ago
domains
by
tb0hdan
0.1%
811
Internet domains dataset for battling phishing attacks and research
Created 5 years ago
Updated 2 weeks ago
marker
by
datalab-to
0.4%
29k
CLI tool for converting PDFs and other documents to Markdown, JSON, and HTML
Starred by
+14
Created 1 year ago
Updated 2 days ago
OLMo-Eval
by
allenai
0%
364
Evaluation suite for LLMs
Created 2 years ago
Updated 3 months ago
datatrove
by
huggingface
0.3%
3k
Data processing library for large-scale text data
Starred by
+9
Created 2 years ago
Updated 6 days ago
reward-bench
by
allenai
0%
639
Reward model evaluation tool
Starred by
Created 1 year ago
Updated 4 months ago
mlx
by
ml-explore
0.2%
22k
Array framework for machine learning on Apple silicon
Starred by
+36
Created 1 year ago
Updated 1 day ago
InternLM
by
InternLM
0.1%
7k
LLM series (InternLM, InternLM2, InternLM2.5, InternLM3) official release
Starred by
+4
Created 2 years ago
Updated 2 months ago
gpt_academic
by
binary-husky
0.1%
69k
LLM tool for paper reading/polishing/writing, optimized UI
Starred by
+2
Created 2 years ago
Updated 3 weeks ago
gpt_paper_assistant
by
tatsu-lab
0%
535
ArXiv scanner using GPT-4 for personalized paper recommendations
Starred by
Created 1 year ago
Updated 1 year ago
dolma
by
allenai
0.3%
1k
Toolkit for curating datasets for language model pre-training
Starred by
+1
Created 2 years ago
Updated 2 weeks ago
MiniChain
by
srush
0%
1k
Tiny library for coding with large language models
Starred by
+7
Created 2 years ago
Updated 1 year ago
falcontune
by
rmihaylov
0%
464
CLI tool for finetuning Falcon LLMs
Starred by
Created 2 years ago
Updated 2 years ago
NeMo
by
NVIDIA-NeMo
0.3%
16k
Scalable generative AI framework for LLMs, multimodal, and speech AI research
Starred by
+15
Created 6 years ago
Updated 1 day ago
guidance
by
guidance-ai
0.1%
21k
Guidance is a programming paradigm for steering LLMs
Starred by
+38
Created 2 years ago
Updated 6 days ago
hh-rlhf
by
anthropics
0.2%
2k
RLHF dataset for training safe AI assistants
Starred by
+4
Created 3 years ago
Updated 3 months ago
self-instruct
by
yizhongw
0.2%
4k
Self-Instruct: Research paper for aligning language models with self-generated instructions
Starred by
+3
Created 2 years ago
Updated 2 years ago
gpt4all
by
nomic-ai
0.1%
77k
Desktop app for local LLM inference, no GPU/API needed
Starred by
+29
Created 2 years ago
Updated 4 months ago
garak
by
NVIDIA
1.1%
6k
LLM vulnerability scanner for red-teaming and security assessments
Starred by
+4
Created 2 years ago
Updated 1 day ago
awesome-instruction-learning
by
RenzeLou
0.2%
500
Curated list of instruction tuning/following papers and datasets
Starred by
Created 2 years ago
Updated 1 year ago
docquery
by
impira
0.1%
2k
Document query engine for extracting information from documents
Starred by
Created 3 years ago
Updated 2 years ago
pyllms
by
kagisearch
0.3%
797
Python SDK for LLM access and benchmarking
Starred by
+2
Created 2 years ago
Updated 2 months ago
dspy
by
stanfordnlp
0.8%
29k
Framework for programming, rather than prompting, language models
Starred by
+49
Created 2 years ago
Updated 3 days ago
OLMo
by
allenai
0.1%
6k
Open language model code for training, evaluation, and inference
Starred by
+4
Created 2 years ago
Updated 1 month ago
instruction-datasets
by
raunak-agarwal
0%
257
Dataset list for instruction tuning of LLMs
Starred by
Created 2 years ago
Updated 1 year ago
GPTQ-for-LLaMa
by
qwopqwop200
0.1%
3k
4-bit quantization for LLaMA models using GPTQ
Starred by
+2
Created 2 years ago
Updated 1 year ago
openai-cookbook
by
openai
0.3%
68k
Examples for using the OpenAI API
Starred by
+22
Created 3 years ago
Updated 3 days ago
transformers-bloom-inference
by
huggingface
0%
565
Inference solutions for BLOOM models
Starred by
Created 3 years ago
Updated 1 year ago
llama
by
meta-llama
0.1%
59k
Inference code for Llama 2 models (deprecated)
Starred by
+38
Created 2 years ago
Updated 8 months ago
composer
by
mosaicml
0.1%
5k
DL framework for training at scale, optimized for large-scale clusters
Starred by
+17
Created 4 years ago
Updated 1 week ago
llama-hub
by
run-llama
0%
3k
Data loaders for LLMs (deprecated, now in LlamaIndex core)
Starred by
+4
Created 2 years ago
Updated 1 year ago
Instruction-Tuning-Papers
by
SinclairCoder
0%
770
Reading list for instruction tuning papers
Starred by
Created 2 years ago
Updated 2 years ago
parallelformers
by
tunib-ai
0%
790
Toolkit for easy model parallelization
Starred by
+1
Created 4 years ago
Updated 2 years ago
alpa
by
alpa-projects
0.0%
3k
Auto-parallelization framework for large-scale neural network training and serving
Starred by
+17
Created 4 years ago
Updated 1 year ago
GLM-130B
by
zai-org
0.0%
8k
Bilingual model for research and evaluation
Starred by
+6
Created 3 years ago
Updated 2 years ago
FasterTransformer
by
NVIDIA
0.1%
6k
Optimized transformer library for inference
Starred by
+12
Created 4 years ago
Updated 1 year ago
orama
by
oramasearch
0.2%
10k
Browser-based search engine and RAG pipeline
Starred by
+2
Created 3 years ago
Updated 2 days ago
bitsandbytes
by
bitsandbytes-foundation
0.3%
8k
PyTorch library for k-bit quantization, enabling accessible LLMs
Starred by
+26
Created 4 years ago
Updated 1 week ago
examples
by
mosaicml
0.2%
462
Reference benchmarks for training and deploying ML models at scale
Starred by
Created 3 years ago
Updated 3 months ago
metaseq
by
facebookresearch
0.0%
7k
Codebase for large-scale transformer model development and deployment
Starred by
+11
Created 3 years ago
Updated 1 year ago
tevatron
by
texttron
0.1%
699
Unified toolkit for document retrieval across modalities, languages, and scale
Starred by
Created 4 years ago
Updated 1 week ago
trlx
by
CarperAI
0.0%
5k
Distributed RLHF for LLMs
Starred by
+16
Created 3 years ago
Updated 1 year ago
tiktoken
by
openai
0.5%
16k
Fast BPE tokenizer for OpenAI models
Starred by
+27
Created 2 years ago
Updated 1 week ago
whisper
by
openai
0.4%
89k
Speech recognition model for multilingual transcription/translation
Starred by
+40
Created 3 years ago
Updated 1 month ago
speechbrain
by
speechbrain
0.3%
11k
PyTorch toolkit for speech and text processing research
Starred by
+5
Created 5 years ago
Updated 5 days ago
galai
by
paperswithcode
0%
3k
Scientific language model API
Starred by
+5
Created 2 years ago
Updated 2 years ago
faiss
by
facebookresearch
0.2%
37k
Similarity search library for dense vectors
Starred by
+52
Created 8 years ago
Updated 3 days ago
tinygrad
by
tinygrad
0.2%
30k
Minimalist deep learning framework for education and exploration
Starred by
+28
Created 5 years ago
Updated 20 hours ago
pytorch-lightning
by
Lightning-AI
0.1%
30k
Deep learning framework for pretraining, finetuning, and deploying AI models
Starred by
+31
Created 6 years ago
Updated 22 hours ago
RL4LMs
by
allenai
0.3%
2k
RL library to fine-tune language models to human preferences
Starred by
+3
Created 3 years ago
Updated 1 year ago
t-few
by
r-three
0%
457
Code for parameter-efficient fine-tuning research paper
Created 3 years ago
Updated 2 years ago
manifest
by
HazyResearch
0%
443
SDK for prompt programming with foundation models
Starred by
+2
Created 3 years ago
Updated 1 year ago
AITemplate
by
facebookincubator
0.1%
5k
Generate high-performance inference engines
Starred by
+19
Created 3 years ago
Updated 3 weeks ago
lm-evaluation-harness
by
EleutherAI
0.5%
10k
Framework for few-shot language model evaluation
Starred by
+18
Created 5 years ago
Updated 3 days ago
s2orc
by
allenai
0.4%
976
Corpus for NLP/text mining research on scientific papers
Starred by
Created 6 years ago
Updated 1 year ago
stable-diffusion
by
CompVis
0.1%
72k
Latent text-to-image diffusion model
Starred by
+54
Created 3 years ago
Updated 1 year ago
primeqa
by
primeqa
0.1%
739
Open-source repo for multilingual question answering research
Starred by
+3
Created 3 years ago
Updated 3 weeks ago
flax
by
google
0.1%
7k
NN library for JAX, designed for flexibility in neural network research
Starred by
+19
Created 5 years ago
Updated 1 day ago
optimum
by
huggingface
0.1%
3k
Hardware optimization tools for Transformers, Diffusers, etc
Starred by
+10
Created 4 years ago
Updated 5 days ago
sentence-transformers
by
UKPLab
0.2%
18k
Framework for text embeddings, retrieval, and reranking
Starred by
+21
Created 6 years ago
Updated 5 days ago
annotated_deep_learning_paper_implementations
by
labmlai
0.1%
63k
PyTorch implementations/tutorials of deep learning papers with side-by-side notes
Starred by
+4
Created 5 years ago
Updated 3 weeks ago
datasets
by
huggingface
0.1%
21k
Access and process large AI datasets efficiently
Starred by
+23
Created 5 years ago
Updated 1 day ago
lightning-transformers
by
Lightning-Universe
0%
612
Archived library for training Transformers with PyTorch Lightning
Starred by
Created 4 years ago
Updated 2 years ago
netron
by
lutzroeder
0.1%
32k
Model visualizer for neural networks, deep learning, and ML
Starred by
+23
Created 15 years ago
Updated 1 day ago
unilm
by
microsoft
0.1%
22k
Foundation models for language, vision, speech, and multimodal tasks
Starred by
+19
Created 6 years ago
Updated 3 months ago
text-to-text-transfer-transformer
by
google-research
0.1%
6k
Unified text-to-text transformer for NLP research
Starred by
+13
Created 6 years ago
Updated 5 months ago
DeBERTa
by
microsoft
0.1%
2k
BERT enhancement via disentangled attention and an enhanced mask decoder
Starred by
+1
Created 5 years ago
Updated 2 years ago
nlp-recipes
by
microsoft
0%
6k
NLP examples and best practices as Jupyter notebooks
Starred by
Created 6 years ago
Updated 3 years ago
oie-resources
by
gkiril
0%
498
Extensive resources for Open Information Extraction (OIE) research
Created 6 years ago
Updated 3 years ago
fairseq
by
facebookresearch
0.0%
32k
Sequence modeling toolkit for translation, language modeling, and text generation research
Starred by
+42
Created 8 years ago
Updated 2 weeks ago
tokenizers
by
huggingface
0.2%
10k
Fast tokenizer library optimized for research and production
Starred by
+22
Created 6 years ago
Updated 6 days ago
transformers
by
huggingface
0.2%
151k
ML library for pretrained model inference and training
Starred by
+96
Created 7 years ago
Updated 20 hours ago
BlingFire
by
microsoft
0.1%
2k
Fast text tokenization library
Starred by
+1
Created 6 years ago
Updated 10 months ago
anserini
by
castorini
0.3%
1k
Lucene toolkit for reproducible information retrieval research
Starred by
Created 10 years ago
Updated 1 day ago
awesome-information-retrieval
by
harpribot
0.1%
1k
Curated list of information retrieval resources
Starred by
Created 9 years ago
Updated 2 years ago
bert
by
google-research
0.1%
40k
TensorFlow code and pre-trained models for BERT
Starred by
+26
Created 7 years ago
Updated 1 year ago
tsv-utils
by
eBay
0%
1k
CLI tools for large tabular data files: filtering, statistics, sampling, joins, and more
Starred by
Created 9 years ago
Updated 3 years ago
tensorflow
by
tensorflow
0.1%
192k
Open-source ML framework
Starred by
+97
Created 10 years ago
Updated 20 hours ago
spaCy
by
explosion
0.2%
33k
NLP library for production applications
Starred by
+40
Created 11 years ago
Updated 4 months ago