Luca Soldaini
Research Scientist at Ai2
Starred Projects (100)
verifiers
by
PrimeIntellect-ai
0.6%
4k
RL for LLMs in verifiable environments
Created 1 year ago
Updated 1 day ago
ProX
by
GAIR-NLP
0%
264
Data refinement framework for improving pre-training data quality
Created 1 year ago
Updated 6 months ago
system-prompts-and-models-of-ai-tools
by
x1xhlol
1.6%
111k
AI tool system prompts and models
Created 10 months ago
Updated 6 days ago
cua
by
trycua
1.4%
12k
AI agent framework for computer OS control in virtual containers
Created 1 year ago
Updated 2 days ago
OLMo-core
by
allenai
3.7%
742
PyTorch building blocks for large language model training and inference
Created 1 year ago
Updated 1 day ago
llm-datasets
by
mlabonne
0.5%
4k
Curated datasets/tools for LLM post-training
Created 1 year ago
Updated 2 months ago
webdataset
by
webdataset
0.2%
3k
High-performance I/O system for large deep learning problems, strong PyTorch support
Created 6 years ago
Updated 7 months ago
olmocr
by
allenai
0.3%
17k
Toolkit for linearizing PDFs for LLM datasets/training
Created 1 year ago
Updated 2 days ago
LLM.swift
by
eastriverlee
0.2%
808
Swift SDK for local LLM interaction on Apple platforms
Created 2 years ago
Updated 1 month ago
SpeziLLM
by
StanfordSpezi
0%
278
LLM integration for Swift applications
Created 2 years ago
Updated 1 week ago
awesome-model-based-RL
by
opendilab
0.2%
1k
Curated list of model-based RL resources
Created 4 years ago
Updated 1 month ago
semchunk
by
isaacus-dev
0.9%
544
Python library for splitting text into semantically meaningful chunks
Created 2 years ago
Updated 3 months ago
torchtitan
by
pytorch
0.6%
5k
PyTorch platform for generative AI model training research
Created 2 years ago
Updated 1 day ago
chat_templates
by
chujiezheng
0%
713
Chat templates for HuggingFace LLMs
Created 2 years ago
Updated 1 year ago
DataDreamer
by
datadreamer-dev
0%
1k
Python library for synthetic data generation and training workflows
Created 2 years ago
Updated 11 months ago
MAP-NEO
by
multimodal-art-projection
0.1%
976
Open-source LLM with pretraining data, pipeline, scripts, and alignment code
Created 1 year ago
Updated 11 months ago
sglang
by
sgl-project
1.2%
23k
Fast serving framework for LLMs and vision language models
Created 2 years ago
Updated 1 day ago
cosmopedia
by
huggingface
0.2%
560
Synthetic dataset for LLM training
Created 1 year ago
Updated 1 year ago
distilabel
by
argilla-io
0.4%
3k
Framework for synthetic data and AI feedback pipelines
Created 2 years ago
Updated 1 week ago
wtpsplit
by
segment-any-text
0.4%
1k
Text segmentation toolkit for robust sentence splitting
Created 6 years ago
Updated 6 days ago
outlines
by
dottxt-ai
0.3%
13k
SDK for structured LLM text generation
Created 2 years ago
Updated 4 days ago
domains
by
tb0hdan
1.0%
1k
Internet domains dataset for battling phishing attacks and research
Created 6 years ago
Updated 5 days ago
marker
by
datalab-to
0.5%
31k
CLI tool for converting PDFs and other documents to Markdown, JSON, and HTML
Created 2 years ago
Updated 1 week ago
OLMo-Eval-Legacy
by
allenai
0%
378
Evaluation suite for LLMs
Created 2 years ago
Updated 6 months ago
datatrove
by
huggingface
0.5%
3k
Data processing library for large-scale text data
Created 2 years ago
Updated 4 days ago
reward-bench
by
allenai
0.1%
683
Reward model evaluation tool
Created 2 years ago
Updated 1 week ago
mlx
by
ml-explore
0.4%
24k
Array framework for machine learning on Apple silicon
Created 2 years ago
Updated 1 day ago
InternLM
by
InternLM
0.1%
7k
LLM series (InternLM, InternLM2, InternLM2.5, InternLM3) official release
Created 2 years ago
Updated 2 months ago
gpt_academic
by
binary-husky
0.1%
70k
LLM tool for paper reading/polishing/writing, optimized UI
Created 2 years ago
Updated 2 days ago
gpt_paper_assistant
by
tatsu-lab
0%
541
ArXiv scanner using GPT-4 for personalized paper recommendations
Created 2 years ago
Updated 1 year ago
dolma
by
allenai
0.3%
1k
Toolkit for curating datasets for language model pre-training
Created 2 years ago
Updated 2 months ago
MiniChain
by
srush
0%
1k
Tiny library for coding with large language models
Created 3 years ago
Updated 1 year ago
falcontune
by
rmihaylov
0%
463
CLI tool for finetuning Falcon LLMs
Created 2 years ago
Updated 2 years ago
NeMo
by
NVIDIA-NeMo
0.2%
17k
Scalable generative AI framework for LLMs, multimodal, and speech AI research
Created 6 years ago
Updated 3 days ago
guidance
by
guidance-ai
0.1%
21k
Guidance is a programming paradigm for steering LLMs
Created 3 years ago
Updated 5 days ago
hh-rlhf
by
anthropics
0.1%
2k
RLHF dataset for training safe AI assistants
Created 3 years ago
Updated 7 months ago
self-instruct
by
yizhongw
0.0%
5k
Self-Instruct: Research paper for aligning language models with self-generated instructions
Created 3 years ago
Updated 2 years ago
gpt4all
by
nomic-ai
0.0%
77k
Desktop app for local LLM inference, no GPU/API needed
Created 2 years ago
Updated 8 months ago
garak
by
NVIDIA
0.6%
7k
LLM vulnerability scanner for red-teaming and security assessments
Created 2 years ago
Updated 4 days ago
awesome-instruction-learning
by
RenzeLou
0%
507
Curated list of instruction tuning/following papers and datasets
Created 2 years ago
Updated 1 year ago
docquery
by
impira
0.1%
2k
Document query engine for extracting information from documents
Created 3 years ago
Updated 2 years ago
pyllms
by
kagisearch
0.1%
816
Python SDK for LLM access and benchmarking
Created 2 years ago
Updated 4 weeks ago
dspy
by
stanfordnlp
0.5%
32k
Framework for programming language models, not prompting
Created 3 years ago
Updated 2 days ago
OLMo
by
allenai
0.1%
6k
Open language model code for training, evaluation, and inference
Created 2 years ago
Updated 2 months ago
instruction-datasets
by
raunak-agarwal
0%
260
Dataset list for instruction tuning of LLMs
Created 2 years ago
Updated 2 years ago
GPTQ-for-LLaMa
by
qwopqwop200
0%
3k
4-bit quantization for LLaMA models using GPTQ
Created 2 years ago
Updated 1 year ago
openai-cookbook
by
openai
0.2%
71k
Examples for using the OpenAI API
Created 3 years ago
Updated 1 day ago
transformers-bloom-inference
by
huggingface
0%
566
Inference solutions for BLOOM models
Created 3 years ago
Updated 1 year ago
llama
by
meta-llama
0.0%
59k
Inference code for Llama 2 models (deprecated)
Created 2 years ago
Updated 1 year ago
composer
by
mosaicml
0.0%
5k
DL framework for training at scale, optimized for large-scale clusters
Created 4 years ago
Updated 2 months ago
llama-hub
by
run-llama
0.0%
3k
Data loaders for LLMs (deprecated, now in LlamaIndex core)
Created 3 years ago
Updated 1 year ago
Instruction-Tuning-Papers
by
SinclairCoder
0%
767
Reading list for instruction tuning papers
Created 3 years ago
Updated 2 years ago
parallelformers
by
tunib-ai
0%
791
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
alpa
by
alpa-projects
0.1%
3k
Auto-parallelization framework for large-scale neural network training and serving
Created 5 years ago
Updated 2 years ago
GLM-130B
by
zai-org
0.0%
8k
Bilingual model for research and evaluation
Created 3 years ago
Updated 2 years ago
FasterTransformer
by
NVIDIA
0.1%
6k
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago
orama
by
oramasearch
0.1%
10k
Browser-based search engine and RAG pipeline
Created 3 years ago
Updated 1 month ago
bitsandbytes
by
bitsandbytes-foundation
0.2%
8k
PyTorch library for k-bit quantization, enabling accessible LLMs
Created 4 years ago
Updated 5 days ago
examples
by
mosaicml
0%
463
Reference benchmarks for training and deploying ML models at scale
Created 3 years ago
Updated 7 months ago
metaseq
by
facebookresearch
0%
7k
Codebase for large-scale transformer model development and deployment
Created 3 years ago
Updated 1 year ago
tevatron
by
texttron
0.1%
716
Unified toolkit for document retrieval across modalities, languages, and scale
Created 4 years ago
Updated 1 month ago
trlx
by
CarperAI
0.1%
5k
Distributed RLHF for LLMs
Created 3 years ago
Updated 2 years ago
tiktoken
by
openai
0.3%
17k
Fast BPE tokenizer for OpenAI models
Created 3 years ago
Updated 3 months ago
whisper
by
openai
0.3%
94k
Speech recognition model for multilingual transcription/translation
Created 3 years ago
Updated 1 month ago
speechbrain
by
speechbrain
0.2%
11k
PyTorch toolkit for speech and text processing research
Created 5 years ago
Updated 2 days ago
galai
by
paperswithcode
0.0%
3k
Scientific language model API
Created 3 years ago
Updated 2 years ago
faiss
by
facebookresearch
0.2%
39k
Similarity search library for dense vectors
Created 9 years ago
Updated 3 days ago
tinygrad
by
tinygrad
0.2%
31k
Minimalist deep learning framework for education and exploration
Created 5 years ago
Updated 1 day ago
pytorch-lightning
by
Lightning-AI
0.1%
31k
Deep learning framework for pretraining, finetuning, and deploying AI models
Created 6 years ago
Updated 4 days ago
RL4LMs
by
allenai
0.1%
2k
RL library to fine-tune language models to human preferences
Created 3 years ago
Updated 1 year ago
t-few
by
r-three
0%
457
Code for parameter-efficient fine-tuning research paper
Created 3 years ago
Updated 2 years ago
manifest
by
HazyResearch
0%
445
SDK for prompt programming with foundation models
Created 3 years ago
Updated 1 year ago
AITemplate
by
facebookincubator
0.0%
5k
Generate high-performance inference engines
Created 3 years ago
Updated 2 weeks ago
lm-evaluation-harness
by
EleutherAI
0.4%
11k
Framework for few-shot language model evaluation
Created 5 years ago
Updated 4 days ago
s2orc
by
allenai
0%
1k
Corpus for NLP/text mining research on scientific papers
Created 6 years ago
Updated 1 year ago
stable-diffusion
by
CompVis
0.1%
72k
Latent text-to-image diffusion model
Created 3 years ago
Updated 1 year ago
primeqa
by
primeqa
0%
740
Open-source repo for multilingual question answering research
Created 3 years ago
Updated 4 months ago
flax
by
google
0.2%
7k
NN library for JAX, designed for flexibility in neural network research
Created 6 years ago
Updated 4 days ago
optimum
by
huggingface
0.2%
3k
Hardware optimization tools for Transformers, Diffusers, etc.
Created 4 years ago
Updated 1 week ago
sentence-transformers
by
huggingface
0.2%
18k
Framework for text embeddings, retrieval, and reranking
Created 6 years ago
Updated 2 weeks ago
annotated_deep_learning_paper_implementations
by
labmlai
0.2%
66k
PyTorch implementations/tutorials of deep learning papers with side-by-side notes
Created 5 years ago
Updated 5 days ago
datasets
by
huggingface
0.1%
21k
Access and process large AI datasets efficiently
Created 5 years ago
Updated 4 days ago
lightning-transformers
by
Lightning-Universe
0%
612
Archived library for training Transformers with PyTorch Lightning
Created 5 years ago
Updated 3 years ago
netron
by
lutzroeder
0.1%
32k
Model visualizer for neural networks, deep learning, and ML
Created 15 years ago
Updated 1 day ago
unilm
by
microsoft
0.1%
22k
Foundation models for language, vision, speech, and multimodal tasks
Starred by
+19
Created 6 years ago
Updated 4 days ago
pyterrier
by
terrier-org
0%
492
Python framework for information retrieval and RAG
Created 5 years ago
Updated 5 days ago
text-to-text-transfer-transformer
by
google-research
0.1%
6k
Unified text-to-text transformer for NLP research
Created 6 years ago
Updated 1 week ago
DeBERTa
by
microsoft
0.1%
2k
BERT enhancement via disentangled attention, enhanced mask decoder
Created 5 years ago
Updated 2 years ago
nlp-recipes
by
microsoft
0.0%
6k
NLP examples and best practices as Jupyter notebooks
Created 6 years ago
Updated 3 years ago
oie-resources
by
gkiril
0%
503
Extensive resources for Open Information Extraction (OIE) research
Created 7 years ago
Updated 3 years ago
fairseq
by
facebookresearch
0.1%
32k
Sequence modeling toolkit for translation, language modeling, and text generation research
Created 8 years ago
Updated 3 months ago
tokenizers
by
huggingface
0.2%
10k
Fast tokenizer library optimized for research and production
Created 6 years ago
Updated 5 days ago
transformers
by
huggingface
0.2%
156k
ML library for pretrained model inference and training
Created 7 years ago
Updated 1 day ago
BlingFire
by
microsoft
0%
2k
Fast text tokenization library
Created 6 years ago
Updated 1 year ago
anserini
by
castorini
0%
1k
Lucene toolkit for reproducible information retrieval research
Created 10 years ago
Updated 2 days ago
awesome-information-retrieval
by
harpribot
0%
1k
Curated list of information retrieval resources
Created 9 years ago
Updated 2 years ago
bert
by
google-research
0.1%
40k
TensorFlow code and pre-trained models for BERT
Created 7 years ago
Updated 1 year ago
tsv-utils
by
eBay
0.2%
1k
CLI tools for large tabular data files: filtering, statistics, sampling, joins, and more
Created 9 years ago
Updated 3 years ago
tensorflow
by
tensorflow
0.1%
193k
Open-source ML framework
Created 10 years ago
Updated 1 day ago
spaCy
by
explosion
0.1%
33k
NLP library for production applications
Created 11 years ago
Updated 2 months ago