makemore by karpathy

Character-level language model for generating text

Created 3 years ago
3,306 stars

Top 14.6% on SourcePulse

View on GitHub
Project Summary

This project provides a single, hackable Python script for training autoregressive character-level language models, from simple bigrams to Transformers, on custom text datasets. It's designed for educational purposes, enabling users to generate new text that mimics the style of their input data, such as creating unique baby names or company names.

How It Works

The core of makemore is its implementation of various neural network architectures for character-level language modeling. It supports a range of models, including Bigram, MLP, CNN, RNN, LSTM, GRU, and Transformer, allowing users to explore different levels of complexity and performance. The autoregressive nature means each predicted character depends on the preceding sequence, capturing stylistic patterns in the training data.
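
As a rough illustration of the autoregressive idea (a minimal sketch, not makemore's actual code), the snippet below builds a character-level bigram table from a tiny hard-coded word list and samples new words one character at a time, each choice conditioned on the previous character:

    import torch

    # Toy dataset standing in for an input file such as names.txt.
    words = ["emma", "olivia", "ava", "isabella", "sophia"]
    chars = sorted(set("".join(words)))
    stoi = {c: i + 1 for i, c in enumerate(chars)}   # index 0 is the start/end token
    itos = {i: c for c, i in stoi.items()}
    vocab_size = len(chars) + 1

    # Count bigram transitions (with add-one smoothing) and normalize to probabilities.
    counts = torch.ones(vocab_size, vocab_size)
    for w in words:
        ids = [0] + [stoi[c] for c in w] + [0]
        for a, b in zip(ids, ids[1:]):
            counts[a, b] += 1
    probs = counts / counts.sum(dim=1, keepdim=True)

    # Autoregressive sampling: each next character depends on the one before it.
    g = torch.Generator().manual_seed(42)
    for _ in range(5):
        out, ix = [], 0
        while True:
            ix = torch.multinomial(probs[ix], num_samples=1, generator=g).item()
            if ix == 0:          # hit the end token
                break
            out.append(itos[ix])
        print("".join(out))

makemore's stronger models (MLP, RNN, Transformer, etc.) replace the count table with a learned network, but sampling follows the same one-character-at-a-time pattern.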

Quick Start & Requirements

  • Install via: pip install torch (PyTorch is the only explicit requirement).
  • Training: python makemore.py -i <input_file.txt> -o <output_directory> (the input format is shown in the sketch after this list)
  • Sampling: python makemore.py -i <input_file.txt> -o <output_directory> --sample-only
  • Documentation: README
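
The sketch below shows the expected input format, one example per line, and mirrors the commands above; the file name my_names.txt and the sample entries are only illustrative:

    from pathlib import Path

    # Write a tiny custom dataset: makemore expects one example per line.
    examples = ["ada", "grace", "alan", "linus", "barbara"]
    Path("my_names.txt").write_text("\n".join(examples) + "\n")

    # Then, from the shell:
    #   python makemore.py -i my_names.txt -o out                  (train)
    #   python makemore.py -i my_names.txt -o out --sample-only    (sample from a trained model)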

Highlighted Details

  • Implements models from foundational papers: Bengio et al. (MLP), DeepMind WaveNet (CNN), Mikolov et al. (RNN), Graves et al. (LSTM), Cho et al. (GRU), Vaswani et al. (Transformer).
  • Default model is a 200K parameter Transformer.
  • Training runs on CPU but is significantly faster with a GPU.
  • Generates samples during training and can be run in sampling-only mode.

Maintenance & Community

  • Developed by Andrej Karpathy.
  • Primarily a single-file educational project, not a large-scale library.

Licensing & Compatibility

  • License: MIT
  • Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The CNN implementation is noted as "in progress." Training on CPU works but is significantly slower than with GPU acceleration. The README mentions hyperparameter tuning as a way to reach a lower loss (i.e., higher log likelihood), implying the default settings may not be optimal.
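
For the CPU-versus-GPU point, a quick way to check whether PyTorch can see a CUDA device before committing to a long run is sketched below; the --device flag mentioned in the comment is an assumption about the script's command line, so verify it with python makemore.py --help:

    import torch

    # Training works on CPU but is much faster on a CUDA GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"PyTorch will use: {device}")
    # If a GPU is available, point the script at it, e.g. `--device cuda`
    # (flag name assumed here; check `python makemore.py --help`).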

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 62 stars in the last 30 days
