makemore by karpathy

Character-level language model for generating text

Created 3 years ago
3,306 stars

Top 14.6% on SourcePulse

View on GitHub
Project Summary

This project provides a single, hackable Python script for training autoregressive character-level language models, from simple bigrams to Transformers, on custom text datasets. It's designed for educational purposes, enabling users to generate new text that mimics the style of their input data, such as creating unique baby names or company names.

How It Works

The core of makemore is its implementation of various neural network architectures for character-level language modeling. It supports a range of models, including Bigram, MLP, CNN, RNN, LSTM, GRU, and Transformer, allowing users to explore different levels of complexity and performance. The autoregressive nature means each predicted character depends on the preceding sequence, capturing stylistic patterns in the training data.
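
As a rough illustration of the autoregressive idea (a minimal sketch, not makemore's actual code), the snippet below builds a character-level bigram table from a tiny hard-coded word list and samples new words one character at a time, each choice conditioned on the previous character:

    import torch

    # Toy dataset standing in for an input file such as names.txt.
    words = ["emma", "olivia", "ava", "isabella", "sophia"]
    chars = sorted(set("".join(words)))
    stoi = {c: i + 1 for i, c in enumerate(chars)}   # index 0 is the start/end token
    itos = {i: c for c, i in stoi.items()}
    vocab_size = len(chars) + 1

    # Count bigram transitions (with add-one smoothing) and normalize to probabilities.
    counts = torch.ones(vocab_size, vocab_size)
    for w in words:
        ids = [0] + [stoi[c] for c in w] + [0]
        for a, b in zip(ids, ids[1:]):
            counts[a, b] += 1
    probs = counts / counts.sum(dim=1, keepdim=True)

    # Autoregressive sampling: each next character depends on the one before it.
    g = torch.Generator().manual_seed(42)
    for _ in range(5):
        out, ix = [], 0
        while True:
            ix = torch.multinomial(probs[ix], num_samples=1, generator=g).item()
            if ix == 0:          # hit the end token
                break
            out.append(itos[ix])
        print("".join(out))

makemore's stronger models (MLP, RNN, Transformer, etc.) replace the count table with a learned network, but sampling follows the same one-character-at-a-time pattern.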

Quick Start & Requirements

  • Install via: pip install torch (PyTorch is the only explicit requirement).
  • Training: python makemore.py -i <input_file.txt> -o <output_directory> (the input format is shown in the sketch after this list)
  • Sampling: python makemore.py -i <input_file.txt> -o <output_directory> --sample-only
  • Documentation: README
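
The sketch below shows the expected input format, one example per line, and mirrors the commands above; the file name my_names.txt and the sample entries are only illustrative:

    from pathlib import Path

    # Write a tiny custom dataset: makemore expects one example per line.
    examples = ["ada", "grace", "alan", "linus", "barbara"]
    Path("my_names.txt").write_text("\n".join(examples) + "\n")

    # Then, from the shell:
    #   python makemore.py -i my_names.txt -o out                  (train)
    #   python makemore.py -i my_names.txt -o out --sample-only    (sample from a trained model)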

Highlighted Details

  • Implements models from foundational papers: Bengio et al. (MLP), DeepMind WaveNet (CNN), Mikolov et al. (RNN), Graves et al. (LSTM), Cho et al. (GRU), Vaswani et al. (Transformer).
  • Default model is a 200K parameter Transformer.
  • Training runs on CPU but is significantly faster with a GPU.
  • Generates samples during training and can be run in sampling-only mode.

Maintenance & Community

  • Developed by Andrej Karpathy.
  • Primarily a single-file educational project, not a large-scale library.

Licensing & Compatibility

  • License: MIT
  • Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The CNN implementation is noted as "in progress." Training on CPU works but is significantly slower than with GPU acceleration. The README mentions hyperparameter tuning as a way to reach a lower loss (i.e., higher log likelihood), implying the default settings may not be optimal.
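
For the CPU-versus-GPU point, a quick way to check whether PyTorch can see a CUDA device before committing to a long run is sketched below; the --device flag mentioned in the comment is an assumption about the script's command line, so verify it with python makemore.py --help:

    import torch

    # Training works on CPU but is much faster on a CUDA GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"PyTorch will use: {device}")
    # If a GPU is available, point the script at it, e.g. `--device cuda`
    # (flag name assumed here; check `python makemore.py --help`).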

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 62 stars in the last 30 days
