makemore by karpathy

Character-level language model for generating text

created 3 years ago
3,207 stars

Top 15.4% on sourcepulse

Project Summary

This project provides a single, hackable Python script for training autoregressive character-level language models, from simple bigrams to Transformers, on custom text datasets. It's designed for educational purposes, enabling users to generate new text that mimics the style of their input data, such as creating unique baby names or company names.

How It Works

The core of makemore is its implementation of various neural network architectures for character-level language modeling. It supports a range of models, including Bigram, MLP, CNN, RNN, LSTM, GRU, and Transformer, allowing users to explore different levels of complexity and performance. The autoregressive nature means each predicted character depends on the preceding sequence, capturing stylistic patterns in the training data.
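
To make the autoregressive idea concrete, here is a minimal, self-contained sketch (not makemore's own code) of the simplest case, a count-based bigram model: it tallies which character follows which in a toy list of names, then samples new names one character at a time, each choice conditioned only on the previous character.

    # Minimal bigram sketch (illustrative only; makemore's models are PyTorch modules).
    import random
    from collections import defaultdict

    names = ["emma", "olivia", "ava", "isabella", "sophia"]  # toy training data

    # Count how often each character follows another; "." marks start and end of a name.
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        chars = ["."] + list(name) + ["."]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1

    def sample_name(rng=random.Random(42)):
        out, prev = [], "."
        while True:
            options, weights = zip(*counts[prev].items())
            prev = rng.choices(options, weights=weights)[0]
            if prev == ".":              # end-of-name token
                return "".join(out)
            out.append(prev)

    print([sample_name() for _ in range(5)])

makemore's neural models follow the same autoregressive recipe but condition on longer contexts and learn the next-character distribution with gradient descent rather than raw counts.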

Quick Start & Requirements

  • Install via: pip install torch (PyTorch is the only explicit requirement).
  • Usage: python makemore.py -i <input_file.txt> -o <output_directory> (a setup sketch follows this list)
  • Sampling: python makemore.py -i <input_file.txt> -o <output_directory> --sample-only
  • Documentation: README
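
For a concrete first run, the sketch below prepares a small input file (one training example per line, the format the README describes) and lists the documented commands; the file and directory names here are hypothetical.

    # Hypothetical setup: makemore expects a plain-text file with one example per line.
    from pathlib import Path

    names = ["emma", "olivia", "ava", "noah", "liam"]
    Path("names.txt").write_text("\n".join(names) + "\n")

    # Then, from the shell (training writes checkpoints and samples into the
    # output directory and can be interrupted at any point):
    #   python makemore.py -i names.txt -o out
    # Later, to sample from the saved model without further training:
    #   python makemore.py -i names.txt -o out --sample-only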

Highlighted Details

  • Implements models from foundational papers: Bengio et al. (MLP), DeepMind WaveNet (CNN), Mikolov et al. (RNN), Graves et al. (LSTM), Cho et al. (GRU), Vaswani et al. (Transformer); an MLP sketch follows this list.
  • Default model is a 200K parameter Transformer.
  • Training runs on CPU but is significantly faster with a GPU.
  • Generates samples during training and can be run in sampling-only mode.
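
To illustrate the simplest neural model on that list, below is a minimal sketch in the spirit of the Bengio et al. MLP; the layer sizes and names are hypothetical, and this is not makemore's actual implementation. A fixed-length character context is embedded, the embeddings are concatenated, passed through one hidden layer, and mapped to next-character logits.

    # Bengio-style character-level MLP sketch (hypothetical sizes, not makemore's defaults).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CharMLP(nn.Module):
        def __init__(self, vocab_size=27, block_size=3, n_embd=16, n_hidden=64):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, n_embd)         # one vector per character
            self.fc = nn.Linear(block_size * n_embd, n_hidden)  # concatenated context -> hidden
            self.out = nn.Linear(n_hidden, vocab_size)          # hidden -> next-char logits

        def forward(self, idx):
            # idx: (batch, block_size) integer character indices
            x = self.emb(idx).view(idx.shape[0], -1)  # concatenate the context embeddings
            h = torch.tanh(self.fc(x))
            return self.out(h)

    model = CharMLP()
    contexts = torch.randint(0, 27, (8, 3))   # fake batch of 3-character contexts
    targets = torch.randint(0, 27, (8,))      # fake next-character targets
    loss = F.cross_entropy(model(contexts), targets)
    print(loss.item())

The Transformer default replaces this fixed, concatenated context with causal self-attention over the entire preceding sequence.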

Maintenance & Community

  • Developed by Andrej Karpathy.
  • Primarily a single-file educational project, not a large-scale library.

Licensing & Compatibility

  • License: MIT
  • Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The CNN implementation is noted as "in progress." Training is possible on CPU but significantly slower than with GPU acceleration. Hyperparameter tuning is mentioned as a way to achieve a lower validation loss (i.e., a higher log-likelihood of held-out data), implying the default settings may not be optimal.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2

Star History

  • 178 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

0.3%
1k
Transformer library for flexible model development
created 3 years ago
updated 7 months ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle GrandMaster), and 5 more.

xlnet by zihangdai

0.0%
6k
Language model research code for generalized autoregressive pretraining
created 6 years ago
updated 2 years ago