Character-level language model for generating text
This project provides a single, hackable Python script for training autoregressive character-level language models, from simple bigrams to Transformers, on custom text datasets. It's designed for educational purposes, enabling users to generate new text that mimics the style of their input data, such as creating unique baby names or company names.
How It Works
The core of makemore is its implementation of various neural network architectures for character-level language modeling. It supports a range of models, including Bigram, MLP, CNN, RNN, LSTM, GRU, and Transformer, allowing users to explore different levels of complexity and performance. The autoregressive nature means each predicted character is conditioned on the sequence of characters before it, which is how the model captures the stylistic patterns of the training data.
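To make the autoregressive loop concrete, here is a minimal, self-contained PyTorch sketch. It is not makemore's actual code: the toy `Bigram` model, the vocabulary, and the `sample` helper are illustrative assumptions, but the sampling loop shows how each character is drawn conditioned on all the characters before it.

```python
import torch
import torch.nn as nn

chars = ".abcdefghijklmnopqrstuvwxyz"   # '.' doubles as the start/stop token
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

class Bigram(nn.Module):
    """Simplest autoregressive model: next-char logits depend only on the current char."""
    def __init__(self, vocab_size):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(vocab_size, vocab_size))

    def forward(self, idx):                    # idx: (B, T) integer tensor
        return self.logits[idx]                # -> (B, T, vocab_size) logits

@torch.no_grad()
def sample(model, max_len=20):
    """Draw one character at a time, feeding each choice back in as context."""
    context = [stoi["."]]                      # begin with the start token
    for _ in range(max_len):
        x = torch.tensor([context])            # (1, T): the sequence so far
        logits = model(x)[:, -1, :]            # logits for the next character
        probs = torch.softmax(logits, dim=-1)
        ix = torch.multinomial(probs, num_samples=1).item()
        if ix == stoi["."]:                    # stop token ends the sample
            break
        context.append(ix)
    return "".join(itos[i] for i in context[1:])

print(sample(Bigram(len(chars))))              # untrained model, so output is random
```

With an untrained model the samples are gibberish; after training, the same loop produces text in the style of the input data.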
Quick Start & Requirements
PyTorch is the only explicit requirement:

```
pip install torch
```

Point the script at a text file with one example per line to train, and add `--sample-only` to generate from a trained checkpoint:

```
python makemore.py -i <input_file.txt> -o <output_directory>
python makemore.py -i <input_file.txt> -o <output_directory> --sample-only
```
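The script also exposes model and training options on the command line. The flags below (`--type`, `--device`) are assumptions about its argparse interface rather than options documented here; `python makemore.py --help` lists what is actually available.

```
# Assumed flags for illustration; verify with: python makemore.py --help
python makemore.py -i names.txt -o out --type transformer --device cuda
```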
Maintenance & Community
The project appears to be inactive, with its last update about a year ago.
Limitations & Caveats
The CNN implementation is noted as "in progress." Training is possible on CPU, but it is significantly slower than with GPU acceleration. Hyperparameter tuning is mentioned as a way to reach a lower validation loss, implying the default settings are not optimal.
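As a sketch of what such tuning might look like, one could enlarge the Transformer or adjust the batch size. The flag names here (`--n-layer`, `--n-embd`, `--batch-size`) are assumptions and should be checked against the script's `--help` output.

```
# Hypothetical tuning run; confirm flag names via: python makemore.py --help
python makemore.py -i names.txt -o out --n-layer 8 --n-embd 128 --batch-size 64
```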