kan-gpt by AdityaNG

PyTorch implementation of GPTs using Kolmogorov-Arnold Networks (KANs) for language modeling

created 1 year ago
720 stars

Top 48.8% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Generative Pre-trained Transformers (GPTs) that leverage Kolmogorov-Arnold Networks (KANs) for language modeling. It aims to explore the potential of KANs in improving GPT architectures, offering a novel alternative to traditional MLP-based transformers for researchers and practitioners in natural language processing.

How It Works

KAN-GPT replaces the standard MLP layers within the GPT architecture with KANs. Where an MLP learns linear weights and applies fixed activation functions at its nodes, a KAN places a learnable univariate function, parameterized on a grid (typically as a spline), on each edge and sums the results at each node. The project explores whether this yields a more flexible, more interpretable, and potentially better-performing alternative to MLP blocks.
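To make the "univariate functions on a grid" idea concrete, here is a minimal pure-Python sketch of a single KAN-style layer. It is not the repository's implementation: the class names `PiecewiseLinear` and `ToyKANLayer` are hypothetical, piecewise-linear interpolation stands in for the splines a real KAN would use, and there is no training loop. It only illustrates the structure: one learnable function per edge, summed per output.

```python
import random


class PiecewiseLinear:
    """A univariate function stored as values at fixed grid points and
    linearly interpolated in between (a simplified stand-in for a spline)."""

    def __init__(self, grid, values):
        self.grid = list(grid)      # sorted knot positions
        self.values = list(values)  # the "learnable" value at each knot

    def __call__(self, x):
        g, v = self.grid, self.values
        # Clamp inputs that fall outside the grid.
        if x <= g[0]:
            return v[0]
        if x >= g[-1]:
            return v[-1]
        # Find the interval containing x and interpolate linearly.
        for i in range(len(g) - 1):
            if g[i] <= x <= g[i + 1]:
                t = (x - g[i]) / (g[i + 1] - g[i])
                return (1 - t) * v[i] + t * v[i + 1]


class ToyKANLayer:
    """Maps in_dim inputs to out_dim outputs. Every (output, input) edge has
    its own univariate function; each output is the sum over its edges."""

    def __init__(self, in_dim, out_dim, grid, seed=0):
        rng = random.Random(seed)
        self.edges = [
            [PiecewiseLinear(grid, [rng.uniform(-1, 1) for _ in grid])
             for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def __call__(self, x):
        return [sum(phi(xi) for phi, xi in zip(row, x)) for row in self.edges]


grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
layer = ToyKANLayer(in_dim=3, out_dim=2, grid=grid)
y = layer([0.2, -0.7, 0.9])  # two outputs, each a sum of three edge functions
```

In a GPT block, a layer of this shape would take the place of the feed-forward MLP. The parameters being learned are the knot values of each edge function rather than a weight matrix.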

Quick Start & Requirements

  • Install from PyPI: pip install kan_gpt
  • Development setup requires cloning the repo, downloading the datasets (TinyShakespeare, MNIST, WebText), and installing dependencies with pip install -r requirements.txt followed by pip install -e . (an editable install from the repository root).
  • Training can be initiated with python3 -m kan_gpt.train.
  • Prompting is demonstrated via python -m kan_gpt.prompt --prompt "..." --model_path (checkpoint).
  • Official documentation and usage examples are available in KAN_GPT.ipynb and kan_gpt/prompt.py.
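For scripted use, the train and prompt commands listed above can be assembled from Python. This is a small sketch only: the helper names `train_command` and `prompt_command` are hypothetical, the checkpoint path and prompt text are placeholders, and the module names and the two flags are taken from the list above. The scripts may accept or require other options that are not shown here.

```python
import shlex
import sys


def train_command(python=sys.executable):
    """Argument list for the training entry point named above."""
    return [python, "-m", "kan_gpt.train"]


def prompt_command(prompt, model_path, python=sys.executable):
    """Argument list for the prompting entry point, using the two flags
    mentioned above (--prompt and --model_path)."""
    return [
        python, "-m", "kan_gpt.prompt",
        "--prompt", prompt,
        "--model_path", model_path,
    ]


# Placeholder prompt and checkpoint path, for illustration only.
cmd = prompt_command("Out, out, brief candle!", "checkpoints/model.pt")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it (requires kan_gpt installed and a real checkpoint):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when the prompt contains spaces or punctuation.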

Highlighted Details

  • Replaces MLP layers in GPT with Kolmogorov-Arnold Networks (KANs).
  • Includes scripts for dataset downloading (TinyShakespeare, MNIST, WebText).
  • Provides training scripts for both KAN-based and MLP-based GPTs.
  • Preliminary results suggest KAN-GPT performs slightly better than MLP-GPT on the Tiny Shakespeare dataset.

Maintenance & Community

  • The project is developed by Aditya Nalgunda Ganesh.
  • References include minGPT, pykan, webtext, and tinyshakespeare.
  • A CONTRIBUTING.md file is available for development guidelines.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • The project is in its early stages, with several TODO items including auto-downloading model weights, integrating with PyTorch Lightning, and adding comprehensive test cases.
  • Performance comparisons are currently limited to the Tiny Shakespeare dataset.
  • Loosening the version constraints in requirements.txt is also listed as a TODO.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
