kan-gpt by AdityaNG

PyTorch implementation of GPTs using Kolmogorov-Arnold Networks (KANs) for language modeling

Created 1 year ago
722 stars

Top 47.6% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Generative Pre-trained Transformers (GPTs) that leverage Kolmogorov-Arnold Networks (KANs) for language modeling. It aims to explore the potential of KANs in improving GPT architectures, offering a novel alternative to traditional MLP-based transformers for researchers and practitioners in natural language processing.

How It Works

KAN-GPT replaces the standard MLP layers within the GPT architecture with KANs. A KAN represents a function as sums and compositions of learnable univariate functions defined on grids, where an MLP applies fixed activations to learned linear maps. The univariate functions themselves are what the model learns, which makes KANs a potentially more efficient and interpretable alternative to MLPs.
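
The idea of "univariate functions on a grid" can be made concrete with a small sketch. This is not the project's code and not the pykan API: the class names PiecewiseLinear and ToyKANLayer are hypothetical, and piecewise-linear interpolation stands in for the splines a real KAN would typically use. It only illustrates the structure, in which each input-output pair has its own learnable one-dimensional function and each output is the sum of those functions.

```python
import bisect
import random


class PiecewiseLinear:
    """A univariate function given by one value per grid point (hypothetical helper)."""

    def __init__(self, grid, values):
        self.grid = list(grid)      # sorted knot positions
        self.values = list(values)  # in a real KAN these would be trained

    def __call__(self, x):
        g, v = self.grid, self.values
        # Clamp inputs that fall outside the grid to the boundary values.
        if x <= g[0]:
            return v[0]
        if x >= g[-1]:
            return v[-1]
        # Find the grid interval containing x and interpolate linearly.
        i = bisect.bisect_right(g, x) - 1
        t = (x - g[i]) / (g[i + 1] - g[i])
        return (1 - t) * v[i] + t * v[i + 1]


class ToyKANLayer:
    """Maps n_in inputs to n_out outputs: output j = sum over i of phi[j][i](x[i])."""

    def __init__(self, n_in, n_out, grid, seed=0):
        rng = random.Random(seed)
        # One univariate function per (output, input) edge, randomly initialised.
        self.phi = [
            [PiecewiseLinear(grid, [rng.uniform(-1, 1) for _ in grid])
             for _ in range(n_in)]
            for _ in range(n_out)
        ]

    def __call__(self, x):
        return [sum(f(xi) for f, xi in zip(row, x)) for row in self.phi]
```

Calling ToyKANLayer(3, 2, [-1.0, 0.0, 1.0])([0.5, -0.2, 0.9]) returns a list of two numbers. In a GPT block, a trainable layer of this shape would take the place of the feed-forward MLP.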

Quick Start & Requirements

  • Install from PyPI: pip install kan_gpt
  • Development setup requires cloning the repo, downloading the datasets (TinyShakespeare, MNIST, WebText), and installing dependencies with pip install -r requirements.txt followed by pip install -e . (an editable install; note the single trailing dot).
  • Training can be initiated with python3 -m kan_gpt.train.
  • Prompting is demonstrated via python -m kan_gpt.prompt --prompt "..." --model_path (checkpoint).
  • Official documentation and usage examples are available in KAN_GPT.ipynb and kan_gpt/prompt.py.
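
The prompting command above can also be driven from a script. The sketch below only wraps the command line quoted in this section (the kan_gpt.prompt module with its --prompt and --model_path flags). The helper names build_prompt_command and run_prompt are hypothetical, and it assumes the package is installed in the current Python environment and that a trained checkpoint exists at whatever path is passed in.

```python
import subprocess
import sys


def build_prompt_command(prompt, model_path):
    """Return the argv list for the prompting command quoted in the README summary."""
    return [
        sys.executable, "-m", "kan_gpt.prompt",
        "--prompt", prompt,
        "--model_path", model_path,  # path to a checkpoint you trained yourself
    ]


def run_prompt(prompt, model_path):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        build_prompt_command(prompt, model_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string means the prompt text needs no shell quoting.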

Highlighted Details

  • Replaces MLP layers in GPT with Kolmogorov-Arnold Networks (KANs).
  • Includes scripts for dataset downloading (TinyShakespeare, MNIST, WebText).
  • Provides training scripts for both KAN-based and MLP-based GPTs.
  • Preliminary results suggest KAN-GPT performs slightly better than MLP-GPT on the Tiny Shakespeare dataset.

Maintenance & Community

  • The project is developed and maintained by Aditya Nalgunda Ganesh.
  • References include minGPT, pykan, webtext, and tinyshakespeare.
  • A CONTRIBUTING.md file is available for development guidelines.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • The project is in its early stages, with several TODO items including auto-downloading model weights, integrating with PyTorch Lightning, and adding comprehensive test cases.
  • Performance comparisons are currently limited to the Tiny Shakespeare dataset.
  • Reducing the constraints in requirements.txt is also listed as a TODO.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
