SpikeGPT by ridgerchu

Generative language model research paper using spiking neural networks

Created 2 years ago
848 stars

Top 42.1% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

SpikeGPT implements a generative language model utilizing pure binary, event-driven spiking neural networks, offering a lightweight alternative to traditional models. It targets researchers and developers interested in energy-efficient AI and novel neural network architectures, providing a foundation for exploring spiking neural networks in large language model applications.

How It Works

SpikeGPT leverages spiking neural networks (SNNs) with binary activation units, enabling event-driven computation: neurons communicate only when they fire, which aims to reduce computational cost and energy consumption compared to standard deep learning models. The architecture is inspired by RWKV-LM, a recurrent design that replaces quadratic self-attention with a linear-complexity token-mixing mechanism, adapted here for SNNs.
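To make the "binary, event-driven" idea concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, the standard building block of SNNs: it integrates input current into a membrane potential and emits a binary spike whenever the potential crosses a threshold. This is an illustration of the general technique, not SpikeGPT's actual code; the names (`lif_step`, `beta`, `threshold`) are assumptions.

```python
# Minimal leaky integrate-and-fire (LIF) neuron sketch.
# Illustrative only -- not SpikeGPT's actual implementation.

def lif_step(potential, current, beta=0.9, threshold=1.0):
    """One timestep: leak, integrate, fire, soft-reset."""
    potential = beta * potential + current      # leaky integration
    spike = 1 if potential >= threshold else 0  # binary, event-driven output
    if spike:
        potential -= threshold                  # soft reset after firing
    return potential, spike

def run_lif(currents, beta=0.9, threshold=1.0):
    """Run a spike train over a sequence of input currents."""
    potential, spikes = 0.0, []
    for c in currents:
        potential, s = lif_step(potential, c, beta, threshold)
        spikes.append(s)
    return spikes
```

Because the output is strictly 0 or 1, downstream layers only need additions where a spike occurred, which is the source of the claimed energy savings over dense floating-point activations.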

Quick Start & Requirements

  • Installation: A Docker image is available on GitHub for environment configuration.
  • Datasets: Requires downloading and configuring paths for datasets such as enwik8 or the pre-tokenized Pile; a WikiText-103 binidx file is available on Hugging Face.
  • Training: Supports multi-GPU training via Hugging Face Accelerate.
  • Inference: Download a pre-trained model (trained on 5B tokens of OpenWebText) and modify run.py accordingly.
  • Fine-tuning: On WikiText-103, a learning rate of around 3e-6 and adjustable batch sizes are suggested.
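The steps above can be sketched as a shell sequence. The training script name (`train.py`) and the omission of flags are assumptions; consult the repository's README for the exact invocations.

```shell
# Hedged sketch of the training/inference workflow described above.
# "train.py" is a placeholder script name; flags and dataset paths are
# omitted -- see the SpikeGPT README for the actual commands.
accelerate config            # interactively set up multi-GPU settings
accelerate launch train.py   # launch training across the configured GPUs
python run.py                # inference, after editing run.py's model path
```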

Highlighted Details

  • Implements generative pre-trained language model with spiking neural networks.
  • Utilizes pure binary, event-driven spiking activation units.
  • Inspired by RWKV-LM architecture.
  • Supports training on enwik8, large corpora (The Pile), and fine-tuning on WikiText-103.

Maintenance & Community

  • Discord server available for community support.
  • Project is associated with the paper "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks" (arXiv:2302.13939).

Licensing & Compatibility

  • License not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial adoption. Detailed performance benchmarks or comparisons against traditional LLMs are not provided.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

Top 0.2% · 407 stars
Lightweight training framework for model pre-training
Created 1 year ago
Updated 4 weeks ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

Top 0.1% · 2k stars
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago