SpikeGPT by ridgerchu

Generative language model research paper using spiking neural networks

Created 2 years ago
848 stars

Top 42.1% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

SpikeGPT implements a generative language model utilizing pure binary, event-driven spiking neural networks, offering a lightweight alternative to traditional models. It targets researchers and developers interested in energy-efficient AI and novel neural network architectures, providing a foundation for exploring spiking neural networks in large language model applications.

How It Works

SpikeGPT leverages spiking neural networks (SNNs) with binary activation units, enabling event-driven computation: neurons communicate only when they fire, which aims to reduce computational cost and energy consumption compared to standard deep learning models. The architecture is inspired by RWKV-LM, a recurrent design that replaces quadratic self-attention with a linear-complexity token-mixing mechanism, adapted here for SNNs.
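To make the "binary, event-driven" idea concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, the standard building block of SNNs: it integrates input current into a membrane potential and emits a binary spike whenever the potential crosses a threshold. This is an illustration of the general technique, not SpikeGPT's actual code; the names (`lif_step`, `beta`, `threshold`) are assumptions.

```python
# Minimal leaky integrate-and-fire (LIF) neuron sketch.
# Illustrative only -- not SpikeGPT's actual implementation.

def lif_step(potential, current, beta=0.9, threshold=1.0):
    """One timestep: leak, integrate, fire, soft-reset."""
    potential = beta * potential + current      # leaky integration
    spike = 1 if potential >= threshold else 0  # binary, event-driven output
    if spike:
        potential -= threshold                  # soft reset after firing
    return potential, spike

def run_lif(currents, beta=0.9, threshold=1.0):
    """Run a spike train over a sequence of input currents."""
    potential, spikes = 0.0, []
    for c in currents:
        potential, s = lif_step(potential, c, beta, threshold)
        spikes.append(s)
    return spikes
```

Because the output is strictly 0 or 1, downstream layers only need additions where a spike occurred, which is the source of the claimed energy savings over dense floating-point activations.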

Quick Start & Requirements

  • Installation: A Docker image is available on GitHub for environment configuration.
  • Datasets: Requires downloading and configuring paths for datasets such as enwik8 or the pre-tokenized Pile; a WikiText-103 binidx file is available on Hugging Face.
  • Training: Supports multi-GPU training via Hugging Face Accelerate.
  • Inference: Download a pre-trained model (trained on 5B tokens of OpenWebText) and modify run.py accordingly.
  • Fine-tuning: On WikiText-103, a learning rate of around 3e-6 and adjustable batch sizes are suggested.
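The steps above can be sketched as a shell sequence. The training script name (`train.py`) and the omission of flags are assumptions; consult the repository's README for the exact invocations.

```shell
# Hedged sketch of the training/inference workflow described above.
# "train.py" is a placeholder script name; flags and dataset paths are
# omitted -- see the SpikeGPT README for the actual commands.
accelerate config            # interactively set up multi-GPU settings
accelerate launch train.py   # launch training across the configured GPUs
python run.py                # inference, after editing run.py's model path
```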

Highlighted Details

  • Implements generative pre-trained language model with spiking neural networks.
  • Utilizes pure binary, event-driven spiking activation units.
  • Inspired by RWKV-LM architecture.
  • Supports training on enwik8, large corpora (The Pile), and fine-tuning on WikiText-103.

Maintenance & Community

  • Discord server available for community support.
  • Project is associated with the paper "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks" (arXiv:2302.13939).

Licensing & Compatibility

  • License not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial adoption. Detailed performance benchmarks or comparisons against traditional LLMs are not provided.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

Top 0.2% · 407 stars
Lightweight training framework for model pre-training
Created 1 year ago
Updated 4 weeks ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

Top 0.1% · 2k stars
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago