SpikeGPT by ridgerchu

Generative language model research paper using spiking neural networks

Created 3 years ago
903 stars

Top 39.7% on SourcePulse

Project Summary

SpikeGPT implements a generative language model utilizing pure binary, event-driven spiking neural networks, offering a lightweight alternative to traditional models. It targets researchers and developers interested in energy-efficient AI and novel neural network architectures, providing a foundation for exploring spiking neural networks in large language model applications.

How It Works

SpikeGPT uses spiking neural networks (SNNs) with binary activation units, which enables event-driven computation: work is done only when a unit emits a spike. This approach aims to reduce computational cost and energy consumption compared to standard deep learning models. The architecture is inspired by RWKV-LM, a recurrent design that replaces quadratic self-attention with a linear-complexity mechanism, adapted here for SNNs.
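To make the idea of a binary spiking unit concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a common SNN building block. The summary above does not say which neuron model SpikeGPT uses, so the LIF choice, the function name `lif_forward`, and the `beta` and `threshold` values are all assumptions for illustration.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    Hypothetical helper for illustration; not taken from the SpikeGPT code.
    Returns a list of binary spikes (0 or 1), one per time step.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        membrane = beta * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane -= threshold  # soft reset: subtract the threshold
        else:
            spikes.append(0)
    return spikes


spikes = lif_forward([0.5, 0.5, 0.5, 0.0, 1.5])
print(spikes)
```

The output is a sequence of 0s and 1s: the neuron stays silent until its accumulated potential crosses the threshold, so downstream layers only receive an event when something fires.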

Quick Start & Requirements

  • Installation: A Docker image is available on GitHub to set up the environment.
  • Datasets: Download a dataset such as enwik8 or the pre-tokenized Pile and configure its path. A WikiText-103 binidx file is available on Hugging Face.
  • Training: Supports multi-GPU training via Hugging Face Accelerate.
  • Inference: Download the pre-trained model (trained on 5B tokens of OpenWebText) and modify run.py.
  • Fine-tuning: For WikiText-103, the suggested learning rate is around 3e-6; the batch size can be adjusted.
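The dataset step can be sketched as follows for a raw byte corpus such as enwik8. This assumes the conventional 90/5/5 train/validation/test split by byte offset; the function name `split_corpus` is hypothetical, and the repository's own preprocessing may differ.

```python
def split_corpus(data: bytes, train_pct: int = 90, val_pct: int = 5):
    """Split a raw byte corpus into train/validation/test parts by offset.

    Hypothetical helper: percentages are integers so the boundaries are
    computed without floating-point rounding.
    """
    n = len(data)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return data[:train_end], data[train_end:val_end], data[val_end:]


# Stand-in for the real file contents, e.g. open(path, "rb").read()
sample = bytes(range(100))
train, val, test = split_corpus(sample)
print(len(train), len(val), len(test))
```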

Highlighted Details

  • Implements a generative pre-trained language model with spiking neural networks.
  • Utilizes pure binary, event-driven spiking activation units.
  • Inspired by RWKV-LM architecture.
  • Supports training on enwik8, large corpora (The Pile), and fine-tuning on WikiText-103.
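The cost argument behind binary, event-driven activations can be illustrated with a small sketch. When the input vector contains only 0s and 1s, a matrix-vector product reduces to summing the weight columns of the inputs that spiked, with no multiplications and no work for silent inputs. The function names and the toy weights below are assumptions for illustration, not SpikeGPT code.

```python
def dense_matvec(weights, x):
    """Standard matrix-vector product: one multiply-add per weight."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def event_driven_matvec(weights, spikes):
    """Same result for binary inputs, visiting only columns that spiked."""
    active = [j for j, s in enumerate(spikes) if s == 1]
    out = [0.0] * len(weights)
    for j in active:                  # silent inputs are skipped entirely
        for i, row in enumerate(weights):
            out[i] += row[j]          # accumulate only, no multiplication
    return out


weights = [[0.5, -1.0, 2.0, 0.25],
           [1.5, 0.75, -0.5, 1.0]]
spikes = [1, 0, 0, 1]

dense = dense_matvec(weights, spikes)
sparse = event_driven_matvec(weights, spikes)
print(dense, sparse)
```

Both functions return the same vector, but the event-driven version performs additions only for the two active inputs. Realizing this saving in practice depends on hardware or kernels that can exploit the sparsity.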

Maintenance & Community

  • Discord server available for community support.
  • Project is associated with the paper "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks" (arXiv:2302.13939).

Licensing & Compatibility

  • The README does not explicitly state a license, so compatibility with commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not specify a license, which may hinder commercial adoption. It also provides no detailed performance benchmarks or comparisons against traditional LLMs.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
