SpikeGPT by ridgerchu

Generative language model research paper using spiking neural networks

created 2 years ago
832 stars

Top 43.7% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

SpikeGPT implements a generative language model utilizing pure binary, event-driven spiking neural networks, offering a lightweight alternative to traditional models. It targets researchers and developers interested in energy-efficient AI and novel neural network architectures, providing a foundation for exploring spiking neural networks in large language model applications.

How It Works

SpikeGPT leverages spiking neural networks (SNNs) with binary activation units, enabling event-driven computation: a unit contributes to downstream computation only when it fires. This approach aims to reduce computational cost and energy consumption compared with standard deep learning models. The architecture is inspired by RWKV-LM, an RNN design with transformer-level performance, adapted here for SNNs.
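To make the binary, event-driven idea concrete, here is a minimal toy sketch of a leaky integrate-and-fire neuron emitting binary spikes. This is illustrative only, not SpikeGPT's actual implementation; the decay factor, threshold, and hard-reset scheme are assumptions.

```python
# Toy binary spiking unit (leaky integrate-and-fire), pure Python.
# Decay, threshold, and reset behavior are illustrative assumptions.

def lif_step(potential, input_current, decay=0.9, threshold=1.0):
    """One timestep: leak, integrate input, fire a binary spike if over threshold."""
    potential = decay * potential + input_current
    spike = 1 if potential >= threshold else 0
    if spike:
        potential = 0.0  # hard reset after firing
    return spike, potential

def run_neuron(currents, decay=0.9, threshold=1.0):
    """Drive one neuron with a sequence of input currents; return its spike train."""
    potential, spikes = 0.0, []
    for c in currents:
        spike, potential = lif_step(potential, c, decay, threshold)
        spikes.append(spike)
    return spikes

# The event-driven payoff: with binary spikes, a weighted sum needs no
# multiplications -- only the weights of units that actually fired are added.
def event_driven_dot(spikes, weights):
    return sum(w for s, w in zip(spikes, weights) if s == 1)
```

Because activations are 0 or 1, downstream layers can skip silent units entirely, which is where the claimed compute and energy savings come from.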

Quick Start & Requirements

  • Installation: A Docker image is available on GitHub for environment configuration.
  • Datasets: Requires downloading and configuring paths for datasets such as enwik8 or a pre-tokenized copy of The Pile. A WikiText-103 binidx file is available on Hugging Face.
  • Training: Supports multi-GPU training via Hugging Face Accelerate.
  • Inference: Download a pre-trained model (trained on 5B tokens of OpenWebText) and modify run.py.
  • Resources: For fine-tuning on WikiText-103, a learning rate of around 3e-6 is suggested, with batch size adjustable to available memory.
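As a hypothetical sketch of wiring up those fine-tuning hyperparameters: only the base learning rate (~3e-6) comes from the README; the warmup/decay schedule, step counts, and batch size below are illustrative assumptions, not SpikeGPT's actual training configuration.

```python
# Hypothetical fine-tuning hyperparameter sketch. Only BASE_LR (~3e-6) is
# taken from the README; everything else here is an illustrative assumption.

BASE_LR = 3e-6      # suggested learning rate for WikiText-103 fine-tuning
BATCH_SIZE = 8      # "adjustable batch sizes" -- tune to available GPU memory

def lr_at_step(step, total_steps, warmup_steps=100, base_lr=BASE_LR):
    """Linear warmup to base_lr, then linear decay to zero (a common choice)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)
```

A schedule like this would typically be passed to the optimizer each step during fine-tuning; the repo's own training script may handle scheduling differently.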

Highlighted Details

  • Implements a generative pre-trained language model with spiking neural networks.
  • Utilizes pure binary, event-driven spiking activation units.
  • Inspired by RWKV-LM architecture.
  • Supports training on enwik8, large corpora (The Pile), and fine-tuning on WikiText-103.

Maintenance & Community

  • Discord server available for community support.
  • Project is associated with the paper "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks" (arXiv:2302.13939).

Licensing & Compatibility

  • License not explicitly stated in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial adoption. Detailed performance benchmarks or comparisons against traditional LLMs are not provided.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 25 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack
0.4% · 258 stars
Efficiently train foundation models with PyTorch
created 1 year ago · updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38
0.3% · 9k stars
Tiny pretraining project for a 1.1B Llama model
created 1 year ago · updated 1 year ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Daniel Gross (Cofounder of Safe Superintelligence), and 13 more.

RWKV-LM by BlinkDL
0.2% · 14k stars
RNN for LLM, transformer-level performance, parallelizable training
created 4 years ago · updated 1 week ago