Spiking brain-inspired LLMs utilize hybrid attention and sparsity
SpikingBrain-7B is a large language model inspired by brain mechanisms, integrating hybrid attention, MoE, and spike encoding. It targets researchers and developers seeking efficient LLM training and inference, offering significant speedups and sparsity for long sequences, with potential applications in neuromorphic computing.
How It Works
The architecture integrates hybrid efficient attention, Mixture-of-Experts (MoE) modules, and a novel spike-encoding mechanism inspired by brain function. This design enables continual pre-training with less than 2% of the data typically required while matching the performance of mainstream models. It supports adaptation to non-NVIDIA clusters and achieves more than 100x faster time-to-first-token (TTFT) at 4M-token sequence length, along with over 69% micro-level sparsity from its spiking activations and MoE routing, offering insights for neuromorphic chip design.
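To make the sparsity claim concrete, below is a minimal, purely illustrative sketch of threshold-based spike encoding of activations. It is not the repository's actual spike-encoding operator; the function name and threshold are hypothetical.

```python
# Illustrative sketch only: generic threshold-based spike encoding of
# activations, showing how spiking produces sparse codes. Not the
# repository's implementation; names and threshold are hypothetical.
import torch

def spike_encode(x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Emit a signed spike (+1/-1) where |x| exceeds the threshold, else 0."""
    return (x.abs() > threshold).to(x.dtype) * torch.sign(x)

x = torch.randn(2, 8)            # dummy activations
spikes = spike_encode(x)         # sparse {-1, 0, +1} codes
sparsity = (spikes == 0).float().mean().item()
print(f"micro-level sparsity: {sparsity:.0%}")
```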
Quick Start & Requirements
Inference uses a vLLM Docker image (docker.1ms.run/vllm/vllm-openai:v0.10.0). Installing the vLLM plugin requires cloning the repository and running pip install . inside the vllm-hymeta directory. The HuggingFace and quantized versions can be loaded directly.
Dependencies: flash_attn==2.7.3, flash-linear-attention==0.1, vllm==0.10.0, torch==2.7.1, and standard Python build tools.
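As a minimal sketch of the direct HuggingFace loading path mentioned above (the model ID, dtype, and generation settings here are placeholders, not the repository's documented values):

```python
# Sketch: load the HuggingFace release with transformers.
# The model ID below is a placeholder; substitute the actual hub path
# published in the SpikingBrain-7B README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/SpikingBrain-7B"  # placeholder, not the confirmed hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # custom hybrid-attention/MoE code ships with the checkpoint
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Explain spiking neural networks in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```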
Highlighted Details
Maintenance & Community
No specific details on contributors, sponsorships, or community channels (Discord/Slack) are provided in the README. A technical report and arXiv link are available for deeper insights.
Licensing & Compatibility
The README does not explicitly state the license type or compatibility restrictions.
Limitations & Caveats
The W8ASpike quantized version uses 'pseudo-spiking', a tensor-level approximation of activations rather than true event-driven spiking. True spiking requires asynchronous hardware and event-driven operators, which are beyond this repository's scope. Performance benchmarks note that some baselines were trained on limited Chinese data, which may affect comparisons on other datasets.
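The distinction between pseudo-spiking and event-driven spiking can be illustrated with a short, hedged sketch (not the repository's code): pseudo-spiking keeps a dense tensor that is mostly zeros, whereas an event-driven representation carries only the spike events themselves.

```python
# Hedged illustration of the caveat above, not the repository's implementation.
# Pseudo-spiking: dense tensor whose entries are mostly zero (zeros are still
# stored and computed on). Event-driven spiking: only the non-zero events
# (index, sign) are represented, which needs asynchronous hardware and kernels.
import torch

acts = torch.tensor([0.0, 0.9, 0.0, 0.0, -1.2, 0.0])

# Tensor-level (pseudo-spiking) view: dense {-1, 0, +1} tensor.
pseudo_spikes = (acts.abs() > 0.5).to(acts.dtype) * torch.sign(acts)

# Event-driven view: only the spike events survive.
events = [(i, int(s)) for i, s in enumerate(pseudo_spikes.tolist()) if s != 0.0]

print(pseudo_spikes)  # tensor([ 0.,  1.,  0.,  0., -1.,  0.])
print(events)         # [(1, 1), (4, -1)]
```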