EverMind-AI: LLM memory framework scales to 100M+ tokens
Top 16.1% on SourcePulse
Summary MSA (Memory Sparse Attention) tackles the LLM context window limitation, which restricts long-term memory and reasoning. Unlike existing solutions plagued by precision decay or complex pipelines, MSA offers an end-to-end trainable, sparse latent-state memory framework. It enables processing up to 100 million tokens with minimal degradation, significantly enhancing LLM memory capacity and reasoning for researchers and engineers.
How It Works MSA achieves near-linear complexity via scalable sparse attention and document-wise RoPE. Key components include the Memory Sparse Attention layer, integrating top-k document selection with sparse attention for differentiability and inference decoupling. A Memory Parallel inference engine uses tiered KV cache compression (GPU routing keys, CPU content K/V) for efficient 100M token throughput on specialized hardware. The Memory Interleave mechanism facilitates adaptive multi-round, multi-hop reasoning through a retrieval-expansion-generation loop, boosting performance on complex, long-context tasks.
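The selection-then-attend step described above can be sketched in a few lines. This is a minimal illustration, not the official MSA implementation: the function name, the use of mean-pooled token keys as document routing keys, and the data layout are all assumptions made for clarity.

```python
import numpy as np

def msa_sketch(query, doc_keys, doc_values, k=2):
    """Illustrative top-k document selection followed by sparse attention.

    query: (d,) vector; doc_keys / doc_values: lists of (n_i, d) arrays,
    one pair per document. Names and structure are assumptions, not the
    project's actual API.
    """
    # Routing key per document: mean of its token keys. This mirrors the
    # idea of keeping cheap per-document routing keys hot (GPU) while the
    # full content K/V stays in a colder tier (CPU).
    routing = np.stack([dk.mean(axis=0) for dk in doc_keys])

    # Score every document against the query, keep only the top-k
    # (the sparse selection step that bounds per-query cost).
    scores = routing @ query
    topk = np.argsort(scores)[-k:]

    # Dense scaled-dot-product attention restricted to tokens of the
    # selected documents only.
    keys = np.concatenate([doc_keys[i] for i in topk])
    vals = np.concatenate([doc_values[i] for i in topk])
    logits = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ vals
```

Because attention is computed only over the k selected documents rather than the whole corpus, cost grows with k, not with total context length, which is what yields the near-linear scaling; the real system additionally keeps the selection differentiable for end-to-end training.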
Quick Start & Requirements Code and pre-trained models are available. Achieving 100M-token inference requires substantial hardware, specifically two A800 GPUs. Training involves extensive continued pretraining on a 158.95-billion-token dataset. Further details and project updates are available on the official homepage: https://evermind.ai/.
Maintenance & Community Maintained by the authors, with updates via the official homepage: https://evermind.ai/. No specific community channels (e.g., Discord, Slack) are mentioned.
Licensing & Compatibility The provided README does not specify a software license, creating uncertainty for commercial use or integration into closed-source projects.
Limitations & Caveats The primary adoption blocker is the lack of a clear license. Maximum context lengths necessitate high-end GPU hardware (e.g., A800s). Performance may still be constrained by the underlying backbone LLM's intrinsic reasoning capacity and parameter count.