cactus-compute/needle: Tiny function-calling AI for edge devices
Summary
Needle targets the problem of running capable AI on resource-constrained consumer devices by distilling a large model (Gemini 3.1) into a compact 26-million-parameter "Simple Attention Network." It is aimed at developers and researchers integrating AI into edge devices such as phones, watches, and glasses, and it supports local fine-tuning and fast inference without relying on cloud infrastructure.
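To put the 26-million-parameter figure in context, the following minimal sketch estimates raw weight storage at a few common numeric precisions. The precisions are assumptions chosen for illustration; the source does not say which formats Needle ships in, and the estimate ignores activations, caches, and runtime overhead.

```python
# Assumption: only the parameter count (26 million) comes from the summary above.
# The precisions listed here are illustrative, not Needle's documented formats.
PARAM_COUNT = 26_000_000
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(param_count: int, dtype: str) -> float:
    """Return raw weight storage in megabytes (1 MB = 10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1_000_000


# Print the estimated weight size for each illustrative precision.
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_mb(PARAM_COUNT, dtype):.0f} MB")
```

Under these assumptions the weights alone would occupy roughly 26 to 104 MB, which is why a model of this size is plausible for phones and wearables.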
How It Works
Needle uses a Simple Attention Network (SAN) architecture with a 12-layer encoder and an 8-layer decoder. Key components include Gated Residual connections, Rotary Positional Embeddings (RoPE), Grouped Query Attention (GQA) in the encoder, and Masked Self-Attention in the decoder. The design uses tied linear layers and ZCRMSNorm for efficiency. Together, these choices keep the parameter count low enough for edge deployment.
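As an illustration of one component named above, here is a minimal pure-Python sketch of Rotary Positional Embeddings in their commonly described form. It is not taken from Needle's code: the function names `rope` and `dot`, the frequency base of 10000, and the pairing of adjacent vector elements are all assumptions made for this example.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each adjacent pair of `vec` by a position-dependent angle.

    Hypothetical helper for illustration; Needle's actual layout may differ.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even vector dimension")
    out = []
    for i in range(0, dim, 2):
        # The pair starting at element i rotates at frequency base^(-i/dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


query = [0.5, -1.0, 2.0, 0.25]
key = [1.5, 0.5, -0.75, 1.0]

# Both pairs of positions are two steps apart, so the attention scores match.
score_near = dot(rope(query, 3), rope(key, 1))
score_far = dot(rope(query, 10), rope(key, 8))
print(abs(score_near - score_far) < 1e-9)
```

The point of the sketch is the property that makes RoPE attractive: the score between a rotated query and a rotated key depends only on how far apart their positions are, not on the absolute positions, and the rotation leaves vector lengths unchanged.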
Quick Start & Requirements
To install, clone the repository (git clone https://github.com/cactus-compute/needle.git), source the setup script (source ./setup), and run needle playground to launch a local web UI at http://127.0.0.1:7860 for testing and fine-tuning. Weights are downloaded automatically. The README also provides Python usage examples for direct integration.
Highlighted Details
Maintenance & Community
The project is authored by Henry Ndubuaku and colleagues, as indicated by the associated citation. Further community engagement channels or specific maintenance schedules are not detailed in the provided README. The primary resource is the GitHub repository: https://github.com/cactus-compute/needle.git.
Licensing & Compatibility
The specific open-source license for Needle is not explicitly stated in the provided text. Compatibility is geared towards consumer devices, enabling local execution.
Limitations & Caveats
Small models like Needle can be "finicky" and may require specific fine-tuning for optimal performance with custom tools. Larger models generally offer broader scope and capacity, particularly for complex conversational tasks. Needle represents an experimental exploration into Simple Attention Networks.