needle by cactus-compute

Tiny function-calling AI for edge devices

Created 4 months ago

2,691 stars

Top 16.9% on SourcePulse

Project Summary

Summary

Needle tackles the challenge of running capable AI on resource-constrained consumer devices by distilling a large model (Gemini 3.1) into a compact 26-million-parameter "Simple Attention Network." It targets developers and researchers who want to integrate AI into edge devices such as phones, watches, and glasses, and it offers local fine-tuning and fast inference without relying on cloud infrastructure.

How It Works

Needle uses a novel Simple Attention Network (SAN) architecture with a 12-layer encoder and an 8-layer decoder. Key components include gated residual connections, Rotary Positional Embeddings (RoPE), Grouped Query Attention (GQA) in the encoder, and masked self-attention in the decoder. The design also uses tied linear layers and ZCRMSNorm for efficiency. Together, these choices let the model deliver strong single-shot function-calling results and high throughput with only 26 million parameters, which makes it suitable for edge deployment.
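
Two of the named components are standard techniques and can be illustrated in isolation. The sketch below is a generic, pure-Python illustration of RoPE and of how GQA lets several query heads share one key/value head. It is not Needle's implementation: the head counts, head dimension, and the RoPE base of 10000 are assumptions chosen for illustration, and in a real attention layer RoPE would be applied to the queries and keys before the scores are computed.

```python
import math

# Illustrative sketch only: generic RoPE and grouped-query attention (GQA).
# Head counts, head dimension, and the RoPE base of 10000 are ASSUMPTIONS,
# not values taken from Needle's code.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs (vec[2j], vec[2j+1]) by angle pos * base**(-2j/d)."""
    d = len(vec)
    assert d % 2 == 0, "head dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Index of the shared key/value head that a given query head reads from."""
    assert n_query_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_attend(queries, keys, values):
    """Attention for a single query position with GQA.

    queries: [n_q_heads][d]
    keys, values: [n_kv_heads][seq_len][d]; each K/V head is shared by a group of query heads.
    Returns one output vector per query head.
    """
    n_q, n_kv = len(queries), len(keys)
    outputs = []
    for h, q in enumerate(queries):
        g = kv_head_for(h, n_q, n_kv)
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * dot(q, k) for k in keys[g]]
        weights = softmax(scores)
        dim = len(values[g][0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values[g])) for j in range(dim)])
    return outputs


if __name__ == "__main__":
    q, k = [0.3, -1.2, 0.5, 0.8], [1.0, 0.4, -0.7, 0.2]
    # RoPE makes q.k depend only on the position offset (3 in both cases below).
    print(dot(rope(q, 5), rope(k, 2)), dot(rope(q, 13), rope(k, 10)))
    # With 8 query heads and 2 K/V heads, heads 0-3 share K/V head 0 and heads 4-7 share head 1.
    print([kv_head_for(h, 8, 2) for h in range(8)])
```

The memory saving of GQA comes from storing only `n_kv_heads` key/value heads instead of one per query head, which matters most for the cache on small devices.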

Quick Start & Requirements

Installation involves cloning the repository (git clone https://github.com/cactus-compute/needle.git), sourcing a setup script (source ./setup), and running needle playground to launch a local web UI (http://127.0.0.1:7860) for testing and fine-tuning. Weights are automatically downloaded. Python usage examples are provided for direct integration.
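
Because `needle playground` is a long-running process, scripts that drive it (for example, a local smoke test) may need to wait until the UI is reachable. The sketch below is a hypothetical helper built only on the Python standard library; it is not part of Needle and assumes nothing beyond the UI answering plain HTTP at the address given in the Quick Start text.

```python
import http.client
import time
import urllib.error
import urllib.request

# Address of the local playground UI, as given in the Quick Start text.
PLAYGROUND_URL = "http://127.0.0.1:7860"


def playground_is_up(url=PLAYGROUND_URL, timeout=1.0):
    """Return True if a server at `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as "not ready".
        return False


def wait_for_playground(url=PLAYGROUND_URL, attempts=30, delay=1.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if playground_is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Run this in a second terminal after starting `needle playground`.
    print("playground ready" if wait_for_playground() else "playground not reachable")
```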

Highlighted Details

Reaches production speeds of 6,000 tokens/sec for prefill and 1,200 tokens/sec for decoding.
Model size is a compact 26 million parameters.
Pretrained on 200 billion tokens using 16 TPU v6e in 27 hours, then trained on 2 billion function-call tokens in 45 minutes.
Outperforms comparable small models (e.g., FunctionGemma-270m) on single-shot function call tasks.
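
The figures above support some quick back-of-the-envelope estimates. The sketch below is plain arithmetic on the quoted numbers. It assumes the prefill and decode rates hold constant for a whole request and ignores tokenization, model loading, and other overhead. The summary does not state which device the speeds were measured on or the numeric precision of the weights, so `bytes_per_param` is an assumption (2 for 16-bit weights, 1 for 8-bit).

```python
# Figures quoted in the highlights above.
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200
N_PARAMS = 26_000_000


def estimated_latency_s(prompt_tokens, output_tokens):
    """Rough latency: prefill the whole prompt, then decode output tokens one by one."""
    return prompt_tokens / PREFILL_TOK_PER_S + output_tokens / DECODE_TOK_PER_S


def weight_size_mb(bytes_per_param):
    """Weight storage only (no activations or KV cache); precision is an ASSUMPTION."""
    return N_PARAMS * bytes_per_param / 1e6


def tokens_per_second(total_tokens, hours):
    """Average training throughput over a run."""
    return total_tokens / (hours * 3600)


if __name__ == "__main__":
    print(f"{estimated_latency_s(500, 50):.3f} s for a 500-token prompt and 50-token call")
    print(weight_size_mb(2), "MB at 16-bit;", weight_size_mb(1), "MB at 8-bit")
    print(f"{tokens_per_second(200e9, 27):,.0f} tok/s during pretraining")
    print(f"{tokens_per_second(2e9, 45 / 60):,.0f} tok/s during function-call training")
```

With these numbers, a 500-token prompt plus a 50-token call comes to roughly 0.125 s (about 0.083 s of prefill plus 0.042 s of decoding), 26 million parameters take about 52 MB at 16-bit or 26 MB at 8-bit, pretraining averaged a little over 2 million tokens/sec across the 16 TPU v6e (about 129,000 per TPU), and the function-call phase averaged about 741,000 tokens/sec.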

Maintenance & Community

The project is authored by Henry Ndubuaku and colleagues, as indicated by the associated citation. Further community engagement channels or specific maintenance schedules are not detailed in the provided README. The primary resource is the GitHub repository: https://github.com/cactus-compute/needle.git.

Licensing & Compatibility

The specific open-source license for Needle is not explicitly stated in the provided text. Compatibility is geared towards consumer devices, enabling local execution.

Limitations & Caveats

Small models like Needle can be "finicky" and may require specific fine-tuning for optimal performance with custom tools. Larger models generally offer broader scope and capacity, particularly for complex conversational tasks. Needle represents an experimental exploration into Simple Attention Networks.
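
When fine-tuning a small model for custom tools, it helps to measure how often its calls are malformed before and after training. The sketch below is a hypothetical checker, not part of Needle: it assumes a call is emitted as a JSON object with `name` and `arguments` fields (Needle's actual output format is not described here), and the tool registry (`set_timer`, `send_message`) is invented for illustration.

```python
import json

# HYPOTHETICAL tool registry: tool name -> set of required argument names.
TOOLS = {
    "set_timer": {"minutes"},
    "send_message": {"recipient", "text"},
}


def validate_call(raw_output, tools=TOOLS):
    """Parse a generated function call and check it against known tools.

    ASSUMPTION: the call is a JSON object {"name": ..., "arguments": {...}}.
    Returns (call_dict, None) on success or (None, error_message) on failure.
    """
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return None, "call is not a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return None, f"unknown tool: {name!r}"
    args = call.get("arguments")
    if not isinstance(args, dict):
        return None, "arguments must be a JSON object"
    missing = tools[name] - set(args)
    if missing:
        return None, f"missing arguments: {sorted(missing)}"
    return call, None


if __name__ == "__main__":
    for raw in ['{"name": "set_timer", "arguments": {"minutes": 5}}',
                '{"name": "send_message", "arguments": {"text": "hi"}}']:
        print(validate_call(raw))
```

Counting the error messages over a held-out set of prompts gives a simple signal for whether additional fine-tuning on a given tool set is needed.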

Health Check

Last Commit

1 week ago

Responsiveness

Inactive

Star History

109 stars in the last 30 days