antimatter15: Reverse engineering Google's edge-optimized language model for local inference
This repository details the reverse-engineering efforts for Google's Gemma 3n, an "open" language model optimized for edge devices. It targets engineers and researchers seeking to understand and potentially replicate the model's novel memory-saving architectures, aiming to facilitate porting to popular inference frameworks like llama.cpp or Huggingface Transformers. The primary benefit is demystifying Google's proprietary implementation and enabling broader accessibility and modification.
How It Works
The project dissects Gemma 3n's LiteRT MediaPipe .task file, identified as a zip archive containing compiled TFLite model components. It leverages a tflite parsing library and large language models (Claude, Gemini) to interpret low-level opcodes and draft equivalent PyTorch code. Key architectural elements under investigation include tied embedding and LM head weights, a "per-layer embeddings" mechanism for significant RAM reduction during inference, and the use of LAuReL (Low Rank) blocks within transformer layers to decrease parameter count and computational cost.
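To make the two memory-saving ideas above concrete, here is a minimal numpy sketch (not the project's actual code; all sizes and names are illustrative assumptions): a toy layer that gathers a per-layer embedding for each token on demand, then applies a low-rank (LAuReL-style) residual x + B(Ax), which costs 2·d·r parameters instead of d² for a full residual projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, vocab = 64, 8, 100  # toy sizes; the real model is far larger

# Per-layer embedding table: a small embedding gathered per token for THIS
# layer only, so the full set of tables need not live in the hidden state.
per_layer_emb = rng.standard_normal((vocab, d_model)).astype(np.float32)

# LAuReL-style low-rank residual factors: rank << d_model, so the learned
# residual costs 2*d_model*rank parameters instead of d_model**2.
A = rng.standard_normal((rank, d_model)).astype(np.float32) * 0.02
B = rng.standard_normal((d_model, rank)).astype(np.float32) * 0.02

def laurel_layer(x: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """One toy layer: inject the per-layer embedding, then a low-rank residual."""
    h = x + per_layer_emb[token_ids]  # on-demand per-layer embedding gather
    return h + (h @ A.T) @ B.T        # x + B(Ax), the low-rank residual path

x = rng.standard_normal((4, d_model)).astype(np.float32)  # 4 token positions
tokens = np.array([1, 5, 9, 2])
y = laurel_layer(x, tokens)
print(y.shape)  # (4, 64)
```

The parameter saving is the point: here the low-rank pair holds 2·64·8 = 1024 values versus 64² = 4096 for a dense residual projection.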
Quick Start & Requirements
Download the .task model bundle from Hugging Face (e.g., google/gemma-3n-E4B-it-litert-preview). A tflite parsing library and PyTorch are needed to follow the drafted reimplementation.
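Since the .task bundle is a plain zip archive, a first step is simply listing its members to see the compiled TFLite components. A stdlib-only sketch (the path is a hypothetical example, not a file shipped with the repository):

```python
import zipfile

def list_task_contents(path: str) -> list[str]:
    """Return the member names inside a MediaPipe .task bundle (a zip file)."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()

# Example usage (path is hypothetical; point it at the downloaded file):
# print(list_task_contents("gemma-3n-E4B-it-litert-preview.task"))
```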
Maintenance & Community
This repository is a personal reverse-engineering project. No specific community channels (Discord, Slack), roadmap, or formal maintenance structure are detailed in the README. The author explicitly seeks community contributions to develop a runnable open-source implementation.
Licensing & Compatibility
The repository itself does not specify a license. Gemma 3n is described as "open" but distributed in a compiled .task format. Compatibility for commercial use or closed-source linking is not addressed.
Limitations & Caveats
This is an exploratory reverse-engineering effort, not a production-ready implementation. The provided code is largely drafted with LLM assistance and requires further development for execution. The vision components are less explored, and the audio capabilities are not yet available. The author acknowledges potential inaccuracies and encourages community collaboration for a complete, runnable port.