sophgo/LLM-TPU: Generative AI model deployment on Sophgo edge TPUs
Top 99.4% on SourcePulse
Summary
The sophgo/LLM-TPU project facilitates the deployment of various open-source generative AI models, with a focus on Large Language Models (LLMs), onto Sophgo's BM1684X and BM1688 (CV186X) AI accelerator chips. It targets developers and researchers aiming to leverage specialized hardware for efficient AI inference. The primary benefit is enabling high-performance generative AI workloads on custom ASICs, bridging the gap between model availability and dedicated hardware deployment.
How It Works
This project employs a two-stage process: model conversion and runtime inference. Models are first transformed into Sophgo's proprietary bmodel format using the TPU-MLIR compiler. The tpu-runtime inference engine, accessed via C++ interfaces, then executes these bmodel files in either PCIe or SoC environments. This split allows optimizations tailored to the specific architecture of the BM1684X/BM1688 chips.
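The two-stage flow above can be sketched with TPU-MLIR's command-line tools. This is an illustrative sketch only: the model name, ONNX file, input shape, and quantization mode below are assumptions for the example, not values taken from the repository, and the exact flags accepted may vary by TPU-MLIR version.

```shell
# Stage 1: translate the original model into TPU-MLIR's intermediate MLIR form.
# (model name, file, and shape are hypothetical placeholders)
model_transform.py \
  --model_name llama2-block \
  --model_def block_0.onnx \
  --input_shapes [[1,512,4096]] \
  --mlir block_0.mlir

# Stage 2: lower the MLIR to a chip-specific bmodel, here targeting BM1684X
# with a 4-bit-weight quantization mode.
model_deploy.py \
  --mlir block_0.mlir \
  --quantize W4BF16 \
  --chip bm1684x \
  --model block_0.bmodel
```

The resulting block_0.bmodel is what the tpu-runtime engine loads for inference on the device.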
Quick Start & Requirements
To begin, clone the repository and execute the provided shell script:
git clone https://github.com/sophgo/LLM-TPU.git
./run.sh --model llama2-7b
Primary requirements include Sophgo BM1684X or BM1688 hardware. Specific software dependencies for the runtime and compiler are implied but not exhaustively detailed in the README. Further details are available in the "Quick Start" section and associated documentation links.
Highlighted Details
Maintenance & Community
The project appears actively updated, with recent additions in April 2025. For hardware-specific inquiries, users are directed to contact Sophgo via their official website. No direct links to community forums like Discord or Slack were found in the provided README.
Licensing & Compatibility
The README does not specify a software license for the LLM-TPU project itself. Compatibility for commercial use or integration into closed-source projects is therefore unclear and requires direct inquiry with Sophgo.
Limitations & Caveats
The project is exclusively tied to Sophgo's BM1684X and BM1688 hardware, limiting its applicability to users without this specific silicon. The absence of a stated software license presents a significant adoption blocker for many organizations. For precision, the project's guidance is to prefer pre-quantized AWQ or GPTQ models, or to calibrate floating-point models with llmc-tpu.