MiniCPM-CookBook by OpenBMB

Small language models for edge deployment

Created 1 year ago
297 stars

Top 89.4% on SourcePulse

View on GitHub
Project Summary

This repository is a comprehensive user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) from ModelBest. It targets developers and researchers who want to deploy, fine-tune, and build applications with these lightweight, high-performance models on edge devices, where they offer capabilities that rival larger models.

How It Works

The MiniCPM series focuses on achieving exceptional performance on edge devices through efficient architecture and optimization techniques. The models are designed for low-resource environments, enabling applications on smartphones, computers, and other smart terminals. The guide details various deployment methods, including transformers, vLLM, llama.cpp, and MLX, catering to diverse hardware and software stacks.

Quick Start & Requirements

  • Installation: Primarily involves cloning the repository and following the deployment instructions for the chosen model and framework (e.g., Transformers, vLLM, llama.cpp).
  • Prerequisites: Vary by deployment method; common requirements include Python, specific CUDA versions for GPU acceleration, and libraries such as transformers, vllm, llama-cpp-python, or mlx. Hardware support spans GPU, CPU, and NPU across Linux, macOS, Windows, Android, and iOS.
  • Resources: Model sizes range from 1.2B to 4B parameters, with specific hardware requirements (e.g., 4GB VRAM for RAG with Langchain) detailed within the guide.
  • Links: MiniCPM Repo, MiniCPM-V Repo, Knowledge Base, English Readme, Discord, WeChat Group.
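As a rough sanity check on the resource figures above, a model's weight memory can be estimated as parameter count times bytes per parameter (a back-of-envelope sketch only; real usage also includes activations and the KV cache, which this ignores):

```python
def estimate_weights_gb(num_params_billion: float, bits_per_param: float) -> float:
    """Rough weight-memory estimate: params * bits / 8, converted to GiB."""
    bytes_total = num_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# MiniCPM sizes from the guide (1.2B-4B) at common precisions.
for params in (1.2, 4.0):
    for name, bits in (("fp16", 16), ("int8", 8), ("int4", 4)):
        print(f"{params}B @ {name}: {estimate_weights_gb(params, bits):.2f} GiB")
```

This is consistent with the guide's figures: a 1.2B model in fp16 fits comfortably under 4GB of VRAM, and a 4B model needs 4-bit or 8-bit quantization to do the same.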

Highlighted Details

  • Comprehensive guides for inference, quantization (AWQ, GGUF, GPTQ, BNB), fine-tuning (SFT, RLHF), and multimodal applications.
  • Demonstrates advanced use cases like RAG, function calling, agent construction, and real-time video/image understanding.
  • Supports a wide array of hardware including GPUs, CPUs, and NPUs across multiple operating systems and mobile platforms.
  • Provides technical reports detailing model architecture, attention mechanisms, and decoding principles.
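The quantization formats listed above (AWQ, GGUF, GPTQ, BNB) all trade precision for memory in different ways; the core idea behind the simplest variant, symmetric absmax integer quantization, can be sketched in plain Python (an illustrative toy, not the actual algorithm of any of those formats):

```python
def quantize_absmax(weights, bits=4):
    """Symmetric absmax quantization: map floats to signed ints in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1] using one scale per group."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for int4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.12, -0.53, 0.31, 0.07]
q, s = quantize_absmax(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step (scale / 2).
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The production formats above improve on this sketch with per-group scales, activation-aware weight selection, and calibration, but the memory arithmetic is the same: int4 storage is one quarter the size of fp16.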

Maintenance & Community

The project is a collaborative effort involving ModelBest, OpenBMB, and Tsinghua NLP Lab. It actively encourages community contributions for tutorials, usage experiences, and ecosystem adaptations. Links to Discord and WeChat groups are provided for community engagement.

Licensing & Compatibility

The repository itself is open-source, but specific model licenses should be verified. Compatibility for commercial use or closed-source linking depends on the individual model licenses, which are not explicitly detailed in the README.

Limitations & Caveats

While the guide covers numerous deployment scenarios, users must consult specific model documentation for detailed hardware requirements and potential performance variations. Some advanced features or specific model versions might require particular software versions or configurations.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16
0.1% · 4k stars · High-performance C++ LLM inference library · Created 2 years ago · Updated 1 month ago
Starred by Wing Lian (founder of Axolotl AI) and Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems").

airllm by lyogavin
4.6% · 7k stars · Inference optimization for LLMs on low-resource hardware · Created 2 years ago · Updated 4 months ago
Starred by Luis Capelo (cofounder of Lightning AI), Patrick von Platen (author of Hugging Face Diffusers; research engineer at Mistral), and 4 more.

ktransformers by kvcache-ai
0.2% · 16k stars · Framework for LLM inference optimization experimentation · Created 1 year ago · Updated 1 day ago