cross-platform-llm-client  by orailnoor

Cross-platform AI chat client for local and cloud LLM inference

Created 1 month ago
263 stars

Top 96.8% on SourcePulse

Project Summary

This project is a cross-platform AI chat client built with Flutter and described as production-ready. It lets users run LLMs locally on Android and iOS devices or switch to cloud APIs. A single interface covers both on-device and cloud-based inference, so users control where their data goes and where models run.

How It Works

The client uses Flutter for its UI and GetX for state management. Local inference on Android and iOS runs through a custom llama.cpp plugin (llama_flutter_android), with GPU acceleration via Vulkan on Android and Metal on iOS. It loads models in the GGUF format, detects device RAM to choose a suitable configuration, and falls back to cloud APIs (OpenAI, Anthropic, Google Gemini, Kimi) for more capable models or on platforms without local support. A Services layer abstracts inference, cloud communication, and data persistence (Hive).
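The local-versus-cloud fallback described above can be sketched as a small selection function. This is an illustrative Python sketch, not the project's Dart code: the names Backend and choose_backend, the has_local_model flag, and the default provider are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Platforms the summary lists as supporting on-device inference,
# mapped to the GPU API each one uses.
LOCAL_PLATFORMS = {"android": "vulkan", "ios": "metal"}


@dataclass
class Backend:
    kind: str                          # "local" or "cloud"
    accelerator: Optional[str] = None  # GPU API for local runs
    provider: Optional[str] = None     # cloud provider for cloud runs


def choose_backend(platform: str, has_local_model: bool,
                   cloud_provider: str = "openai") -> Backend:
    """Hypothetical helper: prefer local inference when the platform
    supports it and a GGUF model is on the device, else use the cloud."""
    accelerator = LOCAL_PLATFORMS.get(platform.lower())
    if accelerator is not None and has_local_model:
        return Backend(kind="local", accelerator=accelerator)
    return Backend(kind="cloud", provider=cloud_provider)
```

Under these assumptions, a web client always resolves to a cloud backend, and an Android or iOS device resolves to a local backend only once a model has been downloaded.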

Quick Start & Requirements

  • Prerequisites: Flutter SDK >= 3.3.0, Android SDK (API 26+), JDK 17, NDK (bundled with Android SDK).
  • Android: flutter pub get, cd android, ./gradlew assembleDebug (or assembleRelease).
  • iOS: flutter pub get, cd ios, pod install, flutter build ios. For iPad sideloading, download PrivateLM-iOS.zip from the Releases page and install the .ipa via AltStore, Sideloadly, or Xcode.
  • Web: flutter pub get, flutter build web --release.
  • Local Inference (iOS): Requires Metal GPU acceleration.
  • Local Inference (Web): Not currently supported; cloud-only (local coming soon).

Highlighted Details

  • Local Inference: GGUF models run directly on Android (Vulkan) and iOS (Metal) with GPU acceleration, requiring no internet after download.
  • Cloud API Integration: Seamless switching between OpenAI, Anthropic, Google Gemini, and Kimi (Moonshot AI).
  • Multimodal Chat: Supports sending text and images; vision works with local models (Qwen2-VL) and with cloud models.
  • Smart Auto-Configuration: Automatically detects device RAM to recommend optimal context size and token limits.
  • Task Management: Includes a dedicated view for structured AI-assisted workflows alongside free-form chat.
  • Data Persistence: All chats, tasks, and settings are stored locally using Hive.
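The RAM-based auto-configuration listed above could look roughly like the following tiered lookup. This is a minimal Python sketch for illustration only: the summary does not state the project's actual thresholds or values, so the tiers, InferenceConfig, and recommend_config are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    context_size: int  # tokens of context given to the model
    max_tokens: int    # upper bound on tokens generated per reply


# Hypothetical tiers: (minimum RAM in GiB, config), largest first.
# These numbers are assumptions, not values taken from the project.
_TIERS = [
    (12, InferenceConfig(context_size=8192, max_tokens=2048)),
    (8, InferenceConfig(context_size=4096, max_tokens=1024)),
    (4, InferenceConfig(context_size=2048, max_tokens=512)),
]
_FALLBACK = InferenceConfig(context_size=1024, max_tokens=256)


def recommend_config(total_ram_bytes: int) -> InferenceConfig:
    """Return the config of the first tier whose RAM floor is met."""
    ram_gib = total_ram_bytes / (1024 ** 3)
    for min_gib, config in _TIERS:
        if ram_gib >= min_gib:
            return config
    return _FALLBACK
```

With these assumed tiers, a device reporting 6 GiB of RAM falls into the 4 GiB tier, and anything below 4 GiB gets the smallest fallback configuration.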

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (like Discord/Slack) were found in the provided README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT License permits commercial use.

Limitations & Caveats

The Web platform currently supports cloud APIs only; local inference is planned. iPhone support is experimental, and iPad is the recommended iOS target because of the RAM that local models require. Android release builds require configuring signing keys.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 8
  • Issues (30d): 8
  • Star History: 261 stars in the last 30 days
