Qwen3.5-9B-ToolHub by chixi4

Local multimodal LLM with tool integration

Created 1 month ago
272 stars

Top 94.7% on SourcePulse

Project Summary

This project provides a local, integrated deployment solution for the Qwen3.5-9B multimodal model, enabling tool-calling capabilities. It targets users who require on-premises AI inference for tasks like web searching, image analysis, and document processing, offering enhanced privacy and offline functionality. The primary benefit is a self-contained system that leverages local GPU resources for complex AI operations.

How It Works

The system integrates the Qwen3.5-9B multimodal large language model with a tool-calling framework, running inference locally on the user's NVIDIA GPU. It utilizes llama.cpp for high-performance GGUF model inference. The architecture allows the model to access external tools, such as web search engines and local file systems, enabling it to perform tasks like web scraping, data extraction, and document summarization. An OpenAI-compatible API endpoint facilitates integration with various clients.
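To make the tool-calling flow concrete, here is a minimal sketch of a request body in the OpenAI chat-completions format that the `/v1` endpoint advertises. The model name and the `web_search` tool schema are illustrative assumptions, not the project's actual definitions; check the server's `/v1/models` response for the real model identifier.

```python
import json

# Local ToolHub endpoint from the quick start (assumed path convention).
BASE_URL = "http://127.0.0.1:8080/v1"

payload = {
    "model": "qwen3.5-9b",  # assumed name; query /v1/models for the real one
    "messages": [
        {"role": "user", "content": "Summarize the latest news on GGUF quantization."}
    ],
    # Illustrative tool schema in the standard OpenAI "function" format.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "Search the web and return result snippets.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

POSTing this body to `BASE_URL + "/chat/completions"` with any OpenAI-compatible client should let the model decide when to emit a `tool_calls` response instead of plain text.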

Quick Start & Requirements

  • Primary Install (Windows):
    1. Run bootstrap.bat (downloads ~6 GB model).
    2. Run .\start_8080_toolhub_stack.cmd start.
    3. Access via browser at http://127.0.0.1:8080.
    • Stop: .\start_8080_toolhub_stack.cmd stop.
  • Docker Install: docker compose up --build
  • WSL Install: ./install.sh followed by ./start_8080_toolhub_stack.sh start.
  • Prerequisites:
    • OS: Windows 10/11 (primary); Docker or WSL for non-Windows setups.
    • GPU: NVIDIA graphics card with ≥ 8 GB VRAM (≥ 12 GB recommended for Q8 quantization).
    • Software: Python 3.10+.
  • Resource Footprint: Initial model download is ~6 GB (base) or larger for Q8 quantization. Startup requires 30-60 seconds for model loading.
  • Documentation: Detailed Install, FAQ, Docker Compose.
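Because model loading takes 30-60 seconds, scripts that start the stack and immediately call the API may fail. A small readiness poll avoids this; the helper below is a hypothetical sketch (the project ships no such utility), with the probe injected so any cheap request, e.g. a GET against `http://127.0.0.1:8080/v1/models`, can serve as the check.

```python
import time

def wait_until_ready(probe, timeout_s=90, interval_s=5):
    """Poll `probe()` until it returns True or `timeout_s` elapses.

    `probe` should make a cheap request against the local server and
    return True once it answers. The 90 s default leaves headroom over
    the documented 30-60 s model-load window.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False

# Demo with a stand-in probe that "comes up" on the third attempt:
attempts = {"n": 0}

def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until_ready(fake_probe, timeout_s=5, interval_s=0.01)
print(ready)  # True
```

In practice the probe would wrap `urllib.request.urlopen` in a try/except and return False on connection errors.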

Highlighted Details

  • Multimodal Capabilities: Supports image uploads for querying, including local zoom and image-to-image search.
  • Tool Integration: Can perform web searches, scrape and summarize web pages with sources, and read local files (documents, logs).
  • Reasoning: Features built-in Chain-of-Thought (CoT) for complex problem-solving.
  • API Access: Provides an OpenAI-compatible API endpoint (/v1) for seamless integration.
  • Quantization Options: Offers Q8 quantization for improved performance on systems with ≥ 12 GB VRAM.
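For the multimodal side, OpenAI-compatible servers commonly accept images as base64 data URLs inside a structured `content` array. The sketch below builds such a message; whether ToolHub uses exactly this vision schema is an assumption, so verify against its API docs before relying on it.

```python
import base64
import json

# Placeholder bytes stand in for a real PNG file read from disk.
fake_png = base64.b64encode(b"\x89PNG...not a real image...").decode()

# One user turn mixing a text part and an image part, in the
# OpenAI vision-style content format (assumed, not confirmed by the README).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this screenshot?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{fake_png}"},
        },
    ],
}

print(json.dumps(message)[:80])
```

A real client would replace `fake_png` with the base64 encoding of an actual image file and send the message in the `messages` list of a `/v1/chat/completions` request.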

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README.

Licensing & Compatibility

The license for the ToolHub wrapper itself is not explicitly stated. It relies on the Qwen3.5 model and llama.cpp, which have their own respective licenses. Compatibility for commercial use or linking with closed-source projects would require a review of the underlying component licenses.

Limitations & Caveats

The primary deployment path targets Windows 10/11, requiring specific NVIDIA hardware. While Docker and WSL options exist, users must have compatible environments set up. The system's capabilities are dependent on the performance and VRAM of the local GPU.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 26 stars in the last 30 days
