Qwen3.5-9B-ToolHub by chixi4

Local multimodal LLM with tool integration

Created 1 month ago
272 stars

Top 94.7% on SourcePulse

Project Summary

This project provides a local, integrated deployment solution for the Qwen3.5-9B multimodal model, enabling tool-calling capabilities. It targets users who require on-premises AI inference for tasks like web searching, image analysis, and document processing, offering enhanced privacy and offline functionality. The primary benefit is a self-contained system that leverages local GPU resources for complex AI operations.

How It Works

The system integrates the Qwen3.5-9B multimodal large language model with a tool-calling framework, running inference locally on the user's NVIDIA GPU. It utilizes llama.cpp for high-performance GGUF model inference. The architecture allows the model to access external tools, such as web search engines and local file systems, enabling it to perform tasks like web scraping, data extraction, and document summarization. An OpenAI-compatible API endpoint facilitates integration with various clients.
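To make the tool-calling flow concrete, here is a minimal sketch of a request body in the OpenAI chat-completions format that the `/v1` endpoint advertises. The model name and the `web_search` tool schema are illustrative assumptions, not the project's actual definitions; check the server's `/v1/models` response for the real model identifier.

```python
import json

# Local ToolHub endpoint from the quick start (assumed path convention).
BASE_URL = "http://127.0.0.1:8080/v1"

payload = {
    "model": "qwen3.5-9b",  # assumed name; query /v1/models for the real one
    "messages": [
        {"role": "user", "content": "Summarize the latest news on GGUF quantization."}
    ],
    # Illustrative tool schema in the standard OpenAI "function" format.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",
                "description": "Search the web and return result snippets.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

POSTing this body to `BASE_URL + "/chat/completions"` with any OpenAI-compatible client should let the model decide when to emit a `tool_calls` response instead of plain text.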

Quick Start & Requirements

  • Primary Install (Windows):
    1. Run bootstrap.bat (downloads ~6 GB model).
    2. Run .\start_8080_toolhub_stack.cmd start.
    3. Access via browser at http://127.0.0.1:8080.
    • Stop: .\start_8080_toolhub_stack.cmd stop.
  • Docker Install: docker compose up --build
  • WSL Install: ./install.sh followed by ./start_8080_toolhub_stack.sh start.
  • Prerequisites:
    • OS: Windows 10/11 (primary); Docker or WSL for non-Windows setups.
    • GPU: NVIDIA graphics card with ≥ 8 GB VRAM (≥ 12 GB recommended for Q8 quantization).
    • Software: Python 3.10+.
  • Resource Footprint: Initial model download is ~6 GB (base) or larger for Q8 quantization. Startup requires 30-60 seconds for model loading.
  • Documentation: Detailed Install, FAQ, Docker Compose.
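Because model loading takes 30-60 seconds, scripts that start the stack and immediately call the API may fail. A small readiness poll avoids this; the helper below is a hypothetical sketch (the project ships no such utility), with the probe injected so any cheap request, e.g. a GET against `http://127.0.0.1:8080/v1/models`, can serve as the check.

```python
import time

def wait_until_ready(probe, timeout_s=90, interval_s=5):
    """Poll `probe()` until it returns True or `timeout_s` elapses.

    `probe` should make a cheap request against the local server and
    return True once it answers. The 90 s default leaves headroom over
    the documented 30-60 s model-load window.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False

# Demo with a stand-in probe that "comes up" on the third attempt:
attempts = {"n": 0}

def fake_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

ready = wait_until_ready(fake_probe, timeout_s=5, interval_s=0.01)
print(ready)  # True
```

In practice the probe would wrap `urllib.request.urlopen` in a try/except and return False on connection errors.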

Highlighted Details

  • Multimodal Capabilities: Supports image uploads for querying, including local zoom and image-to-image search.
  • Tool Integration: Can perform web searches, scrape and summarize web pages with sources, and read local files (documents, logs).
  • Reasoning: Features built-in Chain-of-Thought (CoT) for complex problem-solving.
  • API Access: Provides an OpenAI-compatible API endpoint (/v1) for seamless integration.
  • Quantization Options: Offers Q8 quantization for improved performance on systems with ≥ 12 GB VRAM.
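For the multimodal side, OpenAI-compatible servers commonly accept images as base64 data URLs inside a structured `content` array. The sketch below builds such a message; whether ToolHub uses exactly this vision schema is an assumption, so verify against its API docs before relying on it.

```python
import base64
import json

# Placeholder bytes stand in for a real PNG file read from disk.
fake_png = base64.b64encode(b"\x89PNG...not a real image...").decode()

# One user turn mixing a text part and an image part, in the
# OpenAI vision-style content format (assumed, not confirmed by the README).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this screenshot?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{fake_png}"},
        },
    ],
}

print(json.dumps(message)[:80])
```

A real client would replace `fake_png` with the base64 encoding of an actual image file and send the message in the `messages` list of a `/v1/chat/completions` request.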

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README.

Licensing & Compatibility

The license for the ToolHub wrapper itself is not explicitly stated. It relies on the Qwen3.5 model and llama.cpp, which have their own respective licenses. Compatibility for commercial use or linking with closed-source projects would require a review of the underlying component licenses.

Limitations & Caveats

The primary deployment path targets Windows 10/11, requiring specific NVIDIA hardware. While Docker and WSL options exist, users must have compatible environments set up. The system's capabilities are dependent on the performance and VRAM of the local GPU.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 26 stars in the last 30 days
