Proxy server for load balancing and securing Ollama instances
This project provides a lightweight, secure proxy server for managing multiple Ollama instances, aimed at developers who need to scale their LLM deployments with load balancing and added security. It offers improved responsiveness and centralized management for distributed Ollama backends.
How It Works
The proxy server employs a load-balancing strategy, routing incoming requests to the backend Ollama instance with the fewest active connections. It implements bearer token authentication for security and utilizes asynchronous logging to a CSV file without impacting performance. Connection pooling and proper forwarding of streaming responses are key architectural choices for efficient and responsive LLM interactions.
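The sketch below illustrates the routing idea in Python: pick the backend with the fewest active connections and accept only requests carrying a known bearer token. Every name in it (AUTHORIZED_KEYS, pick_backend, the example backend URLs) is hypothetical and is not taken from the project's source.

# Minimal illustration of least-connections routing with a bearer-token check.
# All names and values here are assumptions for illustration only.
import threading

AUTHORIZED_KEYS = {"alice:a1b2c3d4"}  # in practice loaded from authorized_users.txt
BACKENDS = ["http://localhost:11434", "http://192.168.1.15:11434"]

active = {url: 0 for url in BACKENDS}  # active request count per backend
lock = threading.Lock()

def is_authorized(auth_header: str) -> bool:
    # Accept requests whose "Authorization: Bearer <token>" matches a known key.
    if not auth_header.startswith("Bearer "):
        return False
    return auth_header[len("Bearer "):] in AUTHORIZED_KEYS

def pick_backend() -> str:
    # Route to the backend with the fewest active connections.
    with lock:
        url = min(active, key=active.get)
        active[url] += 1
        return url

def release_backend(url: str) -> None:
    with lock:
        active[url] -= 1

# Inside a request handler the flow would be roughly:
#   reject with 403 unless is_authorized(request's Authorization header)
#   backend = pick_backend()
#   try: forward the (possibly streaming) request and relay the response
#   finally: release_backend(backend)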
Quick Start & Requirements
Install from source:

git clone https://github.com/ParisNeo/ollama_proxy_server.git
cd ollama_proxy_server
pip install -r requirements.txt
pip install .

Or build and run with Docker:

docker build -t ollama_proxy_server .
docker run -p 8080:8080 -v $(pwd)/config.ini:/app/config.ini -v $(pwd)/authorized_users.txt:/app/authorized_users.txt ollama_proxy_server

Edit config.ini to list the backend Ollama URLs and authorized_users.txt to list user credentials, then start the proxy:

python main.py --config config.ini --users_list authorized_users.txt

Add a user with:

python add_user.py <username> <key>
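For illustration only, config.ini is typically an INI file with one section per backend and authorized_users.txt a plain list of user:key pairs; the section and field names below are assumptions, so check the sample files in the repository:

[DefaultServer]
url = http://localhost:11434
queue_size = 5

[SecondaryServer]
url = http://192.168.1.15:11434
queue_size = 5

authorized_users.txt (one entry per line):

alice:a1b2c3d4

Clients then point their Ollama API calls at the proxy (port 8080 in the Docker example above) and pass their key as the bearer token in the Authorization header.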
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is not yet published on PyPI, and the CONTRIBUTING.md file is noted as "to be added."