Text-generation API for scalable GPT-2 inference via Cloud Run
This project provides a text-generation API using GPT-2, specifically designed for deployment on Google Cloud Run. It targets developers and researchers who want to easily integrate GPT-2's text generation capabilities into applications or provide a public-facing demo, offering a cost-effective and scalable solution.
How It Works
The API is built using starlette for asynchronous request handling. It packages the GPT-2 model (specifically the 117M-parameter version, due to Cloud Run's memory constraints) directly within the Docker container. This keeps the service stateless and resource-efficient on Cloud Run, enabling automatic scaling and taking advantage of the platform's generous free tier for low-usage scenarios.
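A minimal sketch of how such a Starlette service might be wired up is shown below. The route path, request parameter names, and the generate_text helper are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch of a Starlette text-generation endpoint.
# The real project's routes, parameter names, and model-loading code may differ.
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route


def generate_text(prompt: str, length: int, temperature: float) -> str:
    # Placeholder for the bundled GPT-2 117M sampling call; the real service
    # would run the model here.
    return prompt + " ..."


async def generate(request: Request) -> JSONResponse:
    params = await request.json()
    text = generate_text(
        prompt=params.get("text", ""),
        length=int(params.get("length", 100)),
        temperature=float(params.get("temperature", 0.7)),
    )
    return JSONResponse({"text": text})


app = Starlette(routes=[Route("/", generate, methods=["POST"])])
```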
Quick Start & Requirements
docker build . -t gpt2
docker run -p 8080:8080 --memory="2g" --cpus="1" gpt2
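Once the container is running locally, the service can be exercised with a simple client call. The endpoint path and request fields below are assumptions for illustration; check the project's source for the exact API shape.

```python
# Hypothetical client call against the locally running container.
import requests

resp = requests.post(
    "http://localhost:8080/",
    json={"text": "Once upon a time", "length": 100, "temperature": 0.7},
)
print(resp.json())
```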
Highlighted Details
Supports configurable generation parameters such as length and temperature.
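As a rough illustration of what the temperature parameter controls (a generic sketch of temperature-scaled sampling, not the project's code): logits are divided by the temperature before softmax, so lower values make sampling more deterministic and higher values more diverse.

```python
# Generic temperature-scaled sampling sketch (not the project's actual code).
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```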
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats