gpt-2-cloud-run by minimaxir

Text-generation API for scalable GPT-2 inference via Cloud Run

created 6 years ago
314 stars

Project Summary

This project provides a text-generation API using GPT-2, specifically designed for deployment on Google Cloud Run. It targets developers and researchers who want to easily integrate GPT-2's text generation capabilities into applications or provide a public-facing demo, offering a cost-effective and scalable solution.

How It Works

The API is built with Starlette, an asynchronous Python web framework, for request handling. The GPT-2 model is packaged directly inside the Docker container; only the 117M-parameter version is used because of Cloud Run's memory constraints. Because each container is self-contained and stateless, Cloud Run can scale instances up and down automatically with demand, and low-traffic deployments can stay within the platform's generous free tier.
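The stateless pattern described above can be sketched in plain Python, without Starlette, so that it runs on its own. This is not the project's code. `PlaceholderGenerator`, `handle_request`, the default values, and the `"text"` response key are all hypothetical. The point is that the model is loaded once per container and every request is handled purely from its own parameters.

```python
from typing import Any, Dict, Mapping


class PlaceholderGenerator:
    """Hypothetical stand-in for the GPT-2 117M model bundled in the image.

    A real service would load the checkpoint from files copied into the
    container at build time; this stub only echoes its inputs.
    """

    def generate(self, length: int, temperature: float) -> str:
        return f"[{length} tokens sampled at temperature {temperature}]"


# Created once when the container starts and reused by every request,
# so no per-request state survives between calls.
GENERATOR = PlaceholderGenerator()


def handle_request(params: Mapping[str, Any]) -> Dict[str, str]:
    """Handle one request; `params` is the GET query string or the POST body."""
    # Hypothetical defaults; the real service may use different values.
    # Query-string values arrive as strings, hence the explicit conversions.
    length = int(params.get("length", 100))
    temperature = float(params.get("temperature", 0.7))
    return {"text": GENERATOR.generate(length=length, temperature=temperature)}
```

For example, `handle_request({"length": "50", "temperature": "0.9"})` parses the string values and returns a single generated result. Because the handler keeps no state of its own, any container instance can serve any request.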

Quick Start & Requirements

  • Build Docker image: docker build . -t gpt2
  • Run locally: docker run -p 8080:8080 --memory="2g" --cpus="1" gpt2
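  • Deploy to Cloud Run: tag the image and push it to Google Container Registry (GCR), then deploy it via the Cloud Console with Memory set to 2 GB and Maximum Requests Per Container set to 1, so each container handles one generation at a time.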
  • Prerequisites: Docker, Google Cloud SDK.
  • Demo: https://minimaxir.com/apps/gpt2-small/

Highlighted Details

  • Leverages Google Cloud Run for scalable, potentially free API hosting.
  • Bundles the GPT-2 117M model within the container for stateless operation.
  • The API accepts GET and POST requests with generation parameters such as length and temperature.
  • Generating 1023 tokens takes approximately 2 minutes.
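A client for the GET and POST interface can be sketched with Python's standard library. This is a sketch under stated assumptions: the base URL `http://localhost:8080/` matches the local `docker run -p 8080:8080` command but the root path is assumed, sending the POST parameters as a JSON body is assumed, and only `length` and `temperature` are shown. The response is returned as raw text because its format is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the container from the Quick Start listens locally on port 8080
# and serves generation requests at the root path "/".
BASE_URL = "http://localhost:8080/"


def build_get_request(length: int, temperature: float) -> urllib.request.Request:
    """Encode the generation parameters in the query string (GET)."""
    query = urllib.parse.urlencode({"length": length, "temperature": temperature})
    return urllib.request.Request(f"{BASE_URL}?{query}", method="GET")


def build_post_request(length: int, temperature: float) -> urllib.request.Request:
    """Send the same parameters as a JSON body (POST); JSON is an assumption."""
    body = json.dumps({"length": length, "temperature": temperature}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(request: urllib.request.Request, timeout: float = 300.0) -> str:
    """Send the request and return the raw response body as text.

    Long generations can take minutes, so the timeout is generous.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, `fetch(build_get_request(100, 0.7))` sends `GET /?length=100&temperature=0.7` and returns whatever the service responds with.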

Maintenance & Community

  • Maintained by Max Woolf (@minimaxir).
  • The creator's projects are supported via Patreon.

Licensing & Compatibility

  • License: MIT
  • Compatible with commercial use. No affiliation with OpenAI.

Limitations & Caveats

  • Limited to the GPT-2 117M "small" model because of Cloud Run's 2 GB memory limit at the time of release.
  • Generation is slow: about 2 minutes for 1023 tokens.
  • Containers may crash over time due to memory leaks, although Cloud Run can recover from these crashes.
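The quoted speed of roughly 2 minutes per 1023 tokens works out to about 120 / 1023 ≈ 0.117 seconds per token, or about 8.5 tokens per second. The helper below turns that into a rough estimate for choosing client or service timeouts. It assumes generation time scales linearly with length, which is a simplification, and `safety_factor` is a hypothetical margin rather than anything the project specifies.

```python
# Rough timing figures quoted for the 117M model on Cloud Run:
# about 2 minutes (120 s) to generate 1023 tokens.
SECONDS_FOR_REFERENCE_RUN = 120.0
TOKENS_IN_REFERENCE_RUN = 1023


def estimate_seconds(length: int) -> float:
    """Estimate generation time, assuming cost scales linearly with length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return SECONDS_FOR_REFERENCE_RUN * length / TOKENS_IN_REFERENCE_RUN


def fits_timeout(length: int, timeout_seconds: float, safety_factor: float = 1.5) -> bool:
    """Check whether a `length`-token request should finish within a timeout.

    `safety_factor` is a hypothetical margin for cold starts and variance.
    """
    return estimate_seconds(length) * safety_factor <= timeout_seconds
```

Under these assumptions a 512-token request takes about 60 seconds, and a full 1023-token request would not fit comfortably inside a 120-second timeout once the margin is applied.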

Health Check

  • Last commit: 4 years ago
