Text-generation API for scalable GPT-2 inference via Cloud Run
This project provides a text-generation API using GPT-2, specifically designed for deployment on Google Cloud Run. It targets developers and researchers who want to easily integrate GPT-2's text generation capabilities into applications or provide a public-facing demo, offering a cost-effective and scalable solution.
How It Works
The API is built using starlette for asynchronous request handling. It packages the GPT-2 model (specifically the 117M-parameter version, due to Cloud Run's memory constraints) directly within the Docker container. This keeps the service stateless and resource-efficient on Cloud Run, enabling automatic scaling and taking advantage of the platform's generous free tier for low-usage scenarios.
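A minimal sketch of how such a Starlette service might be wired up is shown below. The route path, request parameter names, and the generate_text helper are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch of a Starlette text-generation endpoint.
# The real project's routes, parameter names, and model-loading code may differ.
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route


def generate_text(prompt: str, length: int, temperature: float) -> str:
    # Placeholder for the bundled GPT-2 117M sampling call; the real service
    # would run the model here.
    return prompt + " ..."


async def generate(request: Request) -> JSONResponse:
    params = await request.json()
    text = generate_text(
        prompt=params.get("text", ""),
        length=int(params.get("length", 100)),
        temperature=float(params.get("temperature", 0.7)),
    )
    return JSONResponse({"text": text})


app = Starlette(routes=[Route("/", generate, methods=["POST"])])
```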
Quick Start & Requirements
docker build . -t gpt2
docker run -p 8080:8080 --memory="2g" --cpus="1" gpt2
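Once the container is running locally, the service can be exercised with a simple client call. The endpoint path and request fields below are assumptions for illustration; check the project's source for the exact API shape.

```python
# Hypothetical client call against the locally running container.
import requests

resp = requests.post(
    "http://localhost:8080/",
    json={"text": "Once upon a time", "length": 100, "temperature": 0.7},
)
print(resp.json())
```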
Highlighted Details
Supports configurable generation parameters such as length and temperature.
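As a rough illustration of what the temperature parameter controls (a generic sketch of temperature-scaled sampling, not the project's code): logits are divided by the temperature before softmax, so lower values make sampling more deterministic and higher values more diverse.

```python
# Generic temperature-scaled sampling sketch (not the project's actual code).
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```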
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats