WriteGPT by Turing-Project

AI framework for essay generation, focused on Chinese high school compositions

Created 5 years ago
5,323 stars

Top 9.4% on SourcePulse

Project Summary

WriteGPT is a framework for AI-powered essay generation, aimed at students and researchers interested in applied NLP. Its goal is to produce essays that human readers find coherent. Initial fine-tuning targeted high school exam essays, and the project reports that generated essays can reach a passing score.

How It Works

The framework uses a modular pipeline architecture: EAST and CRNN handle OCR (text detection and text recognition, respectively), BERT handles text summarization, and GPT-2 handles text generation. Splitting the work into stages lets each component be trained separately. The pipeline ends with a DNN-based scoring mechanism and custom formatting scripts.
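
The staged design described above can be sketched as a chain of functions, each consuming the previous stage's output. This is a minimal illustration, not the project's code: every function name, the `Stage` class, and the input dictionary are hypothetical, and each stage body is a trivial stand-in for the trained model it is named after.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One named pipeline step; `run` maps the previous output to the next input."""
    name: str
    run: Callable[[object], object]


def run_pipeline(stages: List[Stage], data: object) -> object:
    """Feed `data` through each stage in order and return the final result."""
    for stage in stages:
        data = stage.run(data)
    return data


# Hypothetical stand-ins for the trained models named in the text.
def detect_text_regions(image: dict) -> list:
    # EAST stand-in: return the text regions found on the scanned page.
    return image["regions"]


def recognize_text(regions: list) -> str:
    # CRNN stand-in: "read" each region and join the results.
    return " ".join(regions)


def summarize(prompt_text: str) -> str:
    # BERT stand-in: keep only the first sentence as the summary.
    return prompt_text.split(".")[0].strip()


def generate_essay(topic: str) -> str:
    # GPT-2 stand-in: produce text conditioned on the summarized topic.
    return f"Essay on: {topic}"


def score_and_format(essay: str) -> dict:
    # Scorer and formatter stand-in: a real scorer would return a number.
    return {"essay": essay, "score": None}


PIPELINE = [
    Stage("detect", detect_text_regions),
    Stage("recognize", recognize_text),
    Stage("summarize", summarize),
    Stage("generate", generate_essay),
    Stage("score_and_format", score_and_format),
]

page = {"regions": ["Write about perseverance.", "At least 800 characters."]}
result = run_pipeline(PIPELINE, page)
print(result["essay"])  # prints "Essay on: Write about perseverance"
```

Because each stage only depends on the shape of its input, any one stand-in could be swapped for a real model without touching the others, which is the property the modular design is meant to provide.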

Quick Start & Requirements

  • Install: No installation steps are documented; a local environment must be set up manually.
  • Prerequisites: Ubuntu 18.04.2, Python 3.x, Pandas, Regex, h5py, Numpy, Tensorboard, Tensorflow-gpu 1.15.2, Requests, OpenCV 3.4.2, CUDA >= 10.0, CuDNN >= 7.6.0.
  • Resources: Training involves a 1.5 billion parameter GPT-2 model, requiring significant GPU resources (e.g., Quadro RTX 8000). A Colab demo is available for text generation.
  • Links: Online Demo
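
To give a rough sense of why a 1.5 billion parameter model calls for a large GPU, the arithmetic below estimates memory for the parameters alone. The training setup is an assumption, not something the project documents: it supposes 32-bit floats and an Adam-style optimizer that keeps two extra buffers per parameter, and it ignores activations entirely.

```python
def estimate_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to hold `num_params` values, in GiB."""
    return num_params * bytes_per_param / 2**30


GPT2_PARAMS = 1_500_000_000  # parameter count quoted in the Resources bullet

# fp32 weights only: 4 bytes per parameter.
weights_gib = estimate_memory_gib(GPT2_PARAMS, 4)

# Assumed training state: fp32 weights + gradients + two Adam moment buffers
# = 4 values of 4 bytes each = 16 bytes per parameter. Activations excluded.
training_gib = estimate_memory_gib(GPT2_PARAMS, 16)

print(f"weights only: {weights_gib:.1f} GiB")                  # weights only: 5.6 GiB
print(f"weights + grads + Adam state: {training_gib:.1f} GiB")  # weights + grads + Adam state: 22.4 GiB
```

Under these assumptions the static training state alone exceeds the memory of most consumer GPUs before any activations are counted, which fits the listing's pointer to workstation-class hardware.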

Highlighted Details

  • 1.7 billion parameter neural network, pre-trained on over 200 million data samples.
  • End-to-end generation pipeline from paper recognition to answer sheet output.
  • Fine-tuning corpora include works by Mao Zedong, Chen Duxiu, and Lu Xun.
  • Output can be directed to a CNC robot for handwriting.
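
As an illustration of the handwriting step, the sketch below turns pen strokes into G-code, the command language commonly used by CNC machines. The project's actual device interface is not described, so everything here is an assumption: the stroke format (lists of millimetre coordinates), the pen heights, the feed rate, and the function name are all hypothetical. The bounds check is included because out-of-bounds writing is listed among the known issues.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]

PEN_UP_Z = 5.0    # hypothetical pen-lift height in mm
PEN_DOWN_Z = 0.0  # hypothetical writing height in mm


def strokes_to_gcode(strokes: List[Stroke], width: float, height: float,
                     feed: int = 1000) -> List[str]:
    """Convert pen strokes (mm coordinates) to G-code, rejecting out-of-bounds points."""
    # G21 = millimetre units, G90 = absolute positioning; start with the pen lifted.
    lines = ["G21", "G90", f"G0 Z{PEN_UP_Z:.1f}"]
    for stroke in strokes:
        if not stroke:
            continue
        for x, y in stroke:
            if not (0 <= x <= width and 0 <= y <= height):
                raise ValueError(f"point ({x}, {y}) is outside the {width}x{height} mm page")
        x0, y0 = stroke[0]
        lines.append(f"G0 X{x0:.2f} Y{y0:.2f}")           # rapid move to stroke start
        lines.append(f"G1 Z{PEN_DOWN_Z:.1f} F{feed}")      # lower the pen
        for x, y in stroke[1:]:
            lines.append(f"G1 X{x:.2f} Y{y:.2f} F{feed}")  # draw one segment
        lines.append(f"G0 Z{PEN_UP_Z:.1f}")                # lift the pen
    return lines


# One horizontal 10 mm stroke on an A4-sized page (210 x 297 mm).
for line in strokes_to_gcode([[(0, 0), (10, 0)]], width=210, height=297):
    print(line)
```

Validating every point before emitting any motion for a stroke means a bad stroke is rejected as a whole rather than being partially drawn.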

Maintenance & Community

  • Project initiated in June 2020.
  • Key contributors include Y1ran.
  • Community links (Discord/Slack) are not provided.

Licensing & Compatibility

  • The project states it is for "technical research and popular science only" and does not grant authorization for commercial use. No specific license is named, although some components are noted as open-source.

Limitations & Caveats

The project acknowledges that generated essays may not perfectly match exam formatting requirements and that a significant portion of generated essays are not of passing quality. Some core pipeline files are intentionally hidden due to concerns about misuse. The handwriting output device has known issues with occasional missed characters or out-of-bounds writing.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
