WriteGPT by Turing-Project

AI framework for essay generation, focused on Chinese high school compositions

created 4 years ago
5,325 stars

Top 9.6% on sourcepulse

Project Summary

WriteGPT is a framework for AI-powered essay generation, aimed at students and researchers interested in advanced NLP applications. Its goal is to produce essays that human readers can follow. Initial fine-tuning focused on high school exam essays, where it reportedly reaches a passing score in many cases.

How It Works

The framework uses a modular pipeline architecture that combines EAST (text detection) and CRNN (text recognition) for OCR, BERT for text summarization, and GPT-2 for text generation. Because the stages are separate, each component can be trained for its own task, from image-based text detection and recognition through language understanding and generation. The pipeline ends with a DNN-based scoring mechanism and custom formatting scripts.
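
The staged design described above can be pictured as a chain of functions, each consuming the previous stage's output. The sketch below is only an illustration of that shape, not the project's code: every function name (`detect_text_regions`, `recognize_text`, `summarize`, `generate_essay`, `score_essay`, `run_pipeline`) is hypothetical, and each body is a toy stub standing in for the corresponding model (EAST, CRNN, BERT, GPT-2, and the DNN scorer).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EssayResult:
    prompt: str
    essay: str
    score: float


# All functions below are hypothetical stubs. In a real system each stage
# would wrap a trained model (EAST, CRNN, BERT, GPT-2, DNN scorer).

def detect_text_regions(image: List[str]) -> List[str]:
    """Stand-in for EAST: keep only the 'regions' that contain text."""
    return [row for row in image if row.strip()]


def recognize_text(regions: List[str]) -> str:
    """Stand-in for CRNN: turn detected regions into one string."""
    return " ".join(region.strip() for region in regions)


def summarize(text: str, max_words: int = 8) -> str:
    """Stand-in for BERT summarization: keep the first few words."""
    return " ".join(text.split()[:max_words])


def generate_essay(prompt: str) -> str:
    """Stand-in for GPT-2 generation: place the prompt in a template."""
    return f"Essay on: {prompt}. (generated body would go here)"


def score_essay(essay: str) -> float:
    """Stand-in for the DNN scorer: a toy length-based score in [0, 1]."""
    return min(len(essay) / 100.0, 1.0)


def run_pipeline(image: List[str]) -> EssayResult:
    """Chain the stages: detect -> recognize -> summarize -> generate -> score."""
    regions = detect_text_regions(image)
    text = recognize_text(regions)
    prompt = summarize(text)
    essay = generate_essay(prompt)
    return EssayResult(prompt=prompt, essay=essay, score=score_essay(essay))


if __name__ == "__main__":
    fake_scan = ["  Read the material and write an essay  ", "", " on perseverance "]
    print(run_pipeline(fake_scan))
```

Keeping each stage behind a plain function boundary is what makes it possible to retrain or swap one component without touching the others.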

Quick Start & Requirements

  • Install: Not explicitly detailed, but requires a local environment setup.
  • Prerequisites: Ubuntu 18.04.2, Python 3.x, Pandas, Regex, h5py, NumPy, TensorBoard, TensorFlow-GPU 1.15.2, Requests, OpenCV 3.4.2, CUDA >= 10.0, cuDNN >= 7.6.0.
  • Resources: Training involves a 1.5 billion parameter GPT-2 model, requiring significant GPU resources (e.g., Quadro RTX 8000). A Colab demo is available for text generation.
  • Links: Online Demo
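
To give a rough sense of why a 1.5 billion parameter model calls for a large GPU, the following back-of-the-envelope estimate computes the memory needed for the weights alone and for a simple training setup. It assumes 32-bit floats and an Adam-style optimizer that stores two extra values per parameter. Both are illustrative assumptions rather than settings confirmed by the project, and activation memory is ignored entirely.

```python
GIB = 2**30  # bytes in one gibibyte


def weights_gib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory for one copy of the weights, in GiB (float32 by default)."""
    return num_params * bytes_per_param / GIB


def training_state_gib(
    num_params: int,
    bytes_per_param: int = 4,
    optimizer_slots: int = 2,  # assumption: Adam-style first and second moments
) -> float:
    """Weights + gradients + optimizer slots, in GiB (activations excluded)."""
    copies = 1 + 1 + optimizer_slots
    return copies * weights_gib(num_params, bytes_per_param)


if __name__ == "__main__":
    params = 1_500_000_000  # GPT-2 size stated in the summary
    print(f"weights only:   {weights_gib(params):.1f} GiB")
    print(f"training state: {training_state_gib(params):.1f} GiB")
```

Under these assumptions the weights take about 5.6 GiB and the training state about 22.4 GiB before any activations are counted, which is consistent with the summary's note about significant GPU resources.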

Highlighted Details

  • 1.7 billion parameter neural network overall (the GPT-2 generator alone is listed at 1.5 billion), pre-trained on over 200 million data points.
  • End-to-end generation pipeline from paper recognition to answer sheet output.
  • Fine-tuning corpora include works by Mao Zedong, Chen Duxiu, and Lu Xun.
  • Output can be directed to a CNC robot for handwriting.

Maintenance & Community

  • Project initiated in June 2020.
  • Key contributors include Y1ran.
  • Community links (Discord/Slack) are not provided.

Licensing & Compatibility

  • The project states it is for "technical research and popular science only" and does not provide commercial application authorization. Specific license details are not provided, but some components are noted as open-source.

Limitations & Caveats

The project acknowledges that generated essays may not perfectly match exam formatting requirements and that a significant portion of generated essays are not of passing quality. Some core pipeline files are intentionally hidden due to concerns about misuse. The handwriting output device has known issues with occasional missed characters or out-of-bounds writing.
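
The two handwriting failure modes mentioned above (missed characters and out-of-bounds writing) are the kind of problem a pre-flight check on the planned pen positions could catch. The sketch below is a hypothetical illustration only: the grid layout, the millimetre units, and the function names `plan_cells` and `validate_plan` are all assumptions, and nothing here reflects how the project actually drives its device.

```python
from typing import List, Tuple

Cell = Tuple[str, float, float]  # (character, x in mm, y in mm) of the cell's top-left corner


def plan_cells(
    text: str,
    cols: int,
    rows: int,
    cell_mm: float,
    origin_mm: Tuple[float, float] = (0.0, 0.0),
) -> List[Cell]:
    """Assign each character one grid cell, left to right, top to bottom."""
    if len(text) > cols * rows:
        raise ValueError("text does not fit on the grid")
    plan = []
    for index, char in enumerate(text):
        row, col = divmod(index, cols)
        x = origin_mm[0] + col * cell_mm
        y = origin_mm[1] + row * cell_mm
        plan.append((char, x, y))
    return plan


def validate_plan(
    plan: List[Cell],
    text: str,
    page_w_mm: float,
    page_h_mm: float,
    cell_mm: float,
) -> List[str]:
    """Return a list of problems; an empty list means the plan looks safe."""
    problems = []
    if len(plan) != len(text):
        problems.append(f"missed characters: planned {len(plan)} of {len(text)}")
    for char, x, y in plan:
        # A cell is out of bounds if any part of it leaves the page rectangle.
        if x < 0 or y < 0 or x + cell_mm > page_w_mm or y + cell_mm > page_h_mm:
            problems.append(f"out of bounds: {char!r} at ({x}, {y})")
    return problems


if __name__ == "__main__":
    essay = "abcde"
    plan = plan_cells(essay, cols=2, rows=3, cell_mm=10.0)
    print(validate_plan(plan, essay, page_w_mm=20.0, page_h_mm=25.0, cell_mm=10.0))
```

Running the check before any motion command is sent means a bad plan is rejected as a whole, rather than being discovered partway through a page.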

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

pytorch-nlp-notebooks by scoutbee

0%
419
PyTorch tutorials for NLP tasks
created 6 years ago
updated 5 years ago