EssayTopicPredictV2  by Turing-Project

AI framework for predicting essay topics

Created 3 years ago
557 stars

Top 57.5% on SourcePulse

Project Summary

This project provides an AI framework for predicting Chinese Gaokao (National College Entrance Examination) essay topics, targeting researchers and educators interested in NLP applications for educational assessment. It leverages unsupervised learning and pattern recognition to generate relevant and human-understandable essay prompts.

How It Works

The framework combines Harbin Institute of Technology's RoBERTa-wwm-ext model for language understanding, BERTopic for topic modeling, and GANs for generation. The README describes a 1.7-billion-parameter deep neural network trained on over 200 million data points. The language model is pretrained with whole-word masking, which masks all characters of a Chinese word together instead of masking characters independently. DBSCAN clustering is used for topic identification; unlike K-Means, it can find arbitrarily shaped clusters, labels outliers as noise, and does not require a predefined cluster count.
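To make the whole-word masking idea concrete, here is a minimal sketch in plain Python. It is not the project's or the model's actual pretraining code: the function name `whole_word_mask`, the `mask_prob` parameter, and the hand-segmented example sentence are all assumptions for illustration, and a real pipeline would obtain word boundaries from a word segmenter.

```python
import random

MASK = "[MASK]"

def whole_word_mask(words, mask_prob=0.15, rng=None):
    """Mask every character of a selected word together.

    `words` is a list of pre-segmented Chinese words. BERT-style Chinese
    tokenizers split text into single characters, so the masking decision
    is made once per word and then applied to each of its characters.
    """
    rng = rng or random.Random()
    tokens, labels = [], []
    for word in words:
        masked = rng.random() < mask_prob  # one decision per whole word
        for ch in word:
            tokens.append(MASK if masked else ch)
            labels.append(ch if masked else None)  # None = nothing to predict
    return tokens, labels

# Hand-segmented example sentence (an assumption for illustration).
words = ["高考", "作文", "题目", "预测"]
tokens, labels = whole_word_mask(words, mask_prob=0.5, rng=random.Random(0))
print(tokens)
```

Because the decision is made per word, a two-character word such as "作文" is either fully visible or fully replaced by two `[MASK]` tokens, never half-masked.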

Quick Start & Requirements

  • Run training: cd train && python train.py
  • Prerequisites: Ubuntu 18.04.2 or Windows 10 (x86), Python 3.6+, TensorFlow-GPU 1.15.2, CUDA >= 10.0, cuDNN >= 7.6.0, OpenAI API key.
  • Setup: Download the required Chinese BERT models from the Baidu Netdisk links provided in the README.
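Before launching training, it can help to verify the simplest prerequisites from the list above. The following is a hypothetical pre-flight sketch, not a script shipped with the project: the function name `check_prerequisites` is invented here, and `OPENAI_API_KEY` is an assumed environment variable name that the summary does not confirm. It does not check TensorFlow, CUDA, or cuDNN versions.

```python
import os
import sys

MIN_PYTHON = (3, 6)  # minimum version from the prerequisites list

def check_prerequisites(version_info=None, env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    version_info = sys.version_info if version_info is None else version_info
    env = os.environ if env is None else env
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.6+ is required")
    # OPENAI_API_KEY is an assumed variable name, not confirmed by the README.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems

if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

Passing `version_info` and `env` explicitly keeps the check easy to test without modifying the real environment.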

Highlighted Details

  • Uses RoBERTa-wwm-ext (1.7B parameters, per the README) and BERTopic with DBSCAN.
  • Supports end-to-end generation, from exam paper recognition to answer sheet output.
  • Integrates GPT-4 for prompt guidance and essay optimization.
  • Trained on data from People's Daily, CCTV News, Weibo, and People.cn.
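The DBSCAN clustering named in the highlights can be illustrated with a small, dependency-free sketch. This is a textbook version run on toy 2-D points, not the project's pipeline, which would cluster high-dimensional text embeddings; the `eps` and `min_samples` values and the sample points are assumptions chosen for illustration.

```python
import math

NOISE = -1

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if distance(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never passed in: two clusters emerge from the density of the points, and the isolated point at (50, 50) is labeled as noise rather than forced into a cluster, which is the behavior that distinguishes DBSCAN from K-Means.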

Maintenance & Community

The project was initiated in April 2022, with the first public release in May 2022. The README does not specify active maintenance or community channels.

Licensing & Compatibility

The README states that the project is "for technical research and popular science only, not for any conclusive basis, and does not provide any commercial application authorization." No specific open-source license is mentioned.

Limitations & Caveats

The project is intended solely for research and educational purposes and grants no commercial-use authorization. Its dependence on TensorFlow 1.15.2 and CUDA 10.0 points to an outdated technical stack.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
