EduChat  by ECNU-ICALK

Open-source educational chat model

Created 2 years ago
848 stars

Top 42.1% on SourcePulse

GitHubView on GitHub
Project Summary

EduChat is an open-source educational chatbot developed by East China Normal University, targeting students, teachers, and parents. It aims to provide intelligent educational support, including automated question generation, homework grading, emotional support, and course tutoring, by leveraging large language models fine-tuned on diverse educational data.

How It Works

EduChat builds upon foundational models like LLaMA and Baichuan, fine-tuning them with a custom dataset named educhat-sft-002-data-osm, which comprises over 4 million deduplicated Chinese and English instruction and dialogue examples. The project also offers CleanTool, a data cleaning utility, to enhance data quality. The models are designed for educational scenarios, with specific system prompts enabling functionalities like open-ended Q&A, emotional support, essay grading, and heuristic teaching.

Quick Start & Requirements

  • Install dependencies using pip install transformers after installing PyTorch.
  • Requires Python 3.8+ and a GPU (e.g., A100/A800) for optimal performance, with FP16 precision using ~15GB VRAM.
  • Local deployment examples are provided via Python scripts for Gradio demos and API services.
  • Official demo: https://www.educhat.top/ (internal testing), https://educhat.xiaoi.com/ (public testing).

Highlighted Details

  • Offers multiple model sizes (1.8B, 7B, 13B, 14B, 32B) based on different architectures (Baichuan, Qwen1.5).
  • Includes a data cleaning tool (CleanTool) for improving dataset quality.
  • Provides example Python code for direct model inference and Gradio/API demos for interactive use.
  • Research paper available: https://arxiv.org/abs/2308.02773

Maintenance & Community

  • Developed by the EduNLP team at East China Normal University.
  • Acknowledges contributions from LLaMA, Baichuan, and Open Assistant.
  • Future plans include enhancing logical reasoning, personalization, and tool-calling capabilities.

Licensing & Compatibility

  • Code licensed under Apache 2.0.
  • Data licensed under CC BY-NC 4.0.
  • Restrictions: Prohibits commercial use and any use that may cause societal harm.

Limitations & Caveats

The model may generate factually incorrect or biased responses, and its capabilities in reasoning, coding, and multi-turn dialogue require further improvement. The project is intended for research purposes only.

Health Check
Last Commit

2 months ago

Responsiveness

1+ week

Pull Requests (30d)
0
Issues (30d)
1
Star History
14 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.