mad-professor-public  by LYiHub

AI companion for reading papers with a "grumpy professor" persona

Created 10 months ago
1,582 stars

Top 26.1% on SourcePulse

GitHubView on GitHub
Project Summary

This project provides an AI-powered desktop application designed to enhance academic paper reading efficiency for researchers. It offers features like PDF processing, AI translation, RAG-based Q&A, and voice interaction, all delivered through a unique "mad professor" persona for a more engaging experience.

How It Works

The application employs a multi-stage pipeline: PDF ingestion and parsing (via magic-pdf), content translation, structuring, and embedding for RAG. A PyQt6 frontend provides a split-pane interface for viewing documents and interacting with the AI. The core AI functionality leverages LLMs for Q&A and includes speech recognition (Whisper) and TTS for voice interaction, with RAG enhancing retrieval accuracy.

Quick Start & Requirements

  • Install: Use conda to create an environment, then pip install dependencies.
  • Prerequisites: Python 3.10+, CUDA support with >6GB VRAM, faiss-gpu (via conda), numpy<=2.1.1.
  • API Keys: Requires API keys for DeepSeek and MiniMax for LLM and TTS services, configurable in config.py.
  • Models: A download_models.py script handles model downloads.
  • Docs: MinerU, RealtimeSTT, DeepSeek, MiniMax.

Highlighted Details

  • Features a "mad professor" persona with customizable prompts and voice.
  • Supports bilingual (Chinese/English) paper viewing with side-by-side display.
  • Integrates RAG for precise retrieval and context-aware AI questioning.
  • Offers voice input and TTS output for interactive Q&A.

Maintenance & Community

The project explicitly thanks the MinerU and RealtimeSTT projects. Customization of persona and voice requires manual code modification.

Licensing & Compatibility

Licensed under the Apache License. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The application is primarily designed for structured academic PDFs; unstructured documents may cause errors. Concurrent audio input device switching and AI voice feedback loops (if not using headphones) are noted issues.

Health Check
Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
9 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.