fake-news-detector by CaptainYifei

Automated fake news detection system using AI and evidence search

Created 11 months ago

455 stars

Top 66.3% on SourcePulse

Project Summary

This project provides an automated fake news detection system leveraging AI and evidence search. It targets users needing to verify information accuracy by extracting claims, searching for supporting evidence online, and performing semantic analysis using large language and embedding models. The system offers a real-time, step-by-step verification process via a Streamlit web interface, aiding quick decision-making on news credibility.

How It Works

The system employs a multi-stage pipeline. It first uses a Large Language Model (LLM), such as Qwen2.5, to extract verifiable claims from input news text. Subsequently, it queries the DuckDuckGo search engine to gather relevant evidence. The BGE-M3 embedding model then calculates semantic similarity between the extracted claims and the retrieved evidence, identifying the most pertinent information. Finally, based on this evidence, the system provides a judgment on the news's veracity, detailing the reasoning process.

Quick Start & Requirements

Primary Install/Run: Clone the repository, install dependencies via pip install -r requirements.txt, and run the application using streamlit run app.py.
Prerequisites: Python 3.12 is required. Users must have access to a compatible LLM (e.g., local Qwen2.5-14B or an OpenAI-compatible API) and the BGE-M3 embedding model (which can be locally deployed or accessed via API). Model paths may need configuration in fact_checker.py.
Links: GitHub: https://github.com/CaptainYifei/fake-news-detector

Highlighted Details

Automated extraction of verifiable claims from news articles.
Real-time evidence gathering via DuckDuckGo search.
Advanced semantic relevance ranking using the BGE-M3 embedding model.
Streaming interface provides a transparent, step-by-step view of the fact-checking process.

Maintenance & Community

The project welcomes contributions via standard GitHub pull requests. Links to the GitHub repository are provided for issue tracking and code.

Licensing & Compatibility

The project is released under the MIT License, which generally permits broad use, modification, and distribution, including for commercial purposes, with minimal restrictions.

Limitations & Caveats

The system relies on the availability and quality of external search results and the accuracy of the configured LLM and embedding models. Local deployment of the Qwen2.5 and BGE-M3 models may require significant computational resources and specific hardware configurations. The README does not detail performance benchmarks or specific hardware requirements beyond Python version.

fake-news-detector by CaptainYifei

Explore Similar Projects

fuzi.mingcha by irlab-sdu

LLM-Factuality-Survey by wangcunxiang

ArkhamMirror by mantisfury

awesome-legal-nlp by maastrichtlawtech

exa-hallucination-detector by exa-labs

legal-ml-datasets by neelguha

Automated-Fact-Checking-Resources by Cartus

OpenFactVerification by Libr-AI

FActScore by shmsw25

LegalPapers by thunlp

factool by GAIR-NLP

legalbench by HazyResearch