AI-Investigator  by muratcankoylan

Python framework for automated website content analysis and structured report generation

Created 10 months ago
658 stars

Top 50.8% on SourcePulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

This Python framework automates the analysis of enterprise AI case studies from websites or provided URLs. It leverages Claude 3.5 Sonnet for intelligent identification and analysis of AI case studies, and Firecrawl for efficient web scraping and content extraction, producing detailed individual, cross-case, and executive reports.

How It Works

The system employs a two-pronged approach: CSV mode for specific URLs and Website mode for broader discovery. In Website mode, Firecrawl's /v1/map endpoint discovers links, followed by /v1/scrape to extract markdown content and metadata. Claude 3.5 Sonnet then identifies relevant case studies, filters for enterprise AI relevance, and performs in-depth analysis of strategy, implementation, and business impact.

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Set environment variables: ANTHROPIC_API_KEY and FIRECRAWL_API_KEY.
  • Run analyzer: python -m src.main
  • Requires Python 3.x, Anthropic API key (Claude 3.5 Sonnet), and Firecrawl API key.

Highlighted Details

  • Analyzes case studies from CSV files or by discovering them on company websites.
  • Generates three report types: Individual Case Study Reports, Cross-Case Analysis, and an Executive Dashboard.
  • Utilizes Firecrawl's map and scrape endpoints for link discovery and content extraction.
  • Employs Claude 3.5 Sonnet for intelligent case study identification and detailed content analysis.

Maintenance & Community

Contributions are welcome. The project is MIT licensed.

Licensing & Compatibility

MIT License. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The system relies heavily on the quality of the Claude 3.5 Sonnet API and Firecrawl's scraping capabilities. Performance and accuracy may vary based on website structure and content. API keys are required for core functionality.

Health Check
Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
3 stars in the last 30 days

Explore Similar Projects

Starred by Li Jiang Li Jiang(Coauthor of AutoGen; Engineer at Microsoft), Jeremy Howard Jeremy Howard(Cofounder of fast.ai), and
2 more.

trafilatura by adbar

0.5%
5k
Python package for web text extraction
Created 6 years ago
Updated 6 days ago
Feedback? Help us improve.