Twitter-Insight-LLM by AlexZhangji

Tool for Twitter data scraping, analysis, and image captioning

Created 1 year ago

681 stars

Top 49.9% on SourcePulse

Project Summary

This project provides tools for scraping liked tweets from Twitter, saving them to JSON and Excel, and performing initial data analysis. It also includes an experimental embedding-based image search feature. It targets users interested in analyzing Twitter data and leveraging LLMs for insights, and its image search runs without a GPU.

How It Works

The project utilizes Selenium for Twitter data scraping, extracting liked tweets and associated metadata. Data is then processed and saved into JSON and Excel formats. An experimental feature employs embedding models for natural language image search, designed to run without GPU acceleration, making it accessible for broader use. Image captioning is powered by the OpenAI API.
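The idea behind embedding-based search over captioned images can be sketched as follows. This is a minimal illustration, not the project's implementation: the embedding model the project uses is not named here, so a toy bag-of-words vector stands in for it, and the function names (embed, cosine, search) and sample captions are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, captions, top_k=3):
    """Rank image IDs by similarity between the query and each caption."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), image_id) for image_id, c in captions.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]


# Hypothetical captions, standing in for those generated per image.
captions = {
    "img_001.jpg": "a cat sleeping on a sofa",
    "img_002.jpg": "city skyline at night",
    "img_003.jpg": "a dog running on the beach",
}
results = search("cat on a sofa", captions, top_k=2)
```

A real embedding model would replace embed so that queries match captions by meaning rather than by shared words. The ranking step stays the same and is cheap enough to run on a CPU.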

Quick Start & Requirements

Install dependencies: pip install -r requirements.txt
Obtain a Twitter authentication token (the auth_token cookie from a logged-in browser session).
Optionally, obtain an OpenAI API key for image captioning.
Configure keys in config.py.
Run data ingestion: python twitter_data_ingestion.py
Run image search webapp: streamlit run image_search_webapp.py
Prerequisites: Python, Selenium, OpenAI API (optional).
Setup time: Minimal, dependent on obtaining auth token.
Links: Demo Video, FAQ
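For orientation, here is one way an authentication cookie can be handed to a Selenium session. It is a sketch under stated assumptions, not the project's code: the helper names are hypothetical, the cookie is assumed to be named auth_token, and the domain and URL are assumed to be twitter.com. Only driver.get, driver.add_cookie, and driver.refresh are real Selenium WebDriver calls.

```python
def build_auth_cookie(token, domain=".twitter.com"):
    """Build the cookie dict passed to Selenium (name and domain are assumptions)."""
    return {"name": "auth_token", "value": token, "domain": domain}


def attach_auth_cookie(driver, token, base_url="https://twitter.com"):
    """Load the site first, then add the cookie and reload.

    Selenium only accepts a cookie for the domain of the page currently
    loaded, so the initial driver.get() is required.
    """
    driver.get(base_url)
    driver.add_cookie(build_auth_cookie(token))
    driver.refresh()


# Usage with a real browser (requires selenium and a browser driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   attach_auth_cookie(driver, "YOUR_AUTH_TOKEN")
```

Keeping the token in config.py, as the steps above describe, keeps it out of the scraping code. Treat the token like a password, since it grants access to the account.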

Highlighted Details

Experimental embedding-based image search works without GPU support.
Supports natural language search for images, including abstract concepts.
Data can be exported to both JSON and Excel formats.
Includes initial data analysis notebooks with visualizations such as like-count trends and calendar heatmaps.
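The aggregation behind a like trend or calendar heatmap can be sketched with the standard library alone. This is an illustrative sketch, not code from the project's notebooks: the "date" field name, its ISO 8601 format, and the sample records are assumptions.

```python
from collections import Counter
from datetime import date


def likes_per_day(tweets):
    """Count liked tweets per calendar day (assumes an ISO 8601 'date' field)."""
    return Counter(date.fromisoformat(t["date"][:10]) for t in tweets)


def calendar_grid(daily_counts):
    """Map each day to a (year, ISO week, ISO weekday) cell for a heatmap."""
    grid = {}
    for day, count in daily_counts.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        grid[(iso_year, iso_week, iso_weekday)] = count
    return grid


# Hypothetical records, standing in for the scraped JSON output.
tweets = [
    {"date": "2024-01-01T10:00:00"},
    {"date": "2024-01-01T18:30:00"},
    {"date": "2024-01-03T09:15:00"},
]
daily = likes_per_day(tweets)
grid = calendar_grid(daily)
```

The daily counts give the trend line directly. The grid keys give the week column and weekday row of each heatmap cell, which a plotting library can then color by count.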

Maintenance & Community

Contributions are welcome via issues and pull requests. The project acknowledges inspiration from Twitter-Scrapper and uses the OpenAI API for image captioning.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies on Selenium for scraping, which carries a risk of account suspension; using a spare account is recommended. The image search functionality, while GPU-free, may yield better results in English than in other languages. The project is presented as an initial step toward a larger personal project, so its features and interfaces may change.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days