Twitter-Insight-LLM by AlexZhangji

Tool for Twitter data scraping, analysis, and image captioning

created 1 year ago
677 stars

Top 51.2% on sourcepulse

Project Summary

This project provides tools for scraping a Twitter account's liked tweets, saving them to JSON and Excel, and performing initial data analysis. It also includes an experimental embedding-based image search feature. It targets users who want to analyze Twitter data and use LLMs for insights, and its image search runs without a GPU.

How It Works

The project utilizes Selenium for Twitter data scraping, extracting liked tweets and associated metadata. Data is then processed and saved into JSON and Excel formats. An experimental feature employs embedding models for natural language image search, designed to run without GPU acceleration, making it accessible for broader use. Image captioning is powered by the OpenAI API.
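To make the idea of embedding-based natural language search concrete, here is a minimal sketch of ranking by cosine similarity. It assumes the search runs over text descriptions of images; the summary above does not say which embedding model or pipeline the project actually uses. The `toy_embed` function, the `search` helper, and the sample captions are all hypothetical stand-ins, with a bag-of-words vector taking the place of a real embedding model so the example runs on the standard library alone.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, captions, top_k=3):
    """Rank captions by similarity to the query; returns (score, image_id) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), image_id) for image_id, c in captions.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]

# Hypothetical captions, e.g. as an image-captioning step might produce them.
captions = {
    "img_001.jpg": "a cat sleeping on a laptop keyboard",
    "img_002.jpg": "a chart of stock prices over time",
    "img_003.jpg": "a cat chasing a laser pointer",
}

results = search("cat on a keyboard", captions, top_k=2)
```

A real embedding model would replace `toy_embed` and return dense vectors, which is what lets a query match abstract concepts rather than exact words. The ranking step would stay the same, and it is cheap enough to run on a CPU.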

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Obtain a Twitter authentication token (auth_key cookie).
  • Optionally, obtain an OpenAI API key for image captioning.
  • Configure keys in config.py.
  • Run data ingestion: python twitter_data_ingestion.py
  • Run image search webapp: streamlit run image_search_webapp.py
  • Prerequisites: Python, Selenium, OpenAI API (optional).
  • Setup time: Minimal, dependent on obtaining auth token.
  • Links: Demo Video, FAQ
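
The key configuration step above could look roughly like the sketch below. The variable names and the `missing_keys` helper are assumptions made for illustration, not the repository's actual `config.py`; check the project's own file for the real names. Reading the values from environment variables is also an assumption, chosen here to keep secrets out of version control.

```python
import os

# Hypothetical variable names; the repository's config.py may use different ones.
TWITTER_AUTH_TOKEN = os.environ.get("TWITTER_AUTH_TOKEN", "")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")  # optional, image captioning only

def missing_keys():
    """Return the names of required keys that are still empty."""
    required = {"TWITTER_AUTH_TOKEN": TWITTER_AUTH_TOKEN}
    return [name for name, value in required.items() if not value]
```

Calling `missing_keys()` before starting a scrape gives an early, readable failure instead of an authentication error partway through a Selenium session.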

Highlighted Details

  • Experimental embedding-based image search works without GPU support.
  • Supports natural language search for images, including abstract concepts.
  • Data can be exported to both JSON and Excel formats.
  • Includes initial data analysis notebooks with visualizations such as like trends and calendar heatmaps.
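
As a rough illustration of the JSON export and of the per-day counts that a like-trend chart or calendar heatmap is built on, consider the sketch below. The record layout (`id`, `date`, `text`) and the `likes_per_day` helper are hypothetical and are not taken from the project's notebooks. Excel export is left out because it would need a third-party library.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical records; the field names are assumptions for illustration.
liked_tweets = [
    {"id": "1", "date": "2024-03-01T09:15:00", "text": "first"},
    {"id": "2", "date": "2024-03-01T21:40:00", "text": "second"},
    {"id": "3", "date": "2024-03-03T12:00:00", "text": "third"},
]

def likes_per_day(tweets):
    """Count tweets per calendar day, sorted by date."""
    days = (datetime.fromisoformat(t["date"]).date().isoformat() for t in tweets)
    return dict(sorted(Counter(days).items()))

daily_counts = likes_per_day(liked_tweets)

# Serialize the records to a JSON string; writing it to a file is a one-line step.
as_json = json.dumps(liked_tweets, indent=2)
```

The `daily_counts` mapping has one entry per day with at least one tweet. A heatmap would fill the remaining days with zero before plotting.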

Maintenance & Community

Contributions are welcome via issues and pull requests. The project acknowledges inspiration from Twitter-Scrapper and uses the OpenAI API for image captioning.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies on Selenium for scraping, which carries a risk of account suspension; using a spare account is recommended. The image search functionality, while GPU-free, may yield better results in English than in other languages. The project is presented as the first steps of a larger personal project, so it is likely to keep changing.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
