Tool for Twitter data scraping, analysis, and image captioning
This project provides tools for scraping the tweets you have liked on Twitter, saving them to JSON and Excel, and performing an initial data analysis. It also includes an experimental embedding-based image search feature. It targets users who want to analyze their Twitter data and use LLMs to extract insights, and its image search runs without a GPU.
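As a rough illustration of the save-and-analyze step, the sketch below writes a list of liked tweets to JSON and computes a few first-pass statistics. The record fields (`author`, `text`, `image_urls`) and the helper names are hypothetical, not the project's actual schema. Excel export is left out so the example needs only the standard library.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


# Hypothetical record layout for one liked tweet; the real scraper's fields may differ.
def save_tweets_json(tweets: list[dict], path: Path) -> None:
    """Write the scraped tweets to a UTF-8 JSON file, keeping non-ASCII text readable."""
    path.write_text(json.dumps(tweets, ensure_ascii=False, indent=2), encoding="utf-8")


def load_tweets_json(path: Path) -> list[dict]:
    """Read the tweets back from the JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


def summarize(tweets: list[dict]) -> dict:
    """First-pass analysis: total count, most-liked authors, share of tweets with images."""
    authors = Counter(t["author"] for t in tweets)
    with_images = sum(1 for t in tweets if t.get("image_urls"))
    return {
        "total": len(tweets),
        "top_authors": authors.most_common(3),
        "image_share": with_images / len(tweets) if tweets else 0.0,
    }


if __name__ == "__main__":
    sample = [
        {"author": "alice", "text": "Nice chart", "image_urls": ["https://example.com/a.png"]},
        {"author": "bob", "text": "Thread on LLMs", "image_urls": []},
        {"author": "alice", "text": "Another one", "image_urls": []},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "liked_tweets.json"
        save_tweets_json(sample, out)
        print(summarize(load_tweets_json(out)))
```

If pandas is installed, `pandas.DataFrame(tweets).to_excel(path)` is one common way to produce the Excel file; writing `.xlsx` needs an engine such as openpyxl.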
How It Works
The project uses Selenium to scrape liked tweets and their metadata from Twitter. The data is then processed and saved in JSON and Excel formats. An experimental feature uses embedding models for natural-language image search; it is designed to run without GPU acceleration, so it works on ordinary hardware. Image captioning is done through the OpenAI API.
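One plausible shape for the caption-based search, assuming images are first captioned (the project credits the OpenAI API for this) and captions and queries are then embedded and compared by cosine similarity. To keep the sketch runnable without downloading a model, a word-count vector stands in for the embedding model; the function names are hypothetical and this is not the project's implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector.

    A real setup would call a CPU-friendly sentence-embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(captions: dict[str, str]) -> dict[str, Counter]:
    """Embed every image caption once, so each query needs only one new embedding."""
    return {image: embed(caption) for image, caption in captions.items()}


def search(query: str, index: dict[str, Counter], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k images whose caption vectors are closest to the query."""
    q = embed(query)
    scored = [(image, cosine(q, vec)) for image, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    captions = {
        "img1.jpg": "A dog playing in the snow",
        "img2.jpg": "A plate of pasta on a table",
        "img3.jpg": "City skyline at night",
    }
    index = build_index(captions)
    print(search("dog in snow", index))
```

Running it ranks `img1.jpg` first, since it is the only caption sharing words with the query. A bag-of-words vector only matches exact words; the reason to use a real embedding model is that a query such as "puppy in winter" would also land near that caption, which this stand-in cannot do.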
Quick Start & Requirements
pip install -r requirements.txt
Set up config.py with your own settings before running the scripts.
python twitter_data_ingestion.py
streamlit run image_search_webapp.py
Maintenance & Community
Contributions are welcome via issues and pull requests. The project acknowledges inspiration from Twitter-Scrapper and uses the OpenAI API for image captioning.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project relies on Selenium for scraping, which carries a risk of account suspension, so using a spare account is recommended. The image search works without a GPU but may give better results in English than in other languages. The project is presented as an early step toward a larger personal project, so it may continue to change.
Last activity: 1 year ago. Status: Inactive.