linkedinscraper  by cwwmbm

Job scraper for LinkedIn, storing results in SQLite

created 2 years ago
301 stars

Top 89.6% on sourcepulse

Project Summary

This Python application addresses the frustration of sifting through irrelevant job postings on LinkedIn by scraping, filtering, and storing job data locally. It targets job seekers looking for a more efficient and personalized way to manage their job search, offering features like keyword filtering, duplicate removal, and a web interface for tracking application status.

How It Works

The project uses the Python libraries Requests and BeautifulSoup to scrape LinkedIn job postings that match the search queries and filters defined in a config.json file. It then removes duplicates, filters out irrelevant jobs by keywords in titles and descriptions, and stores the cleaned data in a SQLite database. A Flask-based web interface lets users view, sort, and update the status of each posting (applied, rejected, interview, hidden).
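
The deduplicate, filter, and store steps described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the function names, the record fields (title, company, description), and the jobs table layout are all assumptions made for the example.

```python
import sqlite3


def dedupe_jobs(jobs):
    """Keep the first posting for each (title, company) pair, ignoring case."""
    seen = set()
    unique = []
    for job in jobs:
        key = (job["title"].strip().lower(), job["company"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


def filter_jobs(jobs, title_exclude, desc_exclude):
    """Drop jobs whose title or description contains an excluded keyword."""
    def has_any(text, words):
        text = text.lower()
        return any(word.lower() in text for word in words)

    return [
        job for job in jobs
        if not has_any(job["title"], title_exclude)
        and not has_any(job["description"], desc_exclude)
    ]


def store_jobs(conn, jobs):
    """Insert jobs into a hypothetical 'jobs' table, skipping rows already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, title TEXT, company TEXT, description TEXT, "
        "UNIQUE(title, company))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO jobs (title, company, description) VALUES (?, ?, ?)",
        [(j["title"], j["company"], j["description"]) for j in jobs],
    )
    conn.commit()
```

A caller would chain the three steps, for example store_jobs(conn, filter_jobs(dedupe_jobs(scraped), ["senior"], ["clearance"])), where scraped is the list of records parsed from the search results.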

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run scraper: python main.py
  • Run web interface: python app.py
  • Prerequisites: Python 3.6+, Flask, Requests, BeautifulSoup, Pandas, PySocks, and SQLite3 (the sqlite3 module ships with Python's standard library).
  • Configuration: Requires a config.json file with proxy settings, headers, OpenAI API key (for cover letter generation), resume path, and search queries.
  • Official docs: none linked; configuration details are in the README.
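
Since both scripts depend on config.json, a small loader that fails early on missing settings can save debugging time. The sketch below is an assumption-laden example: the key names in REQUIRED_KEYS are hypothetical placeholders for the settings listed above, so check the project's README for the real names.

```python
import json

# Hypothetical key names standing in for the settings the summary lists;
# the project's actual config.json may name them differently.
REQUIRED_KEYS = ("proxies", "headers", "search_queries")


def validate_config(config):
    """Raise KeyError naming every required key that is absent."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing keys: " + ", ".join(missing))
    return config


def load_config(path="config.json"):
    """Read the JSON file at 'path' and validate its top-level keys."""
    with open(path, encoding="utf-8") as handle:
        return validate_config(json.load(handle))
```

Optional settings, such as an OpenAI API key used only for cover letters, are deliberately left out of the required list so the scraper can run without them.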

Highlighted Details

  • OpenAI integration for cover letter generation using a provided resume.
  • Web interface for tracking job application status with color-coded indicators.
  • Advanced filtering options for job titles, descriptions, and company names.
  • Duplicate and sponsored job post removal.

Maintenance & Community

  • The README was last updated in August 2023.
  • Contribution guidelines are provided, encouraging pull requests and issue discussion for major changes.
  • No specific community links (Discord/Slack) or notable contributors are mentioned.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

  • LinkedIn's terms of service prohibit scraping; use is at the user's own risk, and proxy servers are recommended.
  • Functionality to reverse status updates (unhide, un-apply) is missing; manual database edits are required.
  • Some jobs may not be picked up immediately after posting due to LinkedIn's indexing.
  • Editing the configuration and running searches from the web interface are not yet implemented.
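
Because status updates cannot be reversed from the interface, undoing one means editing the database by hand. The following is a minimal sketch of such an edit using Python's sqlite3 module. It assumes a table named jobs with an integer id and one 0/1 flag column per status; the project's real schema may differ, so inspect the database before running anything like this.

```python
import sqlite3

# Assumed flag columns, named after the statuses the summary lists.
# Hypothetical schema: the real table and column names may differ.
STATUS_COLUMNS = {"applied", "rejected", "interview", "hidden"}


def reset_status(conn, job_id, column):
    """Clear one status flag for one job; return the number of rows changed."""
    # Column names cannot be bound as SQL parameters, so check the name
    # against a fixed whitelist before interpolating it into the statement.
    if column not in STATUS_COLUMNS:
        raise ValueError("unknown status column: " + column)
    cursor = conn.execute(
        "UPDATE jobs SET " + column + " = 0 WHERE id = ?", (job_id,)
    )
    conn.commit()
    return cursor.rowcount
```

For example, reset_status(sqlite3.connect("jobs.db"), 42, "hidden") would unhide job 42, where jobs.db and the id are placeholders. Back up the database file first.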

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 36 stars in the last 90 days
