jobs by karpathy

Visualizing US job market data with LLM-driven insights

Created 4 weeks ago

1,426 stars

Top 28.1% on SourcePulse

Summary

This repository is a research tool for visually exploring data from the US Bureau of Labor Statistics (BLS) Occupational Outlook Handbook. It lets developers and researchers interactively visualize job market data, including derived metrics such as AI exposure, by combining detailed occupation data with LLM-generated analysis. It is intended as a flexible development tool for exploring BLS data, not as a formal economic publication.

How It Works

The project runs a multi-stage data pipeline: it scrapes raw BLS HTML, parses it into clean Markdown, and tabulates structured statistics (pay, education, job count, growth) into occupations.csv. The score.py script then uses an LLM (Gemini Flash via OpenRouter) to assign a custom score and rationale to each occupation from a user-defined prompt, for example one estimating "Digital AI Exposure." These LLM-generated scores are merged with the BLS statistics to power an interactive treemap visualization, so the same occupations can be explored along several dimensions.
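The merge of LLM scores with the tabulated statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the column names (slug, title, jobs), the shape of the score records, the helper name merge_scores, and the sample values are all hypothetical.

```python
import csv
import io


def merge_scores(occupations_csv: str, scores: dict) -> list:
    """Attach an LLM score and rationale to each occupation row, keyed by slug.

    The `slug` key and the {"score", "rationale"} record shape are
    assumptions made for this sketch.
    """
    merged = []
    for row in csv.DictReader(io.StringIO(occupations_csv)):
        entry = scores.get(row["slug"], {})
        row["score"] = entry.get("score")          # None when the occupation was not scored
        row["rationale"] = entry.get("rationale", "")
        merged.append(row)
    return merged


# Made-up sample rows, not BLS figures.
sample_csv = (
    "slug,title,jobs\n"
    "occupation-a,Occupation A,1000\n"
    "occupation-b,Occupation B,2000\n"
)
sample_scores = {
    "occupation-a": {"score": 9, "rationale": "Work is almost entirely digital."},
}

merged = merge_scores(sample_csv, sample_scores)
```

Keeping unscored occupations in the output, with an empty score, means a partial scoring run still yields a complete table for the visualization.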

Quick Start & Requirements

  • Installation: Use uv sync for dependencies and uv run playwright install chromium for browser drivers.
  • Prerequisites: Requires an OpenRouter API key set in a .env file (OPENROUTER_API_KEY=your_key_here).
  • Usage: Commands are provided for scraping (scrape.py), processing (process.py), CSV generation (make_csv.py), LLM scoring (score.py), building site data (build_site_data.py), and serving the site locally (cd site && python -m http.server 8000).
  • Demo: A live demo is available at karpathy.ai/jobs.
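As an illustration of the .env prerequisite above, the sketch below parses a KEY=value file and builds the bearer-token header that OpenRouter expects. The helper names parse_env and auth_headers are hypothetical, and the repository may load the key differently (for example with a dotenv library).

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def auth_headers(env: dict) -> dict:
    """Build the Authorization header from the parsed environment."""
    return {"Authorization": f"Bearer {env['OPENROUTER_API_KEY']}"}


env = parse_env("# local secrets\nOPENROUTER_API_KEY=your_key_here\n")
headers = auth_headers(env)
```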

Highlighted Details

  • Interactive treemap visualization covering 342 US occupations.
  • LLM-powered scoring enables custom analysis layers (e.g., AI exposure, robotics, offshoring risk) beyond standard BLS metrics.
  • Data includes job duties, education, pay, growth projections, and LLM-generated rationales.
  • Comprehensive pipeline from BLS HTML scraping to a static, interactive website.
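A treemap typically consumes nested data: groups containing leaves, each leaf carrying a size value. The sketch below shows one way merged rows could be shaped for such a view. The field names (category, title, jobs, score), the helper name build_treemap, and the choice to size tiles by job count are assumptions for illustration, not details confirmed by the README.

```python
from collections import defaultdict


def build_treemap(rows: list) -> dict:
    """Group flat occupation rows into a two-level tree for a treemap."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(
            {
                "name": row["title"],
                "value": row["jobs"],    # assumed: tile area tracks job count
                "score": row["score"],   # assumed: tile color tracks the LLM score
            }
        )
    return {
        "name": "occupations",
        "children": [
            {"name": category, "children": leaves}
            for category, leaves in sorted(groups.items())
        ],
    }


# Made-up sample rows, not BLS figures.
rows = [
    {"category": "Group B", "title": "Occupation 1", "jobs": 1000, "score": 3},
    {"category": "Group A", "title": "Occupation 2", "jobs": 2000, "score": 8},
    {"category": "Group B", "title": "Occupation 3", "jobs": 500, "score": 5},
]
tree = build_treemap(rows)
```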

Maintenance & Community

No specific details on maintenance, contributors, or community channels were found in the provided README.

Licensing & Compatibility

The README does not specify a software license. Confirm the licensing terms with the author before adopting the code, particularly for commercial use or derivative works.

Limitations & Caveats

  • AI exposure scores are described as rough LLM estimates, not rigorous predictions, and do not account for factors like demand elasticity or regulatory barriers.
  • Scores indicate job reshaping rather than predicting job disappearance.
  • The scraping process requires Playwright in non-headless mode due to BLS bot detection.
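
The non-headless requirement noted above corresponds to launching Chromium with a visible window. A minimal sketch using Playwright's synchronous Python API follows. The function name fetch_html is hypothetical, the target URL is left to the caller, and this is not the repository's scrape.py.

```python
def fetch_html(url: str) -> str:
    """Load a page in a visible (non-headless) Chromium window and return its HTML."""
    # Imported lazily so that defining this function does not require
    # Playwright; calling it does.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # visible browser window
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html
```

Because a visible browser needs a display, a run like this is likely to be awkward on headless servers or CI machines without a virtual display.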

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 12
  • Issues (30d): 10
  • Star history: 1,444 stars in the last 28 days
