alibaba: AI-powered resume parsing system
Top 85.0% on SourcePulse
SmartResume is an intelligent, layout-aware resume parsing system. It ingests resumes in PDF, image, and Office formats, extracts clean text, and reconstructs the reading order. It then uses LLMs to convert this content into structured fields such as basic info, education, and work experience, giving engineers and researchers structured data for analysis.
How It Works
SmartResume processes resumes by first extracting clean text using OCR and PDF metadata. It then reconstructs the correct reading order by employing layout detection. Finally, Large Language Models (LLMs) are utilized to convert this semantically ordered content into structured data fields. This layout-aware approach is advantageous for accurately interpreting resumes where visual formatting is critical to meaning.
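The three stages described above can be illustrated with a minimal sketch. It is not SmartResume's actual code: the TextBlock structure, the two-column heuristic, the prompt wording, and the sample blocks are all hypothetical, chosen only to show why reading order has to be reconstructed before the text reaches an LLM.

```python
from dataclasses import dataclass
import json


@dataclass
class TextBlock:
    """A text region with its bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def center_x(block):
    """Horizontal midpoint of a block's bounding box."""
    return (block.x0 + block.x1) / 2


def top_then_left(block):
    """Sort key: top edge first, then left edge."""
    return (block.y0, block.x0)


def reading_order(blocks, page_width, split_ratio=0.5):
    """Order blocks column by column, top to bottom within each column.

    Simplifying assumption: at most two columns, split at a fixed
    fraction of the page width. Real layout detection is more involved.
    """
    split = page_width * split_ratio
    left = [b for b in blocks if center_x(b) < split]
    right = [b for b in blocks if center_x(b) >= split]
    return sorted(left, key=top_then_left) + sorted(right, key=top_then_left)


def build_prompt(ordered_blocks):
    """Assemble an extraction prompt; the field names are illustrative."""
    schema = {"basic_info": {}, "education": [], "work_experience": []}
    body = "\n".join(b.text for b in ordered_blocks)
    return (
        "Extract the resume below into JSON with this shape:\n"
        f"{json.dumps(schema)}\n\nResume:\n{body}"
    )


# Toy input: blocks arrive in arbitrary order, as raw OCR output might.
blocks = [
    TextBlock("Skills: Python, SQL", 320, 50, 580, 80),
    TextBlock("Work Experience: Acme Corp", 20, 120, 300, 160),
    TextBlock("Jane Doe", 20, 50, 300, 80),
]
ordered = reading_order(blocks, page_width=600)
prompt = build_prompt(ordered)
print([b.text for b in ordered])
```

A plain top-to-bottom sort would interleave the two columns here, placing the skills line between the name and the work history. Grouping by column first keeps each section contiguous in the text handed to the model.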
Quick Start & Requirements
Create a conda environment (conda create -n resume_parsing python=3.9, conda activate resume_parsing) and install dependencies (pip install -e .).
Add API keys to configs/config.yaml.
Highlighted Details
Maintenance & Community
The project includes a TODO list indicating ongoing development, such as optimizing model loading and enhancing vLLM deployment. No community channels (e.g., Discord, Slack), notable contributors, or sponsorships are listed.
Licensing & Compatibility
The project states it is licensed under "LICENSE," with plans to adopt more permissive licenses. The released codebase was refactored to meet open-source compliance requirements, and internal PDF parsing and OCR components were replaced with open-source alternatives. The licensing terms are therefore ambiguous and should be reviewed before commercial use or closed-source integration.
Limitations & Caveats
This release is a refactored version of the original system. For open-source compliance, internal PDF parsing and OCR components were replaced by open-source alternatives, which may affect compatibility with the original implementation. Some features may not be fully functional. The TODO list points to ongoing work, including optimizing model loading and enhancing vLLM deployment support.