alphanome-ai: Parse SEC EDGAR HTML into structured semantic trees
Summary
sec-parser transforms SEC EDGAR HTML filings into structured semantic elements and a tree representation. It targets AI/ML/LLM practitioners and researchers, streamlining complex data pre-processing for advanced financial analysis and information extraction.
How It Works
The library parses HTML documents into a hierarchy of semantic elements (titles, paragraphs, tables), analogous to image semantic segmentation. This approach creates a tree structure mirroring the document's visual and informational layout, facilitating easier data manipulation and targeted extraction.
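The idea of turning a flat run of HTML into a tree of semantic elements can be illustrated with a small standard-library sketch. This is not sec-parser's implementation: the names Node, ElementCollector, build_tree, and render are hypothetical, and the sketch assumes headings are marked with h1 to h6 tags, which real filings often do not do (they tend to rely on styling instead). It also does not handle nested tables.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    kind: str                 # "root", "title", "paragraph", or "table"
    text: str = ""
    level: int = 0            # heading depth for titles, 0 otherwise
    children: list = field(default_factory=list)


class ElementCollector(HTMLParser):
    """Flattens HTML into a list of semantic elements (hypothetical helper)."""

    BLOCKS = {"p": "paragraph", "table": "table"}

    def __init__(self):
        super().__init__()
        self.elements = []
        self._current = None   # element currently being collected
        self._open_tag = None  # tag that opened it
        self._parts = []       # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # ignore tags nested inside an open block, e.g. <td>
        if tag in self.BLOCKS:
            self._current = Node(self.BLOCKS[tag])
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._current = Node("title", level=int(tag[1]))
        else:
            return
        self._open_tag = tag
        self._parts = []

    def handle_data(self, data):
        if self._current is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._open_tag:
            # Join fragments and collapse runs of whitespace.
            self._current.text = " ".join(" ".join(self._parts).split())
            self.elements.append(self._current)
            self._current = None
            self._open_tag = None


def build_tree(elements):
    """Nest elements under the most recent title of a shallower level."""
    root = Node("root")
    stack = [root]  # path from the root to the most recent title
    for element in elements:
        if element.kind == "title":
            # Close titles at the same or a deeper level.
            while len(stack) > 1 and stack[-1].level >= element.level:
                stack.pop()
            stack[-1].children.append(element)
            stack.append(element)
        else:
            stack[-1].children.append(element)
    return root


def render(node, depth=0):
    """Return an indented outline of the tree, one line per element."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.kind}: {child.text}")
        lines.extend(render(child, depth + 1))
    return lines


SAMPLE_HTML = """
<h1>Part I</h1>
<h2>Item 1. Financial Statements</h2>
<table><tr><td>Revenue</td><td>100</td></tr></table>
<p>Notes to the statements.</p>
<h2>Item 2. Management's Discussion</h2>
<p>Overview of results.</p>
"""

collector = ElementCollector()
collector.feed(SAMPLE_HTML)
collector.close()
tree = build_tree(collector.elements)
print("\n".join(render(tree)))
```

In this sketch the table and the first paragraph end up as children of "Item 1", and both items sit under "Part I", so a caller can pull out one section without scanning the whole document.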
Quick Start & Requirements
Installation is straightforward via pip: pip install sec-parser. For fetching filings, sec-downloader is also required (pip install sec-downloader). GitHub Codespaces offers a pre-configured environment for immediate experimentation. Official documentation and a demo are available.
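Once both packages are installed, a first run might look like the sketch below. The calls follow the usage shown in the project's README as best recalled (Downloader, get_filing_html, Edgar10QParser, TreeBuilder); treat the exact names and signatures as assumptions and check them against the installed version. The wrapper function parse_latest_10q is hypothetical, and the company name and email are placeholders the caller must supply.

```python
def parse_latest_10q(ticker, company_name, email):
    """Fetch the latest 10-Q for `ticker` and return (elements, tree).

    Hypothetical wrapper; requires network access plus the sec-parser
    and sec-downloader packages.
    """
    # Imported lazily so this module still loads where the packages are absent.
    from sec_downloader import Downloader
    import sec_parser as sp

    # The downloader identifies the caller to EDGAR with a name and an email.
    downloader = Downloader(company_name, email)
    html = downloader.get_filing_html(ticker=ticker, form="10-Q")

    elements = sp.Edgar10QParser().parse(html)  # flat list of semantic elements
    tree = sp.TreeBuilder().build(elements)     # nested tree built from that list
    return elements, tree
```

Calling parse_latest_10q("AAPL", "MyCompanyName", "email@example.com") would download a real filing, so it only works with network access. The README also shows sec_parser.render(tree) for printing a readable outline of the result, if that helper is present in your version.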
Maintenance & Community
Community engagement is encouraged via Discord, GitHub Discussions, and Issues for support and bug reporting. A roadmap and contribution guide are available.
Licensing & Compatibility
The project is released under the permissive MIT License, which allows use in commercial and closed-source applications.
Limitations & Caveats
This tool is an independent, open-source initiative with no affiliation with or endorsement from the SEC. It is not intended for financial advice or regulatory compliance. Users assume all risk: the creators provide no warranties regarding data accuracy or completeness and disclaim liability for any financial or legal consequences.