sec-parser by alphanome-ai

Parse SEC EDGAR HTML into structured semantic trees

Created 2 years ago
250 stars


Summary

sec-parser transforms SEC EDGAR HTML filings into structured semantic elements and a tree representation. It targets AI/ML/LLM practitioners and researchers, streamlining complex data pre-processing for advanced financial analysis and information extraction.

How It Works

The library parses HTML documents into a hierarchy of semantic elements (titles, paragraphs, tables), much as semantic segmentation labels the regions of an image. The resulting tree mirrors the document's visual and informational layout, which makes it easier to manipulate the data and extract specific parts of a filing.
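The idea of "flat semantic elements first, tree second" can be illustrated with a minimal standard-library sketch. This is not sec-parser's implementation: the Element class, the SemanticExtractor class, the build_tree function, and the sample HTML are all hypothetical names invented for illustration, and the tag-to-kind mapping is a deliberate oversimplification of what real EDGAR filings require.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Element:
    """Hypothetical semantic element: a kind label plus its text."""
    kind: str  # "title", "paragraph", or "table"
    text: str
    children: list[Element] = field(default_factory=list)


class SemanticExtractor(HTMLParser):
    """Collects a flat list of elements from a tiny HTML subset (assumption:
    headings are titles, <p> is a paragraph, <table> is a table)."""

    KINDS = {"h1": "title", "h2": "title", "p": "paragraph", "table": "table"}

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[Element] = []
        self._open_tag: str | None = None
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only start a new element when no tracked element is currently open.
        if self._open_tag is None and tag in self.KINDS:
            self._open_tag = tag
            self._chunks = []

    def handle_data(self, data):
        if self._open_tag is not None and data.strip():
            self._chunks.append(data.strip())

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            text = " ".join(self._chunks)
            self.elements.append(Element(self.KINDS[tag], text))
            self._open_tag = None


def build_tree(elements: list[Element]) -> list[Element]:
    """Nest every non-title element under the most recent title."""
    roots: list[Element] = []
    current_title: Element | None = None
    for element in elements:
        if element.kind == "title":
            current_title = element
            roots.append(element)
        elif current_title is not None:
            current_title.children.append(element)
        else:
            roots.append(element)  # content that appears before any title
    return roots


SAMPLE_HTML = """
<h1>Item 1. Financial Statements</h1>
<p>Revenue increased during the quarter.</p>
<table><tr><td>Revenue</td><td>100</td></tr></table>
<h1>Item 2. Risk Factors</h1>
<p>Market conditions may change.</p>
"""

extractor = SemanticExtractor()
extractor.feed(SAMPLE_HTML)
tree = build_tree(extractor.elements)

for node in tree:
    print(node.kind, "-", node.text)
    for child in node.children:
        print("   ", child.kind, "-", child.text)
```

The sketch uses a single level of nesting and does not handle nested tables; a real filing needs multiple title levels and far more robust classification than tag names alone.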

Quick Start & Requirements

Installation is straightforward via pip: pip install sec-parser. For fetching filings, sec-downloader is also required (pip install sec-downloader). GitHub Codespaces offers a pre-configured environment for immediate experimentation. Official documentation and a demo are available.
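After installing both packages, a first session might look like the sketch below. The calls (Downloader, get_filing_html, Edgar10QParser, TreeBuilder) follow the usage shown in the project's README at the time of writing and should be treated as assumptions to verify against the official documentation. The wrapper function fetch_and_parse_10q and its parameters are hypothetical names introduced here.

```python
def fetch_and_parse_10q(ticker: str, company_name: str, email: str):
    """Download the latest 10-Q for `ticker` and return (elements, tree).

    Hypothetical helper; the library calls inside are assumed from the
    sec-parser and sec-downloader READMEs and may change between versions.
    """
    # Imported lazily so that defining this function does not require
    # the third-party packages to be installed.
    from sec_downloader import Downloader
    import sec_parser as sp

    # The downloader is initialized with a company name and contact email,
    # which identify the client when requesting filings from EDGAR.
    downloader = Downloader(company_name, email)
    html = downloader.get_filing_html(ticker=ticker, form="10-Q")

    # First a flat list of semantic elements, then the tree built from it.
    elements = sp.Edgar10QParser().parse(html)
    tree = sp.TreeBuilder().build(elements)
    return elements, tree
```

Calling fetch_and_parse_10q("AAPL", "MyCompanyName", "email@example.com") requires network access and both packages installed; the argument values are placeholders.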

Highlighted Details

  • AI/ML/LLM Integration: Designed for AI applications, enabling tasks like text summarization, sentiment analysis, and LLM-compatible data preparation.
  • Causal AI Support: Facilitates causal analysis of financial data beyond mere correlations.
  • Flexible Filtering: Allows precise extraction of specific document sections and types.
  • Advanced Techniques: Works with approaches such as MemWalker for efficient information extraction from complex filings.
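The filtering and LLM-preparation points above can be sketched in plain Python. Everything here is illustrative rather than sec-parser's API: the Element record, the select and to_chunks functions, and the sample rows are hypothetical, and the character budget stands in for whatever context limit a given model imposes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical flat record: element kind, owning section, and text."""
    kind: str
    section: str
    text: str


# Made-up sample data standing in for parsed filing elements.
ELEMENTS = [
    Element("title", "Item 2", "Item 2. Management's Discussion"),
    Element("paragraph", "Item 2", "Net sales rose compared with the prior quarter."),
    Element("table", "Item 2", "Net sales | 100 | 90"),
    Element("title", "Item 1A", "Item 1A. Risk Factors"),
    Element("paragraph", "Item 1A", "Supply constraints could affect results."),
]


def select(elements, *, kinds=None, section=None):
    """Keep elements matching the given kinds and/or section (None = any)."""
    return [
        e for e in elements
        if (kinds is None or e.kind in kinds)
        and (section is None or e.section == section)
    ]


def to_chunks(elements, max_chars):
    """Greedily pack element texts into newline-joined chunks.

    A chunk is closed as soon as adding the next text would exceed
    max_chars; a single oversized text still becomes its own chunk.
    """
    chunks, current = [], ""
    for e in elements:
        candidate = f"{current}\n{e.text}" if current else e.text
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = e.text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


paragraphs = select(ELEMENTS, kinds={"paragraph"})
risk_paragraphs = select(ELEMENTS, kinds={"paragraph"}, section="Item 1A")
small_chunks = to_chunks(paragraphs, max_chars=60)
large_chunks = to_chunks(paragraphs, max_chars=200)
```

Filtering before chunking keeps tables and headings out of a text-only prompt, while the same select call with kinds={"table"} would isolate tabular data for separate handling.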

Maintenance & Community

Community engagement is encouraged via Discord, GitHub Discussions, and Issues for support and bug reporting. A roadmap and contribution guide are available.

Licensing & Compatibility

The project is released under the permissive MIT License, allowing for broad compatibility with commercial and closed-source applications.

Limitations & Caveats

This tool is an independent, open-source initiative with no affiliation with or endorsement from the SEC. It is not intended for financial advice or regulatory compliance. Users assume all risk: the creators provide no warranties regarding data accuracy or completeness and disclaim liability for any financial or legal consequences.

Health Check

Last Commit: 7 months ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0
Star History: 6 stars in the last 30 days
