pattern by clips

Python web mining module

created 14 years ago
8,828 stars

Top 5.9% on sourcepulse

Project Summary

Pattern is a comprehensive Python module for web mining, offering integrated tools for data mining, natural language processing, machine learning, and network analysis. It targets developers and researchers needing to extract, process, and analyze data from the web, providing a unified toolkit for complex data-driven tasks.

How It Works

Pattern is organized into task-specific submodules: pattern.web for data mining, pattern.en (plus modules for several other languages) for NLP, pattern.vector for machine learning, and pattern.graph for network analysis. For NLP, it includes a Brill tagger and lexicon-based sentiment analysis. Machine learning is built on a vector space model, with classifiers such as KNN and SVM. Network analysis provides graph centrality algorithms and visualization tools. Because these pieces ship in one library, a workflow such as scraping text, tagging it, and classifying the results can stay within a single package.
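The vector-space-plus-KNN idea described above can be sketched in plain Python. This is not Pattern's API (Pattern's own classifiers live in pattern.vector); it is a minimal illustration, assuming lowercase word counts as features and cosine similarity as the similarity measure. The names vectorize, cosine, and KNN, and the toy training sentences, are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Hypothetical helper: bag-of-words vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(t, 0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


class KNN:
    """Hypothetical k-nearest-neighbor classifier (not pattern.vector.KNN).

    Labels a document by majority vote among the k training
    documents with the highest cosine similarity to it.
    """

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (vector, label) pairs

    def train(self, text, label):
        self.examples.append((vectorize(text), label))

    def classify(self, text):
        v = vectorize(text)
        ranked = sorted(self.examples, key=lambda ex: cosine(v, ex[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy training data (invented for illustration).
knn = KNN(k=3)
knn.train("great fun wonderful film", "positive")
knn.train("wonderful acting great story", "positive")
knn.train("loved it great", "positive")
knn.train("boring awful film", "negative")
knn.train("terrible boring plot", "negative")
knn.train("awful waste hated it", "negative")
print(knn.classify("a great and wonderful story"))  # → positive
```

Storing vectors as sparse dictionaries keeps only the words a document actually contains, which is the usual choice for text, where most vocabulary terms are absent from any given document.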

Quick Start & Requirements

  • Install via pip: pip install pattern
  • Supports Python 2.7 and Python 3.6.
  • Bundles third-party components such as the Brill tagger, LIBSVM, LIBLINEAR, and centrality code from NetworkX.
  • Official documentation and examples are available.

Highlighted Details

  • Offers tools for web scraping, HTML DOM parsing, and web service integration (Google, Twitter, Wikipedia).
  • Includes advanced NLP capabilities such as part-of-speech tagging, sentiment analysis, and WordNet integration.
  • Provides machine learning algorithms including vector space models, clustering, and classification (KNN, SVM, Perceptron).
  • Features network analysis tools for graph centrality and visualization.
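Graph centrality, one of the network-analysis features listed above, can be illustrated with eigenvector centrality computed by power iteration. The sketch below is plain Python and independent of Pattern's pattern.graph module; the adjacency-dictionary format, the function name eigenvector_centrality, and the sample graph are assumptions made for illustration.

```python
def eigenvector_centrality(adjacency, iterations=100, tolerance=1e-6):
    """Eigenvector centrality of an undirected graph via power iteration.

    adjacency: dict mapping each node to a list of its neighbors
    (a hypothetical format chosen for this sketch).
    Returns a dict of scores scaled so the most central node scores 1.0.
    """
    nodes = list(adjacency)
    if not nodes:
        return {}
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' scores.
        # Iterating on (A + I) rather than A has the same leading eigenvector
        # but avoids the oscillation plain power iteration shows on bipartite graphs.
        new = {n: scores[n] + sum(scores[m] for m in adjacency[n]) for n in nodes}
        peak = max(new.values())
        new = {n: s / peak for n, s in new.items()}
        converged = all(abs(new[n] - scores[n]) < tolerance for n in nodes)
        scores = new
        if converged:
            break
    return scores


# Hypothetical example: "a" is linked to every other node.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
scores = eigenvector_centrality(graph)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

A node scores highly when its neighbors also score highly, so "b" and "c", which are linked to the hub and to each other, rank above "d", which is linked only to the hub.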

Maintenance & Community

The project is hosted on GitHub and welcomes contributions and donations. Key contributors include Tom De Smedt and Walter Daelemans.

Licensing & Compatibility

Licensed under BSD, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Official support covers only Python 2.7 and 3.6; compatibility with newer Python versions is not documented.
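Projects that depend on Pattern may want this version caveat to surface at startup rather than as a later, less obvious error. The sketch below is not part of Pattern: the SUPPORTED set simply encodes the two versions named above, and check_python_version is a hypothetical helper.

```python
import sys
import warnings

# Assumption: encodes only the versions the project states it supports (2.7 and 3.6).
SUPPORTED = {(2, 7), (3, 6)}


def check_python_version(version_info=sys.version_info):
    """Hypothetical helper: return True if major.minor is officially supported,
    otherwise emit a RuntimeWarning and return False."""
    major_minor = (version_info[0], version_info[1])
    if major_minor in SUPPORTED:
        return True
    warnings.warn(
        "Python %d.%d is not an officially supported version for Pattern "
        "(supported: 2.7, 3.6); installation or imports may fail." % major_minor,
        RuntimeWarning,
    )
    return False


# Typical use: call once at startup with the running interpreter's version.
check_python_version()
```

The function takes the version as a parameter, defaulting to the running interpreter, so it can be exercised against other versions without switching environments.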

Health Check
Last commit: 1 year ago
Responsiveness: 1 week
Pull requests (30 days): 0
Issues (30 days): 0
Star history: 31 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Andre Zayarni (Cofounder of Qdrant), and 1 more.

refinery by code-kern-ai

0.1%
1k stars
Open-source tool for NLP data scaling, assessment, and maintenance
created 3 years ago
updated 7 months ago
Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 4 more.

awesome-nlp by keon

0.1%
17k stars
Curated list of NLP resources
created 9 years ago
updated 1 year ago