Awesome-Swiss-German  by esthicodes

Analyze Swiss German text with NLU capabilities

Created 3 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

This repository provides tools and resources for Natural Language Processing (NLP) specifically focused on Swiss German dialects. It aims to enable developers to apply NLU features like sentiment analysis, entity analysis, and content classification to applications dealing with Swiss German text and speech. The project is relevant for researchers, developers, and anyone interested in processing or understanding Swiss German in a computational context.

How It Works

The project leverages several NLP techniques and tools. It mentions the use of ANTLR (ANother Tool for Language Recognition) for parsing and language processing, and discusses approaches like backpropagation and log-linear modeling for probabilistic NLP. For speech processing, it references Google Cloud Speech-to-Text and DeepSpeech for Automatic Speech Recognition (ASR) of Swiss German. The repository also includes Python scripts for random walk simulations on graphs, with a CUDA-enabled version for GPU acceleration, and a MeetingTimeEstimator for predicting meeting times of walks.
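The random walk and meeting-time idea can be illustrated with a small, pure-Python sketch. It is not the repository's code: the function names, the adjacency-dictionary graph format, and the example graph are assumptions made for illustration, and it does not reproduce the API of RandomWalkSimulatorCUDA or MeetingTimeEstimator. Two walkers move simultaneously to a uniformly chosen neighbour, and the meeting time is the first step at which they share a node.

```python
import random


def simulate_meeting_time(adjacency, start_a, start_b, max_steps=10_000, rng=None):
    """Steps until two simultaneous random walkers share a node, or None if they never do."""
    rng = rng or random.Random()
    a, b = start_a, start_b
    for step in range(max_steps + 1):
        if a == b:
            return step
        # Each walker moves to a uniformly random neighbour of its current node.
        a = rng.choice(adjacency[a])
        b = rng.choice(adjacency[b])
    return None


def estimate_meeting_time(adjacency, start_a, start_b, trials=1000, seed=0):
    """Monte Carlo average of the meeting time over independent trials."""
    rng = random.Random(seed)
    times = [
        simulate_meeting_time(adjacency, start_a, start_b, rng=rng)
        for _ in range(trials)
    ]
    met = [t for t in times if t is not None]
    return sum(met) / len(met) if met else None


# Hypothetical example graph: a triangle (0, 1, 2) with a tail node 3.
# It is not bipartite, so simultaneously moving walkers can always meet.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

if __name__ == "__main__":
    print(estimate_meeting_time(graph, 0, 3))
```

On a bipartite graph, walkers that start on opposite sides and move in lockstep never meet, which is why the sketch caps the walk at max_steps and returns None in that case.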

Quick Start & Requirements

To install the core Python package, use:

    pip install structural_diversity_index==0.0.3

For GPU support, a Conda environment is recommended. Download the environment.yml file from GitHub and run:

    conda env create -f environment.yml

This creates an environment named sd_index with the necessary dependencies, including CUDA support.
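After installing, one way to confirm that the pinned version is present is to query the installed distribution metadata from Python. This is a minimal sketch using only the standard library. The helper name installed_version is hypothetical, and the sketch assumes the distribution is registered under the same name used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("structural_diversity_index")
if found is None:
    print("structural_diversity_index is not installed in this environment")
elif found != "0.0.3":
    print(f"found version {found}, expected 0.0.3")
else:
    print("structural_diversity_index 0.0.3 is installed")
```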

A Jupyter notebook (Example.ipynb) is available for a detailed tutorial. Pre-processing documentation is also provided.

Highlighted Details

  • Supports analysis of 26 cantonal Swiss German dialects, alongside Italian, German, Chinese, and French.
  • Includes tools for sentiment analysis, entity analysis, content classification, and syntax analysis.
  • Features GPU-accelerated random walk simulations (RandomWalkSimulatorCUDA).
  • Mentions integration with Google Cloud services like Cloud Run, BigQuery, and Machine Learning APIs.

Maintenance & Community

The primary contact is hoeuyu@ethz.ch. The repository is hosted on GitHub. Links to personal blogs, LinkedIn, and Instagram are provided for contact.

Licensing & Compatibility

The README does not state a license for the repository itself. "MIT" and "GPL" appear only in reference to other tools, so the terms that apply to this code are unclear. Suitability for commercial use is not addressed.

Limitations & Caveats

The README indicates that Siri may prioritize its default phrase handling over custom device integrations, which could be a limitation for voice control applications. The project appears to be a collection of diverse NLP tools, and the integration or unified purpose across all components might require further investigation. Some parts, like the "App Demo VERSION," seem to be in an early stage.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
