Awesome-Arabic-AI by OmarSalah26

Arabic AI resource hub

Created 1 month ago
261 stars

Project Summary

Awesome-Arabic-AI is a curated repository that centralizes open-source artificial intelligence resources for the Arabic language. Arabic poses specific computational challenges, such as rich morphology and diglossia (Modern Standard Arabic coexisting with many spoken dialects). The list makes models built for these conditions easier to find: large language models (LLMs), text-to-speech (TTS) and speech-to-text (STT) models, and datasets. It is aimed at developers, researchers, and practitioners who want a reliable starting point in the growing Arabic AI ecosystem.

How It Works

The repository is a community-driven collection of open-weight Arabic AI models and research, organized by category: LLMs, TTS, STT, and specialized tools, with dedicated sections for dialectal AI and benchmarks. Gathering these entries in one place gives a single point of reference for Arabic NLP and speech technology, and the community keeps it current by submitting new work.

Quick Start & Requirements

This repository is a curated list, not a deployable application. Each listed model or tool has its own installation steps and dependencies: typically Python, and often a CUDA-capable GPU for LLMs and larger speech models. Individual entries link to Hugging Face, official demos, or the model's own repository.
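Because many of the listed LLMs and speech models run far faster on a GPU, a common first step is to check which accelerator is available before loading one. The following is a minimal sketch, assuming the chosen model uses PyTorch; `pick_device` is a hypothetical helper for illustration, not part of this repository or of any listed project.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what this machine offers.

    Assumes PyTorch is the model's backend. If torch is not installed,
    the helper falls back to 'cpu' instead of raising ImportError.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # NVIDIA GPU with a working CUDA runtime.
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon GPU; older torch builds lack torch.backends.mps.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The returned string can be passed to `model.to(...)` in plain PyTorch, or as the `device` argument of a Hugging Face `transformers` pipeline in recent versions. Check each model card for its actual memory and hardware requirements.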

Highlighted Details

  • Features state-of-the-art models such as Falcon-H1-Arabic (LLM), AraModernBERT (encoder model), Qari-OCR v0.3 (vision/OCR), and ArTST v2 (STT).
  • Includes specialized dialectal resources for Egyptian (Masri) and Saudi Arabic, such as Chatterbox-Egyptian (TTS) and NAMAA-MT-Saudi (Translation).
  • Links to key benchmarks such as the Silma TTS Benchmark and OALL (Open Arabic LLM Leaderboard) for evaluating Arabic AI performance.
  • Provides essential tools like Camel-tools for Arabic NLP tasks and a YouTube Audio Extractor for ASR dataset creation.
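Orthographic variation is one everyday form of the challenges Arabic NLP tools address: the same word can appear with or without diacritics, with decorative tatweel (kashida), or with different alef forms. Camel-tools ships its own dediacritization and normalization utilities; consult its documentation for the exact functions. The sketch below is a hand-rolled, standard-library-only illustration of this kind of preprocessing, not Camel-tools' implementation. `normalize_arabic` and its `alef_maksura_to_yeh` option are hypothetical names, and which normalizations are appropriate depends on the task, since each one discards information.

```python
import re
import unicodedata

# Hypothetical helper for illustration only; NOT the Camel-tools API.
# Tanween, fatha/damma/kasra, shadda, sukun (U+064B..U+0652) and dagger alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"                                  # elongation character, purely decorative
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda, hamza above, hamza below
_BARE_ALEF = "\u0627"
_ALEF_MAKSURA = "\u0649"
_YEH = "\u064A"


def normalize_arabic(text: str, *, alef_maksura_to_yeh: bool = True) -> str:
    """Apply common, lossy Arabic normalizations before search or tokenization."""
    # Fold presentation forms and compatibility characters to base letters.
    text = unicodedata.normalize("NFKC", text)
    # Drop tatweel and short-vowel diacritics.
    text = text.replace(_TATWEEL, "")
    text = _DIACRITICS.sub("", text)
    # Map hamzated and madda alef forms to a bare alef.
    text = _ALEF_VARIANTS.sub(_BARE_ALEF, text)
    # Optionally merge alef maksura with yeh (they are often confused in user text).
    if alef_maksura_to_yeh:
        text = text.replace(_ALEF_MAKSURA, _YEH)
    return text


if __name__ == "__main__":
    # A fully vocalized word and its bare form normalize to the same string.
    print(normalize_arabic("مُحَمَّد") == normalize_arabic("محمد"))
```

Normalizing both queries and documents the same way lets, for example, a vocalized and an unvocalized spelling of the same word match during search or dataset deduplication.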

Maintenance & Community

The project encourages contributions via pull requests for new models and research. Key organizations and researchers involved include NAMAA Space, TII (Technology Innovation Institute), oddadmix, Ibrahim Salah, SWivid, and Silma AI. The list also points to community hubs and dialect-specific resources where contributors can collaborate.

Licensing & Compatibility

The repository itself is distributed under the Apache 2.0 License. Individual models and tools listed within will carry their own licenses, which users must verify for compatibility, especially for commercial use.

Limitations & Caveats

As a curated list, this repository does not provide a unified API or deployment framework. Users must individually assess and integrate each listed resource. The rapid evolution of AI means resources may become outdated, requiring continuous community updates. Specific hardware or software dependencies (e.g., GPU, CUDA) are model-dependent and not universally specified.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
0
Star History
265 stars in the last 30 days
