Curated list of domain-specific LLMs, datasets, and benchmarks
Top 19.0% on sourcepulse
This repository is a curated collection of open-source Large Language Models (LLMs), datasets, and evaluation benchmarks tailored to specific vertical domains. It aims to support research and application development by providing a central resource for tracking domain-specific LLM work, and is intended for researchers, developers, and practitioners in specialized fields.
How It Works
The project categorizes and lists LLMs that have been fine-tuned or continually pre-trained on domain-specific data, building on general-purpose foundation models such as LLaMA, ChatGLM, and Qwen. Models are organized by domain (e.g., Medical, Legal, Financial, Education, Cybersecurity, DevOps). Each entry links to the model's GitHub repository and often includes the associated paper and the repository's star count, which serve as rough indicators of community engagement and research impact.
Quick Start & Requirements
This repository is a curated list, not a runnable software package. To use any of the listed models, users must refer to the individual project links provided for installation, dependencies (e.g., specific Python versions, CUDA, hardware requirements), and usage instructions.
Highlighted Details
Maintenance & Community
The project is actively maintained, with frequent updates reflecting the rapid pace of LLM development in specialized domains. Community contributions are welcomed to expand the collection.
Licensing & Compatibility
Licensing varies by the individual projects listed. Users must consult the specific repository for each model or dataset to understand its licensing terms and compatibility for commercial or closed-source use.
Limitations & Caveats
This repository is a directory and does not provide direct access to or hosting for the listed models or datasets. Users are responsible for evaluating the quality, licensing, and suitability of each resource independently.