semikong by aitomatic

Semiconductor LLM for domain-specific tasks

Created 1 year ago
360 stars

Top 77.7% on SourcePulse

View on GitHub
Project Summary

SEMIKONG is an open-source, industry-specific large language model (LLM) for the semiconductor manufacturing domain. It addresses the need for specialized AI in this complex field by providing models trained on a comprehensive corpus of semiconductor-related text, giving them a stronger grasp of the underlying physics, chemistry, and fabrication processes. The project targets engineers, researchers, and companies in the semiconductor industry, offering a foundation for building proprietary AI solutions and improving productivity.

How It Works

SEMIKONG is a Transformer-based model built on the Llama architecture, so it integrates seamlessly with the existing Llama ecosystem. It uses a novel pre-training approach that incorporates domain-specific knowledge, achieving superior performance on industry-relevant benchmarks compared to general-purpose LLMs. The project offers both 8B and 70B parameter instruct models, with weights available on Hugging Face.
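Because the checkpoints follow the Llama format, they load with the standard Hugging Face classes and need no custom modeling code. Below is a minimal loading sketch; the model ID is an assumed placeholder, so consult the project's Hugging Face page for the actual repository names.

    # Minimal sketch: loading SEMIKONG via the standard Llama classes.
    # The model ID is a hypothetical placeholder, not confirmed by this summary.
    from transformers import AutoTokenizer, LlamaForCausalLM

    model_id = "aitomatic/SEMIKONG-8B-Instruct"  # hypothetical Hugging Face ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the checkpoint's native precision
        device_map="auto",    # shard layers across available GPUs
    )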

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/aitomatic/semikong.git), navigate into the directory (cd semikong), and install dependencies (pip install -r requirements.txt); an inference sketch follows this list.
  • Prerequisites: Python 3.10+, CUDA. For 4-bit quantized models, AWQ is required; for 8-bit, GPTQ is required.
  • Hardware: SEMIKONG-8B-Instruct requires a minimum of 16 GB VRAM. SEMIKONG-70B-Instruct requires a minimum of 170 GB VRAM (e.g., 3 x A100 80GB). Fine-tuning the 70B model requires significant CPU memory (900 GB+) and multiple high-VRAM GPUs.
  • Resources: Hugging Face Models, SemiKong Paper, Web Demo
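Once dependencies are installed, inference follows the usual transformers workflow. The sketch below is a hedged end-to-end example: the model ID and prompt are illustrative assumptions, and half precision is chosen to fit the ~16 GB VRAM guidance for the 8B model above.

    # Hedged end-to-end inference sketch; the model ID is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "aitomatic/SEMIKONG-8B-Instruct"  # hypothetical Hugging Face ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to fit ~16 GB of VRAM
        device_map="auto",
    )

    # Instruct models expect chat-formatted input; use the bundled template.
    messages = [{"role": "user", "content": "Explain etch selectivity in plasma etching."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))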

Highlighted Details

  • First open-source, industry-specific LLM for semiconductor manufacturing.
  • SEMIKONG-70B-Chat ranks first among open-source models on benchmarks such as MMLU, CMMLU, BBH, and GSM8K.
  • Models are compatible with Llama ecosystem tools (e.g., LlamaForCausalLM, LlamaTokenizer).
  • Offers fine-tuning scripts and deployment guidance for various quantization methods (AWQ, GPTQ); see the quantized-loading sketch after this list.
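For deployment on smaller GPUs, quantized checkpoints load through the same transformers interface, which dispatches to AWQ or GPTQ kernels when the autoawq or auto-gptq packages are installed. The repository name below is hypothetical; this summary does not confirm that such a checkpoint is published.

    # Hedged sketch: loading a 4-bit AWQ-quantized checkpoint via transformers.
    # The repository name is hypothetical. transformers reads the quantization
    # config stored with the weights and uses AWQ kernels if autoawq is installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    quant_id = "aitomatic/SEMIKONG-8B-Instruct-AWQ"  # hypothetical repo name
    tokenizer = AutoTokenizer.from_pretrained(quant_id)
    model = AutoModelForCausalLM.from_pretrained(quant_id, device_map="auto")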

Maintenance & Community

The project is a collaborative effort involving Tokyo Electron, FPT Software AIC, and Aitomatic, with contributions from AI Alliance members. Questions and discussions are handled through GitHub.

Licensing & Compatibility

The code and weights are distributed under the Apache 2.0 License, permitting personal, academic, and commercial use. Derivative works require attribution.

Limitations & Caveats

While efforts have been made to ensure data compliance, the model may still produce incorrect or otherwise problematic outputs, owing to the complexity of the training data and the variety of usage scenarios. The project disclaims responsibility for risks arising from misuse.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

dbrx by databricks

  Large language model for research/commercial use
  3k stars · Created 1 year ago · Updated 1 year ago
  Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

Yi by 01-ai

  Open-source bilingual LLMs trained from scratch
  8k stars · Created 1 year ago · Updated 9 months ago
  Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Simon Willison (Coauthor of Django), and 10 more.

alpaca-lora by tloen

  LoRA fine-tuning for LLaMA
  19k stars · Created 2 years ago · Updated 1 year ago
  Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Vincent Weisser (Cofounder of Prime Intellect), and 25 more.