Skywork by SkyworkAI

LLM for multilingual tasks, creative writing, math, and multimodal applications

created 1 year ago
1,411 stars

Top 29.4% on sourcepulse

Project Summary

The Skywork project provides a series of 13B-parameter large language models (LLMs) trained on 3.2 trillion tokens of multilingual and code data, aiming for strong performance on general tasks, creative writing, and mathematical reasoning. It targets researchers and developers who want high-quality, open-source bilingual (Chinese/English) models that can be used commercially.

How It Works

Skywork models use a thinner, deeper architecture (52 layers) than Llama-2-13B and a larger vocabulary (65,536 tokens) built with BPE tokenization. Training is a two-stage process: pre-training on general corpora, followed by a second stage that adds STEM data to strengthen reasoning and mathematical ability. The project also releases SkyPile-150B, a 600GB Chinese dataset, and offers quantized model versions for deployment on consumer GPUs.
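To make the "thinner, deeper" trade-off concrete, here is a rough parameter-count sketch for a Llama-style decoder. Only the Skywork layer count (52) and vocabulary size (65,536) come from the summary above. The hidden and feed-forward sizes, the Llama-2-13B reference configuration, and the untied output head are assumptions used for illustration, so treat the printed totals as estimates rather than official figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    vocab_size: int
    hidden_size: int
    ffn_size: int
    num_layers: int


def approx_params(cfg: DecoderConfig) -> int:
    """Rough weight count for a Llama-style decoder (norms and biases ignored)."""
    attention = 4 * cfg.hidden_size ** 2               # Q, K, V and output projections
    feed_forward = 3 * cfg.hidden_size * cfg.ffn_size  # gate, up and down projections
    embeddings = 2 * cfg.vocab_size * cfg.hidden_size  # input embedding + untied output head
    return cfg.num_layers * (attention + feed_forward) + embeddings


# vocab_size and num_layers for Skywork come from the summary above;
# hidden_size and ffn_size are ASSUMED values, not stated in this document.
skywork_13b = DecoderConfig(vocab_size=65_536, hidden_size=4_608, ffn_size=12_288, num_layers=52)
# ASSUMED reference configuration for Llama-2-13B.
llama2_13b = DecoderConfig(vocab_size=32_000, hidden_size=5_120, ffn_size=13_824, num_layers=40)

for name, cfg in (("Skywork-13B", skywork_13b), ("Llama-2-13B", llama2_13b)):
    print(f"{name}: ~{approx_params(cfg) / 1e9:.2f}B parameters")
```

Under these assumptions, a narrower hidden size lets the model carry more layers within a similar parameter budget.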

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python 3.8+ and PyTorch 2.0+; CUDA 11.4+ is recommended.
  • Resources: Quantized versions support consumer GPUs. Full model details and inference examples are provided in the README.
  • Links: Hugging Face, ModelScope, Tech Report
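The prerequisites above can be checked before installing anything heavy. The following is a minimal sketch, not part of the project: the helper names `meets_minimum` and `check_environment` are hypothetical, and the script reports "unknown" when PyTorch is not installed or was built without CUDA.

```python
import importlib.util
import re
import sys


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if the leading 'major.minor' of `version` is >= `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_environment() -> dict:
    """Collect pass/fail (or None when unknown) for each listed prerequisite."""
    report = {"python>=3.8": sys.version_info[:2] >= (3, 8)}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so neither check can be evaluated.
        report["torch>=2.0"] = None
        report["cuda>=11.4"] = None
        return report
    import torch

    report["torch>=2.0"] = meets_minimum(torch.__version__, (2, 0))
    cuda = torch.version.cuda  # None on CPU-only builds
    report["cuda>=11.4"] = meets_minimum(cuda, (11, 4)) if cuda else None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        status = "unknown" if ok is None else ("ok" if ok else "too old")
        print(f"{requirement}: {status}")
```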

Highlighted Details

  • Skywork-13B-Base achieves top performance among 13B models on benchmarks like C-Eval (60.6), CMMLU (61.8), MMLU (62.1), and GSM8K (55.8).
  • Skywork-13B-Math ranks first in GSM8K and CMATH benchmarks for its scale.
  • Skywork-13B-Chat is fine-tuned for creative writing, with reported results comparable to ChatGPT on those tasks.
  • Skywork-13B-MM is a multimodal model for image-based Q&A.
  • Offers 8-bit quantized models with minimal performance degradation and reduced GPU memory usage (13.57GB vs 25.91GB for bf16).
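The memory figures in the last bullet follow from simple bytes-per-weight arithmetic. The sketch below estimates weight storage only, assuming a nominal 13 billion parameters (an assumption: the exact count is not stated here). The reported numbers presumably also include buffers and runtime overhead, so the estimate will not match them exactly.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


NUM_PARAMS = 13e9  # ASSUMED nominal parameter count, for illustration only

bf16_estimate = weight_memory_gib(NUM_PARAMS, 16)
int8_estimate = weight_memory_gib(NUM_PARAMS, 8)
print(f"bf16 weights:  ~{bf16_estimate:.1f} GiB")
print(f"8-bit weights: ~{int8_estimate:.1f} GiB")

# Figures reported in the bullet above (GPU memory in GB).
reported_bf16, reported_int8 = 25.91, 13.57
print(f"reported 8-bit / bf16 ratio: {reported_int8 / reported_bf16:.2f}")
```

Halving the bits per weight roughly halves the footprint, which is consistent with the reported drop from 25.91GB to 13.57GB.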

Maintenance & Community

The project is developed by the Kunlun Group · Skywork team. Integration with Huawei's MindFormers suite on Ascend hardware is available.

Licensing & Compatibility

The models are released under the "Skywork Community License" and may be used commercially, provided the license terms are followed. The license prohibits using the models for activities that threaten national or social security or that are otherwise unlawful.

Limitations & Caveats

The SkyPile-150B dataset, while filtered, may still contain sensitive information. The project disclaims responsibility for risks arising from model misuse or unforeseen issues. Some model variants (Chat, MM) are listed as "coming soon" on certain platforms.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

114 stars in the last 90 days
