Text segmentation toolkit for robust sentence splitting
This library provides robust, efficient, and adaptable text segmentation into sentences or other semantic units, targeting NLP researchers and developers. Its SaT models deliver state-of-the-art performance across 85 languages, significantly outperforming previous methods and existing tools.
How It Works
The core is the SaT (Segment Any Text) model, a universal approach to sentence segmentation. It leverages a transformer architecture, offering improved performance and reduced computational cost compared to the previous WtP (Where's the Point?) model. SaT models can be further adapted to specific domains or languages using LoRA, enabling highly customized segmentation.
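Segmentation models of this kind typically score each position for how likely a sentence boundary follows it, and a threshold turns those scores into segments. The sketch below illustrates only that final thresholding step. It assumes one boundary probability per character of the input. `split_by_boundary_probs` and its `threshold` default are hypothetical and are not wtpsplit's API. The library's real pipeline, including subword tokenization, batching, and its own default thresholds, is not reproduced here.

```python
from typing import List, Sequence


def split_by_boundary_probs(
    text: str, probs: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Split `text` after every character whose boundary probability exceeds `threshold`.

    Hypothetical helper: `probs` is assumed to hold one sentence-boundary
    probability per character of `text`, e.g. as produced by a segmentation model.
    """
    if len(probs) != len(text):
        raise ValueError("expected one probability per character")
    segments: List[str] = []
    start = 0
    for i, p in enumerate(probs):
        if p > threshold:
            # Close the current segment right after character i.
            segments.append(text[start : i + 1])
            start = i + 1
    if start < len(text):
        # Keep any trailing text that was not followed by a predicted boundary.
        segments.append(text[start:])
    return segments


# Usage with made-up probabilities: a boundary is likely after "test " (index 14)
# and at the final period.
text = "This is a test This is another test."
probs = [0.0] * len(text)
probs[14] = 0.9
probs[-1] = 0.95
print(split_by_boundary_probs(text, probs))
# ['This is a test ', 'This is another test.']
print(split_by_boundary_probs(text, probs, threshold=0.92))
# ['This is a test This is another test.']
```

Raising the threshold trades recall for precision. At 0.92 the weaker boundary after "test " is no longer accepted, so the input comes back as a single segment. Because segments are plain slices, joining them always reproduces the original text, including whitespace.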
Quick Start & Requirements
pip install wtpsplit
For ONNX support, use pip install wtpsplit[onnx-gpu] or pip install wtpsplit[onnx-cpu] instead.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The library includes previous WtP models for reproducibility, but SaT is the recommended and actively developed model. Some advanced features like LoRA export for ONNX are experimental.