BERT-AttributeExtraction by sakuranew

Attribute extraction using BERT for knowledge graphs

Created 7 years ago

265 stars

Top 96.5% on SourcePulse

Project Summary

This repository provides a solution for attribute extraction in knowledge graphs, specifically targeting character entries from Baidu Encyclopedia. It offers two BERT-based methods: fine-tuning and feature extraction. The target audience includes researchers and developers working on knowledge graph construction and information extraction.

How It Works

The project uses BERT (Bidirectional Encoder Representations from Transformers) for attribute extraction and offers two approaches. In fine-tuning, a pre-trained BERT model is trained further on the task dataset. In feature extraction, BERT only generates vector representations of the text, and a separate classifier (such as an MLP) is trained on those vectors to predict attributes. The choice between the two depends on the available computational resources and the desired performance.
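The feature-extraction idea can be sketched in a few lines: the encoder stays fixed and only a small classifier on top of its vectors is trained. The sketch below is illustrative only and is not the repository's code. `frozen_encoder` is a hypothetical stand-in for BERT (it hashes characters into a vector), logistic regression stands in for the MLP, and the training strings are invented.

```python
import math


def frozen_encoder(text, dim=16):
    """Hypothetical stand-in for BERT: maps text to a fixed-size unit vector.

    It is never updated during training, which is the defining property
    of the feature-extraction approach.
    """
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_classifier(features, labels, epochs=200, lr=0.5):
    """Train logistic regression with SGD on precomputed feature vectors."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # dLoss/dz for log loss
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Invented toy data: features are computed once, then only the classifier is trained.
texts, labels = ["aaaa", "bbbb", "aaa", "bb"], [1, 0, 1, 0]
features = [frozen_encoder(t) for t in texts]
weights, bias = train_classifier(features, labels)
print([predict(weights, bias, f) for f in features])
```

Fine-tuning differs in that the encoder's own parameters would also receive gradient updates, which typically costs more compute per training step.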

Quick Start & Requirements

Install: The README gives no installation steps beyond listing the dependencies.
Prerequisites: TensorFlow >= 1.10, scikit-learn, pre-trained BERT-Base Chinese models (vocab.txt, bert_config.json, bert_model.ckpt).
Dataset: Requires data formatted as "Entity#Attribute#Label#Text", with labels derived from Baidu Encyclopedia infoboxes.
Links: No official quick-start or demo links are provided.
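
A minimal sketch of reading the "Entity#Attribute#Label#Text" format, assuming one example per line with fields in exactly that order. The `Example` type, the `parse_line` helper, and the sample line are hypothetical and not taken from the repository. The label is kept as a string because the label encoding is not described here, and the split is capped at three separators so that a "#" inside the text field survives.

```python
from typing import NamedTuple


class Example(NamedTuple):
    entity: str
    attribute: str
    label: str  # kept as a string; the label encoding is an open assumption
    text: str


def parse_line(line: str) -> Example:
    """Split one 'Entity#Attribute#Label#Text' line into its four fields."""
    parts = line.rstrip("\n").split("#", 3)  # at most 3 splits: text may contain '#'
    if len(parts) != 4:
        raise ValueError(f"expected 4 '#'-separated fields, got {len(parts)}: {line!r}")
    return Example(*parts)


# Invented sample line, for illustration only.
example = parse_line("EntityA#PlaceB#1#EntityA was born in PlaceB.")
print(example.entity, example.attribute, example.label)
```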

Highlighted Details

Implements both BERT fine-tuning and feature extraction methods for attribute extraction.
Supports attribute extraction from Chinese-language character entries in Baidu Encyclopedia.
Provides example commands for data processing, model training, and feature extraction.
Reports promising results for the fine-tuning method on a birthplace dataset (e.g., F1-score of 0.965).
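
For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a binary labeling task; whether the reported 0.965 is a binary or an averaged F1 is not stated here, so treating it as binary is an assumption. The function name and the toy labels are invented for illustration.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy labels: 2 true positives, 1 false positive, 1 false negative.
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
print(precision_recall_f1(gold, predicted))
```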

Maintenance & Community

The primary author is zhao meng.
No information on community channels, roadmap, or ongoing maintenance is available in the README.

Licensing & Compatibility

License: MIT License.
Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The dataset labeling is noted as potentially imperfect due to manual annotation. The project relies on specific pre-trained BERT model checkpoints and requires manual download. There are no explicit test execution instructions or automated testing frameworks mentioned.
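
Because the checkpoint has to be downloaded by hand, a quick check that the expected files are in place can save a failed run. The sketch below is a hypothetical helper, not part of the repository. It assumes the three items from the prerequisites sit in one directory and that the checkpoint is stored as several files sharing the `bert_model.ckpt` prefix, as TensorFlow 1.x checkpoints usually are.

```python
from pathlib import Path

REQUIRED_FILES = ("vocab.txt", "bert_config.json")
CHECKPOINT_PREFIX = "bert_model.ckpt"  # a TF checkpoint is a set of files with this prefix


def missing_bert_files(model_dir):
    """Return the names of expected model files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    has_checkpoint = root.is_dir() and any(
        p.name.startswith(CHECKPOINT_PREFIX) for p in root.iterdir()
    )
    if not has_checkpoint:
        missing.append(CHECKPOINT_PREFIX + "*")
    return missing
```

Calling `missing_bert_files("path/to/model_dir")` with the download location (the path here is a placeholder) returns an empty list when everything is present.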

Health Check

Last Commit

6 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days