BERT-AttributeExtraction  by sakuranew

Attribute extraction using BERT for knowledge graphs

created 6 years ago
264 stars

Top 97.5% on sourcepulse

Project Summary

This repository provides a solution for attribute extraction in knowledge graphs, specifically targeting person (character) entries from Baidu Encyclopedia. It offers two methods, fine-tuning and feature extraction, both built on BERT models. The target audience is researchers and developers working on knowledge graph construction and information extraction.

How It Works

The project uses BERT (Bidirectional Encoder Representations from Transformers) for attribute extraction. It offers two distinct approaches. In fine-tuning, a pre-trained BERT model is trained further on a task-specific dataset. In feature extraction, BERT generates vector representations of the text, which are then passed to a separate classifier (such as an MLP) for attribute prediction. Having both options gives flexibility depending on computational resources and desired performance.
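
The feature extraction route can be illustrated with a minimal sketch. It assumes the BERT vectors have already been computed. Random 768-dimensional vectors stand in for them here (768 is the hidden size of BERT-Base), and the labels, layer size, and train/test split are illustrative choices, not values taken from the repository.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Assumption: synthetic vectors stand in for precomputed BERT sentence features.
rng = np.random.default_rng(0)
n_samples, hidden_size = 200, 768
labels = rng.integers(0, 2, size=n_samples)

# Shift the positive class so the toy problem is learnable.
features = rng.normal(size=(n_samples, hidden_size)) + labels[:, None] * 0.5

# Train a small MLP on the frozen features; BERT itself is not updated.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(features[:150], labels[:150])

# Evaluate on the held-out rows.
predictions = clf.predict(features[150:])
accuracy = clf.score(features[150:], labels[150:])
print(f"held-out accuracy: {accuracy:.3f}")
```

In this setup only the small classifier is trained, which is why feature extraction tends to be cheaper than fine-tuning the full model.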

Quick Start & Requirements

  • Install: The README gives no installation steps beyond listing the dependencies.
  • Prerequisites: TensorFlow >= 1.10, scikit-learn, and the pre-trained BERT-Base Chinese model files (vocab.txt, bert_config.json, bert_model.ckpt).
  • Dataset: Requires data formatted as "Entity#Attribute#Label#Text", with labels derived from Baidu Encyclopedia infoboxes.
  • Links: No official quick-start or demo links are provided.
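
As a rough illustration of the "Entity#Attribute#Label#Text" format, the sketch below splits one line into its four fields. The `Example` type, the `parse_line` helper, and the sample line are hypothetical and are not taken from the repository. The label is kept as a string because the summary does not say how labels are encoded.

```python
from typing import NamedTuple


class Example(NamedTuple):
    entity: str
    attribute: str
    label: str  # kept as text; the encoding of labels is not specified here
    text: str


def parse_line(line: str) -> Example:
    """Split one 'Entity#Attribute#Label#Text' line into its four fields."""
    # maxsplit=3 keeps any '#' that appears inside the text field intact.
    parts = line.rstrip("\n").split("#", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '#'-separated fields, got {len(parts)}")
    return Example(*parts)


# Hypothetical sample line, for illustration only.
sample = "Lu Xun#Shaoxing#1#Lu Xun was born in Shaoxing, Zhejiang.\n"
example = parse_line(sample)
print(example.entity, example.attribute, example.label)
```

Limiting the split to the first three separators is a defensive choice, since free text may itself contain a "#".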

Highlighted Details

  • Implements both BERT fine-tuning and feature extraction methods for attribute extraction.
  • Supports attribute extraction from Chinese-language person (character) entries in Baidu Encyclopedia.
  • Provides example commands for data processing, model training, and feature extraction.
  • Reports results for the fine-tuning method on a birthplace dataset (e.g., an F1-score of 0.965).
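
For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for binary labels. The `binary_f1` function and the toy labels are hypothetical, and the repository's own evaluation code may compute the score differently.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Compute F1 = 2 * P * R / (P + R) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration: 2 true positives, 1 false positive, 1 false negative.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_f1 = binary_f1(toy_true, toy_pred)
print(f"F1 = {toy_f1:.3f}")
```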

Maintenance & Community

  • The primary author is zhao meng.
  • No information on community channels, roadmap, or ongoing maintenance is available in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The dataset labeling is noted as potentially imperfect due to manual annotation. The project relies on specific pre-trained BERT model checkpoints, which must be downloaded manually. The README mentions neither instructions for running tests nor an automated testing framework.

Health Check

  • Last commit: 6 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 90 days
