Document-level Relation Extraction from Large-scale Noisy Data
The School of EECS is hosting the following HDR Progress Review 3 Confirmation Seminar:
Document-level Relation Extraction from Large-scale Noisy Data
Speaker: Phan Khai Tran
Abstract
Relation extraction (RE) is a fundamental task in natural language processing that aims to extract structured relational facts from unstructured text, supporting downstream applications like knowledge graph construction. While traditional RE systems focus on single-sentence contexts, this setting is often insufficient for capturing complex relations expressed across multiple sentences or an entire document. Document-level Relation Extraction (DocRE) offers a more practical alternative but introduces new challenges, including (1) noisy contextual input, (2) long-tailed relation distributions, and (3) limited high-quality annotated data – challenges that current representation-focused DocRE methods struggle to address.
This thesis tackles these issues to advance DocRE under more realistic conditions. The key contributions include: (1) a method that models interdependencies among intra-document entity pairs to more effectively identify evidence sentences and reduce input noise, (2) a hierarchical embedding-level data augmentation framework to mitigate long-tailed distribution bias, and (3) a semi-supervised learning approach with pseudo-labelling and cross-supervision training to better leverage abundant unlabelled and weakly labelled data.
This research broadens the capabilities of DocRE and demonstrates significant improvements in its robustness under noisy conditions, making DocRE more practical for real-world applications.
Bio
Phan Khai Tran is a Ph.D. student in the School of Electrical Engineering and Computer Science at The University of Queensland (UQ), under the supervision of Prof. Xue Li and A/Prof. Wen Hua. He received a Master of Information Technology from UQ in 2021. His current research interests include natural language processing, deep learning, and relation extraction.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.
Venue
Zoom: https://uqz.zoom.us/j/5980928669