Mitigating Imbalanced Data Distribution in Heterogeneous Information Networks
The School of ITEE is hosting the following PhD Progress Review 1 seminar:
Mitigating Imbalanced Data Distribution in Heterogeneous Information Networks
Speaker: Xinyi Gao
Host: A/Prof Hongzhi Yin
Abstract: Modern real-world systems comprise multi-typed components with diverse interactions. Heterogeneous information networks (HINs) have been proposed to model these components and interactions through diverse node and edge types, and thereby handle a system's complexity and heterogeneity. With their powerful representation capabilities, HINs enable various system problems to be abstracted into machine learning tasks, and they achieve competitive predictive performance under the assumption that abundant and balanced task-specific labelled data is available. Unfortunately, this assumption is hard to sustain because data annotation is time-consuming and resource-intensive. In real-world scenarios, data and labels in HINs exhibit an imbalanced distribution, where certain types of data have significantly fewer samples than others. This imbalance limits the practicality of machine learning models in real-world applications, as they tend to be biased towards the majority and perform poorly on the minority. Apart from the imbalance in node quantity, we recognize a crucial and distinctive challenge in HINs: semantic imbalance. We present a data augmentation method for the semantic imbalance problem in imbalanced HINs, named Semantic-aware Node Synthesis (SNS). A comprehensive experimental study demonstrates that SNS consistently outperforms existing methods across different benchmark datasets.
Bio: Xinyi Gao completed his B.Eng. and M.Eng. at Xi’an Jiaotong University. He is currently a Ph.D. student at the School of Information Technology and Electrical Engineering, The University of Queensland, under the supervision of Dr. Hongzhi Yin. His research interests include graph representation learning and imbalanced data learning.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.