Towards Robustness: Boosting Approaches for Learning from Positive and Unlabeled Data
The School of EECS is hosting the following PhD confirmation progress seminar:
Towards Robustness: Boosting Approaches for Learning from Positive and Unlabeled Data
Speaker: Yawen Zhao
Host: Dr Xin Yu
Abstract: Weakly supervised learning offers a promising middle ground between the precision of supervised learning and the explorative capabilities of unsupervised learning, addressing the real-world challenge of incomplete, inexact, and inaccurate labels. Among its subfields, Positive-Unlabeled (PU) learning stands out. PU learning deals with situations where only a small amount of labeled positive data and a large amount of unlabeled data are at hand. Most existing work focuses on neural-network-based methods, owing to the success of neural networks in various domains, particularly computer vision and natural language processing. On the other hand, boosting methods, known for their efficacy in classification tasks on tabular data, appear promising for learning from positive and unlabeled data. However, boosting for PU learning remains an under-explored field. We identify three key challenges that must be addressed as we work towards PU boosting methods. 1) The existing PU boosting method relies on the accuracy of an initial model, with the risk that its errors will be inherited and magnified by later models; overcoming this dependence on the initial model is the first challenge we need to address. 2) The occurrence of negative empirical risk during training, which easily leads to overfitting, is a typical problem in PU learning. However, the training data that yields the negative risk carries useful information that can enhance model performance. The second challenge is therefore to make full use of the information carried by the training data within a PU boosting framework. 3) Most existing PU works assume that whether a positive instance is labeled is independent of its features, whereas in the real world the labeling usually does depend on the features. Handling such bias is necessary to make PU boosting methods more applicable to real-world scenarios.
The third challenge we need to address is therefore reducing this instance-dependent bias for PU boosting methods. In summary, by overcoming these three key challenges, we aim to develop PU boosting methods with superior performance alongside computational efficiency and minimal dependence on hyperparameter tuning.
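As background for the second challenge, the snippet below is a minimal, illustrative sketch of how negative empirical risk arises in PU learning: the standard unbiased PU risk estimator subtracts one empirical term from another, and that difference can dip below zero when a flexible model overfits; a common remedy clips it at zero. The function names, the sigmoid loss, the class prior of 0.4, and the toy Gaussian scores are all assumptions for illustration, not the speaker's method.

```python
import numpy as np

def pu_risks(scores_p, scores_u, prior, loss):
    """Return the unbiased (uPU) and non-negative (nnPU) empirical PU risks.

    scores_p: model outputs on labeled positive examples
    scores_u: model outputs on unlabeled examples
    prior:    assumed class prior pi = P(y = +1)
    loss:     margin loss l(z, y), evaluated elementwise
    """
    # Risk on the labeled positives, treated as positive and as negative.
    r_p_pos = prior * np.mean(loss(scores_p, +1))
    r_p_neg = prior * np.mean(loss(scores_p, -1))
    # Risk on the unlabeled data, treated as if it were all negative.
    r_u_neg = np.mean(loss(scores_u, -1))

    # uPU: unbiased in expectation, but this difference can go negative
    # on finite samples, which in practice signals overfitting.
    neg_part = r_u_neg - r_p_neg
    upu = r_p_pos + neg_part
    # nnPU-style correction: clip the negative-class part at zero.
    nnpu = r_p_pos + max(neg_part, 0.0)
    return upu, nnpu

def sigmoid_loss(z, y):
    # Smooth surrogate loss: small when y * z is large and positive.
    return 1.0 / (1.0 + np.exp(y * z))

# Toy illustration: a model that scores positives higher than the unlabeled mix.
rng = np.random.default_rng(0)
scores_p = rng.normal(2.0, 1.0, 100)   # labeled positives
scores_u = rng.normal(0.5, 1.5, 500)   # unlabeled (positive/negative mixture)
upu, nnpu = pu_risks(scores_p, scores_u, prior=0.4, loss=sigmoid_loss)
print(upu, nnpu)
```

The clipped estimator is never smaller than the unbiased one and never negative, which is what keeps training from chasing an impossibly low (negative) risk.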
Biography: Miss Yawen Zhao holds a master’s degree in information technology from The University of Queensland, Australia (2022) and is currently pursuing a PhD in computer science at the School of EECS under the supervision of Dr Miao Xu, Dr Nan Ye and Dr Weitong Tony Chen. Her research interests encompass positive-unlabeled learning and boosting algorithms.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.