The School of EECS is hosting the following PhD Progress Review 3 Thesis Review Seminar:

Advanced Strategies to Alleviate Challenges of Data Scarcity in Deep Learning for Medical Image Analysis

Speaker: Chaoyi Li
Chair: Prof Brian Lovell

Abstract: Deep learning models based on convolutional neural networks have made significant progress in various medical image analysis tasks over recent decades. Despite achieving state-of-the-art performance, these models still require large-scale, high-quality annotated datasets. Such datasets are rare, especially in the medical imaging field, where both data and annotations are expensive to acquire. Common challenges with imperfect datasets in medical image analysis tasks include sparsely labelled data, distribution shifts and generalization issues. This Ph.D. study aims to tackle these challenges by designing novel training strategies for deep learning models trained on imperfect datasets.

In our research, we address the challenges of imperfect datasets from three aspects. First, we explore the use of curriculum learning to enhance the performance of a model trained with supervised learning on a limited dataset. Curriculum learning is inspired by the way humans learn: training data are presented in order from easy to hard for more effective training. We propose a new perspective on curriculum learning design by leveraging the current stage of the network to estimate the difficulty of data based on its in-domain uncertainty.

Second, we investigate the potential benefits of employing external datasets to address the issue of scarce annotated data through two approaches: utilizing unlabeled multi-modality data and utilizing synthetic data. For the first approach, we propose an uncertainty-guided cross-modality semi-supervised learning framework to reduce distribution shifts between different modalities and improve model performance. For the second approach, we introduce a unified framework that integrates image generation and classification into a single stage to synthetically augment the training data and enhance model precision.

Third, we study the generalization issue caused by data scarcity using domain generalization methods. The nature of medical image acquisition (different scanners and image acquisition protocols) leads to inconsistent distributions between training and test datasets and thus degrades model performance. In real-life healthcare applications, collecting new data to fine-tune a trained system for each deployment is impractical. Thus, we propose a shape-focused augmentation module that complements existing domain generalization methods to further boost the model's generalization ability.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Online via Zoom https://uqz.zoom.us/j/85304625948