Towards Robust Multimodal Video Retrieval: From Data Representation to Cross-modal Optimization
The School of EECS is hosting the following PhD Progress Review 1:
Speaker: Bingqing Zhang
Host: Dr Yadan Luo
Abstract: Multimodal video retrieval connects visual content with natural language, enabling applications in search, recommendation, and surveillance. However, current systems struggle with generalization and robustness. This work explores four key challenges: cross-modal semantic alignment, handling uncertainty in ambiguous queries and noisy videos, adapting to domain shifts in text and video data, and integrating additional modalities like audio and music. We address the first two by proposing TokenBinder, a retrieval framework using a one-to-many coarse-to-fine alignment strategy, and UMIVR, an uncertainty-aware system that refines user queries through training-free metrics. Ongoing research focuses on domain adaptation and richer multimodal fusion to enhance real-world applicability.
Biography: Bingqing Zhang is a PhD student in the School of EECS at the University of Queensland, under the supervision of A/Prof Sen Wang, Prof Xue Li and A/Prof Jiajun Liu. He obtained his bachelor's and master's degrees from the School of Information at Renmin University of China. His research focuses on multimodal video retrieval.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.
Venue
Zoom Link: https://uqz.zoom.us/j/83237610899