Towards Robust Multimodal Video Retrieval: From Data Representation to Cross-modal Optimization

The School of EECS is hosting the following PhD Progress Review 1:

Towards Robust Multimodal Video Retrieval: From Data Representation to Cross-modal Optimization

Speaker: Bingqing Zhang
Host: Dr Yadan Luo

Abstract: Multimodal video retrieval connects visual content with natural language, enabling applications in search, recommendation, and surveillance. However, current systems struggle with generalization and robustness. This work explores four key challenges: cross-modal semantic alignment, handling uncertainty in ambiguous queries and noisy videos, adapting to domain shifts in text and video data, and integrating additional modalities like audio and music. We address the first two by proposing TokenBinder, a retrieval framework using a one-to-many coarse-to-fine alignment strategy, and UMIVR, an uncertainty-aware system that refines user queries through training-free metrics. Ongoing research focuses on domain adaptation and richer multimodal fusion to enhance real-world applicability.

Biography: Bingqing Zhang is a PhD student in the School of EECS at the University of Queensland, under the supervision of A/Prof Sen Wang, Prof Xue Li and A/Prof Jiajun Liu. He obtained his bachelor's and master's degrees from the School of Information at Renmin University of China. His research focuses on multimodal video retrieval.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Room: 78-411 General Purpose South Room 411
Zoom Link: https://uqz.zoom.us/j/83237610899
