Towards a Unified Framework for Multi-Modal and Multi-Level Video Temporal Grounding

The School of EECS is hosting the following PhD Progress Review 1 (Confirmation) Seminar:

Towards a Unified Framework for Multi-Modal and Multi-Level Video Temporal Grounding

Speaker: Zhuo Cao
Host: Dr Yadan Luo

Abstract: Video Temporal Grounding (VTG) is essential for understanding untrimmed videos in tasks like retrieval, surveillance, and summarization. However, existing methods fall short in retrieving short moments, handling multiple disjoint moments, and supporting multi-modal queries. This work explores four key challenges: enhancing short-moment retrieval, enabling multi-moment retrieval (MMR), scaling to video corpus retrieval (VCMMR), and supporting diverse query modalities beyond text. We propose FlashVTG for accurate moment and highlight detection, and extend it to FlashMMR, the first framework and benchmark for MMR. Ongoing work focuses on scaling to VCMMR and enabling image- and video-based querying for more generalizable VTG.

Biography: Zhuo Cao is a PhD student in the School of EECS at the University of Queensland, under the supervision of Prof. Xue Li and A/Prof. Sen Wang. He obtained his Bachelor of Science in Statistics from Shandong University, China, and completed his Master of Data Science at the University of Queensland. His research interests include video understanding and multi-modal machine learning.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Room: 78-411 General Purpose South Room 411
Zoom Link: https://uqz.zoom.us/j/84913160461