Enhancing Vision-and-Language Navigation with Multi-modal Prompts
The School of EECS is hosting the following PhD Progress Review 1 seminar:
Enhancing Vision-and-Language Navigation with Multi-modal Prompts
Speaker: Haodong Hong
Host: Dr Sen Wang
Abstract: Vision-and-Language Navigation (VLN), which requires an agent to follow natural language instructions to reach a target destination, has attracted growing attention. However, existing VLN tasks mainly leverage textual instructions as guidance, leading to potential ambiguities for the agent and limiting the transmission of visual information from humans. To address this, our research introduces a novel task, Vision-and-Language Navigation with Multi-modal Prompts (VLN-MP), in which instructions consist of both natural language and images as prompts. We present three settings for VLN-MP based on the number and relevance of prompt images for different scenarios. To show how agents are trained and evaluated for VLN-MP, we implement a new benchmark that offers: (1) a training-free pipeline to transform textual instructions into multi-modal forms using landmark images; (2) two datasets created by applying this pipeline to existing VLN datasets; (3) a landmark-based model to handle multi-modal instructions as a strong baseline; (4) a novel metric for assessing the advantage of visual prompts. Extensive experiments validate the positive impact of image prompts on agent perception and navigation performance. Moreover, our VLN-MP-trained model can be applied to traditional VLN under the pre-explore setting, achieving state-of-the-art results.
Speaker Biography: Mr Haodong Hong is a PhD student in the Data Science group at the School of Electrical Engineering and Computer Science, The University of Queensland (UQ), Australia. He received his Bachelor’s degree in Electronic Engineering from Tsinghua University. He is currently working towards his PhD under the supervision of Dr Sen Wang and Dr Jiajun Liu. His research interests include multi-modal learning and vision-and-language navigation.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.