The School of EECS is hosting the following PhD Progress Review 1 Confirmation Seminar:

See All You Want: Detecting Everything with Vision Foundation Models

Speaker: Jia Syuen Lim (Jason)
Host: Dr. Xin Yu

Abstract: Deep visual learning systems have significantly advanced real-world applications, but their reliance on extensively annotated datasets remains a critical bottleneck due to the labour-intensive nature of data labelling. To address this challenge, we leverage Vision-Language Models (VLMs) to enhance generalized object detection and tracking while minimizing the need for manual annotations. We introduce Dispersing Prompt Expansion (DiPEx), a self-supervised prompt learning strategy that overcomes the limitations of manually crafted text queries in VLMs, which often miss objects because semantic overlap between queries diminishes detection confidence. By progressively expanding a set of distinct, non-overlapping hyper-spherical prompts, our method enhances recall rates and significantly improves performance in downstream tasks such as out-of-distribution object detection, surpassing other prompting methods by up to 20.1% in average recall. Extending this approach to the agricultural domain, we develop Track Any Peppers (TAP), a weakly supervised ensemble technique for tracking sweet peppers. Leveraging the zero-shot detection capabilities of VLMs such as Grounding DINO, we automatically generate pseudo-labels for sweet peppers in video sequences, reducing the need for manual annotation. These pseudo-labels are used to train a YOLOv8 segmentation network, enhanced with pre-processing techniques to improve detection under challenging conditions, which is then integrated with the MASA adapter and a state-of-the-art association algorithm for effective tracking, achieving a HOTA score of 80.4%. Together, these studies underscore the potential of VLMs to reduce manual annotation requirements by generating high-quality pseudo-labels, advancing generalized object detection and tracking systems that adapt to diverse contexts, from generic class-agnostic scenarios to specialized applications in agriculture, and paving the way for more efficient and scalable solutions in downstream computer vision tasks.

Bio: Jason is a second-year PhD student in the Data Science group at the School of EECS, supervised by Dr. Yadan Luo and Prof. Zi Huang. His research centres on out-of-distribution detection for visual detection tasks. He received his bachelor's degree in Electrical and Biomedical Engineering from The University of Queensland. His work has been featured in conferences such as NeurIPS, IJCAI, and ECCV.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

In person: 78-421 (General Purpose South, Room 421)
Zoom: https://uqz.zoom.us/j/82692414795