The School of EECS is hosting the following PhD Progress Review 3 Seminar:
Methods for the Effective Retrieval of Chat Conversations
Speaker: Ismail Sabei
Host: Dr Joel Mackenzie
Abstract:
Modern messaging systems, such as Slack, WhatsApp, and WeChat, facilitate synchronous and asynchronous textual communication among multiple users. These chat applications support multi-participant conversations, where users interact simultaneously within Groups (WhatsApp, Messenger) or Channels (Slack). This conversation structure introduces a high likelihood of information loss, as independent sub-conversations intertwine with one another, making the retrieval of relevant past messages a significant challenge.
To address this challenge, we introduce a new research task, Search in Chat Conversations (SCC), aimed at developing effective retrieval methods for chat conversation archives. SCC presents several challenges, ranging from understanding user search behaviour to building a robust test collection and improving ranking models for ad-hoc chat retrieval. To systematically explore these challenges, we organise the work around the following research directions (RDs):
RD1: Understanding User Search Behaviour in Chat Conversations. What are the primary search intents behind users’ searches within chat conversations, and what types of information and chat objects (messages, threads, documents) are they seeking? What strategies do users employ to navigate and search through chat conversations, and how do they cope with unsuccessful search attempts? How satisfied are users with the current search functionality of chat applications: which features affect satisfaction, which challenges are commonly encountered, and where is there room for improvement?
RD2: Developing a Standard IR Test Collection for SCC. To support future research in chat search, we constructed SCC, a test collection for evaluating search in chat conversations, based on insights from RD1. SCC provides a structured benchmark with 114 known-item retrieval topics for searching over 437,893 Slack chat messages. This collection enables empirical evaluation of both traditional retrieval models (e.g., BM25) and neural retrieval methods.
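To make the known-item setting concrete, the following is a minimal sketch of how a single topic could be scored with BM25. It is not the SCC evaluation pipeline: the whitespace tokeniser, the parameter values k1=1.2 and b=0.75, the toy messages, and the use of reciprocal rank as the measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation, for illustration only.
    return text.lower().split()


def bm25_rank(query, messages, k1=1.2, b=0.75):
    """Return message indices sorted by descending BM25 score."""
    docs = [tokenize(m) for m in messages]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of messages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def reciprocal_rank(ranking, target):
    """Known-item measure: 1 / rank of the single target message."""
    return 1.0 / (ranking.index(target) + 1)


# Hypothetical toy archive; message 1 is the known item for this topic.
messages = [
    "lunch at noon anyone",
    "the deploy script is in the tools repo",
    "can someone review my pull request",
]
ranking = bm25_rank("where is the deploy script", messages)
rr = reciprocal_rank(ranking, target=1)
print(ranking, rr)
```

Averaging the reciprocal rank over all topics would give a mean reciprocal rank for the system under test; a neural retriever could be evaluated the same way by swapping out the ranking function.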
RD3: Enhancing Retrieval via Neural Methods and Representation Learning. Building upon SCC, we investigate advanced neural ranking methods to enhance search effectiveness in chat conversations. Our key research directions include: (1) Contrastive Learning with First-Utterance-Conversation Pairs: Investigating how learning representations from initial conversation utterances can improve retrieval. (2) LLM-Generated Queries for Training: Exploring the impact of replacing first utterances with synthetically generated queries to improve model generalisation.
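As a rough illustration of direction (1), the sketch below builds first-utterance-conversation pairs and computes a contrastive loss with in-batch negatives. It is a sketch under stated assumptions, not the method presented in the seminar: pairing the first utterance with the remaining messages, the InfoNCE-style loss, the temperature of 0.05, and the toy vectors standing in for encoder outputs are all illustrative choices. Direction (2) would swap the first utterance for a synthetically generated query on the query side of each pair.

```python
import math


def make_pairs(conversations):
    """Pair each conversation's first utterance (pseudo-query) with the rest.

    Assumption: the positive text is the remaining messages joined together.
    """
    return [(conv[0], " ".join(conv[1:])) for conv in conversations if len(conv) > 1]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(query_vecs, conv_vecs, temperature=0.05):
    """Mean InfoNCE-style loss over a batch.

    Pair i is the positive; every other conversation in the batch is a negative.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, c) / temperature for c in conv_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(query_vecs)


# Hypothetical conversations, each a list of messages in order.
conversations = [
    ["where is the deploy script", "it is in the tools repo", "thanks"],
    ["anyone up for lunch", "sure, noon works"],
]
pairs = make_pairs(conversations)

# Toy 2-d vectors standing in for encoder outputs (assumption).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(query_vecs, [[1.0, 0.0], [0.0, 1.0]])
misaligned = in_batch_contrastive_loss(query_vecs, [[0.0, 1.0], [1.0, 0.0]])
print(pairs[0], aligned < misaligned)
```

The loss is small when each pseudo-query is closest to its own conversation and large otherwise, which is the signal a dense retriever would be trained on.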
By establishing SCC, analysing user search behaviour (RD1), building a test collection grounded in real use cases (RD2), and evaluating neural ranking strategies (RD3), this research aims to bridge the gap between traditional IR methods and deep-learning-based retrieval techniques for chat conversations. The findings offer practical guidelines for designing more effective and user-centric search functionalities in modern messaging systems.
Bio: Ismail Sabei is a PhD candidate at the University of Queensland, supervised by Prof. Guido Zuccon and A/Prof. Bevan Koopman. He received his Master’s degree in Information Technology from the Queensland University of Technology (QUT) and currently works as a lecturer at Jazan University in Saudi Arabia. His research focuses on Information Retrieval in chat applications, including user search behaviour, test collection development, and dense retriever training. He has released open resources for the research community, including the SCC test collection for chat conversation search.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.
Venue
Room 78-632