Non-factoid question answering (NFQA) is a challenging and under-researched task that requires constructing long-form answers, such as explanations or opinions, to open-ended non-factoid questions (NFQs). There is still little understanding of the categories of NFQs that people tend to ask, what form of answer they expect in return, and what the key research challenges of each category are. This work presents the first comprehensive taxonomy of NFQ categories and the expected structure of answers. The taxonomy was constructed with a transparent methodology and extensively evaluated via crowdsourcing. The most challenging categories were identified through an editorial user study. We also release a dataset of categorised NFQs and a question category classifier. Finally, we conduct a quantitative analysis of the distribution of question categories in major NFQA datasets, showing that the NFQ categories that are most challenging for current NFQA systems are poorly represented in these datasets. This imbalance may lead to insufficient system performance on the challenging categories. The new taxonomy and the category classifier will aid research in the area, helping to create more balanced benchmarks and to focus models on addressing specific categories.

Speaker Bio:

Valeriia Baranova-Bolotova is a third-year PhD candidate in Information Retrieval at RMIT University, supervised by Mark Sanderson, Falk Scholer, and Bruce Croft. She was formerly head of the NLP research and development department at Tinkoff. Her main research interests include natural language processing, information retrieval, and machine learning, and the main focus of her PhD is non-factoid multi-document question answering. She has been first author and co-author of several papers published at ACL, CIKM, CHIIR, and SIGIR, and received best paper awards for her full papers at CIKM 2020 and SIGIR 2022.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

https://uqz.zoom.us/j/89362232168
Room: 46-442 and via Zoom