The Data Science Discipline of the School of EECS is hosting the following guest seminar: 

Measurement, Scales, Averages, Meaningfulness

Speaker: Prof Nicola Ferro, University of Padua, Italy
Host: Prof Guido Zuccon

Abstract: The main goal of Information Retrieval (IR) experimentation is to determine the effectiveness of IR systems and to compare them in order to identify the best approaches. Evaluation measures quantify the effectiveness of IR systems, and their scores are then used in follow-up statistical analyses aimed at drawing inferences about the analysed systems and how they would perform once in production.

However, evaluation measures are based on measurement scales which, in turn, determine the operations allowed on scores from those scales. For example, strictly speaking, mean and variance, as well as parametric significance tests, should be used only with interval scales. Departing from scale properties may bias the evaluation outcomes, and this is worth investigating in its own right.
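To make the constraint concrete, here is a minimal sketch (invented scores and a hypothetical order-preserving relabelling, not taken from the talk) showing that a comparison of systems by mean score is not stable under a monotone relabelling of an ordinal scale, while a median comparison is:

```python
import statistics

# Hypothetical per-topic scores for two systems on a 0-3 ordinal
# relevance-style scale (invented data, for illustration only).
system_a = [0, 0, 3, 3, 3]
system_b = [2, 2, 2, 2, 2]

# An order-preserving (monotone) relabelling: 0->0, 1->1, 2->2, 3->10.
# On an ordinal scale, any such relabelling is an allowable transformation.
relabel = {0: 0, 1: 1, 2: 2, 3: 10}.get

for name, f in [("original", lambda x: x), ("relabelled", relabel)]:
    a = [f(x) for x in system_a]
    b = [f(x) for x in system_b]
    print(f"{name}: mean A={statistics.mean(a)} vs mean B={statistics.mean(b)}; "
          f"median A={statistics.median(a)} vs median B={statistics.median(b)}")
```

On the original labels the mean prefers system B (1.8 vs 2); after the relabelling it prefers system A (6 vs 2), whereas the median comparison is unchanged in both cases. This is the sense in which the mean presupposes at least an interval scale.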

Moreover, measurement scales are closely related to the notion of meaningfulness of the conclusions drawn, i.e. their invariance under the allowable transformations of a measurement scale. The notion of meaningfulness is little known in IR and deserves further exploration.
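As a hedged illustration of meaningfulness (invented numbers; `affine` and `monotone` are hypothetical transformations chosen purely for this sketch), the conclusion "system A's mean exceeds system B's" survives any positive affine transformation, the allowable class for interval scales, but can flip under a monotone transformation, the allowable class for ordinal scales:

```python
import statistics

# Invented per-topic effectiveness scores for two systems (illustration only).
scores_a = [0.2, 0.4, 0.95]
scores_b = [0.5, 0.5, 0.5]

def mean_a_exceeds_b(a, b):
    """The conclusion whose meaningfulness we test: mean(A) > mean(B)."""
    return statistics.mean(a) > statistics.mean(b)

def affine(v):
    """Positive affine map: an allowable transformation of an interval scale."""
    return 3.0 * v + 1.0

def monotone(v):
    """Order-preserving map on [0, 1]: allowable on a merely ordinal scale."""
    return v ** 0.1

print("original: ", mean_a_exceeds_b(scores_a, scores_b))               # True
print("affine:   ", mean_a_exceeds_b([affine(v) for v in scores_a],
                                     [affine(v) for v in scores_b]))    # still True
print("monotone: ", mean_a_exceeds_b([monotone(v) for v in scores_a],
                                     [monotone(v) for v in scores_b]))  # flips to False
```

Because the conclusion changes under a transformation that an ordinal scale permits, it is meaningful only if the scores genuinely form an interval scale.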

In this talk, we will introduce the fundamental notions about scales of measurement and meaningfulness, and we will show how they apply to IR evaluation measures. Unfortunately, most IR evaluation measures are not interval scales, and those depending on the recall base never will be. However, we will propose an approach to transform measures not depending on the recall base into proper interval scales. Finally, we will discuss the outcomes of a thorough experimentation on TREC collections, analysing in depth the impact of departing from scale assumptions and showing that, on average, 25% of the decisions about which systems are significantly different change because of the scale properties of IR evaluation measures.

Main References

Ferrante, M., Ferro, N., and Pontarollo, S. (2019). A General Theory of IR Evaluation Measures. IEEE Transactions on Knowledge and Data Engineering (TKDE), 31(3):409–422.

Ferrante, M., Ferro, N., and Losiouk, E. (2020). How do interval scales help us with better understanding IR evaluation measures? Information Retrieval Journal, 23(3):289–317.

Ferrante, M., Ferro, N., and Fuhr, N. (2021). Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales. IEEE Access, 9:136182–136216.

Speaker Biography: Nicola Ferro is a full professor in computer science at the Department of Information Engineering of the University of Padua, Italy. He is head of the Intelligent Interactive Information Access (IIIA) hub and of the Information Management Systems (IMS) research group. His research interests include information retrieval, its experimental evaluation, multilingual information access, and digital libraries. He is the coordinator of the CLEF evaluation initiative, which involves more than 200 research groups worldwide in large-scale IR evaluation activities. He has published more than 400 papers on information retrieval, digital libraries, and their evaluation. He was inducted into the 2023 class of the SIGIR Academy.


About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Prentice Building, Learning Theatre 42-115