Recent progress in Artificial Intelligence (AI) and Machine Learning has shown how large-scale labeled data can be leveraged to train powerful supervised models.

When the underlying training data has quality issues related to representativeness (e.g., unbalanced datasets), the risk is that models trained on such data will mirror and amplify existing human bias. This can result in critical fairness issues, such as discrimination against certain segments of the population represented in the data.

For example, an image labelling task aimed at identifying a person's job from a picture may lead to a woman in medical attire being labelled as 'nurse' rather than as 'doctor'. Such fairness issues are generated by underlying data quality problems (e.g., unbalanced, biased, or incomplete training datasets that lead to models making unfair decisions).
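A first step towards catching this kind of problem is simply auditing how labels are distributed across a sensitive attribute. The sketch below is a minimal illustration (the `gender`/`job` field names and the toy records are hypothetical, not from any real dataset):

```python
from collections import Counter

def label_distribution_by_group(records, group_key, label_key):
    """Count label frequencies within each demographic group.

    `records` is a list of dicts; `group_key` names the sensitive
    attribute and `label_key` the class label (illustrative names).
    """
    dist = {}
    for r in records:
        dist.setdefault(r[group_key], Counter())[r[label_key]] += 1
    return dist

# Hypothetical toy dataset mirroring the doctor/nurse example above.
data = [
    {"gender": "female", "job": "nurse"},
    {"gender": "female", "job": "nurse"},
    {"gender": "female", "job": "doctor"},
    {"gender": "male", "job": "doctor"},
    {"gender": "male", "job": "doctor"},
    {"gender": "male", "job": "doctor"},
]

print(label_distribution_by_group(data, "gender", "job"))
```

A skewed per-group distribution like the one in this toy data is exactly the kind of representativeness issue that can propagate into a trained model's decisions.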

The AI research community has looked at fairness issues and at increasing the explainability of AI decisions in order to increase end-user trust in such systems. The root cause of errors like the one above is an unbalanced training dataset that under-represents certain parts of the population. Such blind spots in training datasets (i.e., under-represented parts of the feature space) lead to 'unknown unknowns': wrong classification decisions made with high algorithmic confidence and thus difficult to catch automatically.
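One simple heuristic for surfacing candidate unknown unknowns is to flag predictions that are both highly confident and far from any training example. This is only a sketch: the thresholds and the nearest-neighbour distance proxy are illustrative assumptions, not the lab's actual method.

```python
import math

def flag_unknown_unknowns(test_items, train_points,
                          conf_threshold=0.9, dist_threshold=1.0):
    """Flag predictions that are both highly confident and far from any
    training example -- a rough proxy for 'unknown unknown' risk.
    Thresholds are illustrative assumptions."""
    flagged = []
    for features, confidence in test_items:
        nearest = min(math.dist(features, p) for p in train_points)
        if confidence >= conf_threshold and nearest >= dist_threshold:
            flagged.append((features, confidence, nearest))
    return flagged

# Toy example: one confident prediction far from the training data.
train = [(0.0, 0.0), (1.0, 0.0)]
tests = [((0.1, 0.1), 0.95), ((5.0, 5.0), 0.97), ((5.0, 5.0), 0.5)]
print(flag_unknown_unknowns(tests, train))
```

In this toy run only the confident, far-away item is flagged; such flagged items are natural candidates to route to human reviewers rather than trusting the model's confidence score.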

In this project space, we are designing "human-in-the-loop" solutions that incorporate both experts and non-experts to ensure datasets are not labelled in a biased way but instead represent different parts of the feature space well and fairly.

Human-in-the-loop systems can offer high transparency, with accountability attributed to groups of contributors or to individuals. Explainability and interpretability can likewise be achieved either by explicit means (i.e., asking contributors to justify their annotations) or implicitly (i.e., by monitoring contributors' behavior).
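Accountability of this kind can be made concrete by keeping provenance alongside aggregated labels. Below is a minimal sketch (the data layout and function name are hypothetical) of majority-vote aggregation that records which contributors supported each final label:

```python
from collections import Counter

def aggregate_with_provenance(annotations):
    """Majority-vote label aggregation that keeps per-item provenance,
    so each decision can be traced back to the contributors who
    supported it. `annotations` maps item id -> list of
    (contributor, label) pairs (an illustrative layout)."""
    results = {}
    for item, votes in annotations.items():
        counts = Counter(label for _, label in votes)
        winner, _ = counts.most_common(1)[0]
        supporters = [c for c, l in votes if l == winner]
        results[item] = {"label": winner, "supported_by": supporters}
    return results

print(aggregate_with_provenance(
    {"img1": [("ann_a", "doctor"), ("ann_b", "doctor"), ("ann_c", "nurse")]}
))
```

Storing the `supported_by` list is what makes the aggregated decision auditable: a disputed label can be traced to the specific contributors behind it.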

The research conducted by the lab led by A/Prof Gianluca Demartini aims at building solutions to AI fairness issues by efficiently and effectively incorporating humans in the loop. We investigate how to best combine AI, experts, and non-experts to label data at scale while ensuring that difficult cases are handled by humans.
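The core routing idea, sending only the cases the model is unsure about to human annotators, can be sketched in a few lines. The confidence threshold and the tuple layout are illustrative assumptions, not a description of the lab's actual pipeline:

```python
def triage(predictions, confidence_threshold=0.8):
    """Split model predictions into auto-accepted labels and items
    routed to human annotators. `predictions` is a list of
    (item_id, label, confidence) tuples; the threshold is an
    illustrative assumption."""
    auto_accepted, to_humans = [], []
    for item_id, label, confidence in predictions:
        if confidence >= confidence_threshold:
            auto_accepted.append((item_id, label))
        else:
            to_humans.append((item_id, label))
    return auto_accepted, to_humans

# Toy example: the low-confidence item is routed to human annotators.
print(triage([(1, "doctor", 0.95), (2, "nurse", 0.6)]))
```

This keeps human effort focused where the model is least reliable, which is what makes human-in-the-loop labelling scale.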