On the Role of Human and Machine Metadata in Crowdsourced Data Annotation
The School of EECS is hosting the following PhD progress review 3 seminar
Speaker: Jiechen Xu
Host: Dr Joel Mackenzie
Abstract: The process of data annotation, also referred to as data labeling, is a crucial step in various research fields such as machine learning and behavioral studies. It involves systematically marking or coding gathered data to make it suitable for a specific purpose. In the context of machine learning, researchers often need large amounts of labeled data to enhance model performance. However, acquiring such datasets can be expensive and labor-intensive, as it typically requires significant manual work. Consequently, data annotation is often considered a monotonous, error-prone, and tedious task.
Although practitioners often take on the role of data annotators themselves, advanced annotation techniques, such as automatic and semi-automatic methods, have been developed to address the demand for extensive data labeling. With the growth of crowdsourcing, researchers have found it feasible to outsource annotation tasks to the general public. In particular, micro-task crowdsourcing platforms have been used to create large-scale annotated datasets such as ImageNet. The main challenge now shifts towards ensuring the quality of crowdsourced labels while keeping costs reasonable.
One possible approach to this challenge is to utilize metadata in crowdsourced data annotation tasks. Metadata, often described as 'data about data,' has shown its effectiveness in managing data quality. This study focuses on two types of metadata: human metadata, which is manually created by humans, and machine metadata, which is generated algorithmically. The research concentrates on two significant crowdsourced annotation tasks: relevance judgment and misinformation judgment. The primary goal is to understand how metadata influences these two tasks. The research is organized as a 2x2 matrix in which one dimension represents the type of metadata (human or machine) and the other the annotation task (relevance judgment or misinformation judgment), and three studies are conducted across this matrix. The first study explores the impact of both human and machine metadata on crowd workers participating in the relevance judgment task. The second study focuses solely on the effect of human metadata in the misinformation judgment task. Lastly, the third study investigates the influence of machine metadata, represented by output from a large language model, on the same misinformation judgment task as in the second study.
The first study delves into crowdsourced relevance judgment. Collecting relevance judgments from human assessors is vital to evaluate the effectiveness of Information Retrieval (IR) systems. Crowdsourcing has been successful in scaling up the collection of these judgments. This study examines how presenting additional metadata beyond just the topic and document being judged affects crowd assessors. Different variants of crowdsourced relevance judgment tasks are presented to assessors, including various types of metadata: human, machine, and content metadata. The impact of metadata on judgment quality, efficiency, and cost is studied, as well as how metadata quality influences collected judgments.
The second study addresses the problem of misinformation spreading online. While expert fact-checkers are effective, their work does not scale. Crowdsourcing presents an opportunity to complement their efforts. This study investigates how presenting human metadata, in the form of evidence provided by others, influences the crowd's judgment of statement truthfulness. Different task designs are employed to understand how the presented evidence affects crowd workers' judgment accuracy and performance. The results reveal that while some crowd workers are misled by the presented evidence, others benefit from it and produce better judgments.
The third study combines machine metadata and misinformation judgment. Crowd workers collaborate with Large Language Models (LLMs) to assess misinformation, using LLM output as machine metadata. The study evaluates how exposing crowd workers to LLM-generated information affects their judgment of statement truthfulness. The results indicate that crowd workers tend to overestimate truthfulness when influenced by LLM-generated information. However, their confidence decreases when they make mistakes due to relying on the LLM. Various worker behaviors emerge when LLM output is presented, indicating diverse strategies for leveraging LLMs.
Bio: Mr Jiechen Xu received his M.Sc. degree in ecology from Minzu University, Beijing, China, in 2017. He is currently a PhD student in computer science in the School of EECS, UQ. His research interests include human computation and crowdsourcing.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.