Political Bias in Large Language Models for Content Moderation: a Persona-based Perspective

The School of EECS is hosting the following HDR Progress Review 1 Confirmation Seminar:

Speaker: Stefano Civelli
Seminar type: PhD Confirmation Seminar
Host/Chair: Dr. Rocky Chen

Abstract:

This PhD research examines political bias in LLM-based content moderation and evaluates persona-based prompting as a lightweight alternative to retraining for introducing political diversity. By conditioning models on personas mapped to the Political Compass Test, the work studies whether prompt-level ideological cues influence moderation decisions. Large-scale experiments across models, datasets, and modalities show that overall accuracy is largely unaffected by persona ideology, but finer-grained analyses reveal systematic ideological patterns in disagreements that grow with model scale. The thesis further investigates whether persona ideology is integrated into the model's internal representations or remains superficial, using problem difficulty, a property known to have stable internal representations, as a point of contrast. This offers a path toward more principled auditing of political bias in content moderation.

Bio:

Stefano Civelli is a first-year PhD student in the School of Electrical Engineering and Computer Science at The University of Queensland. His research lies at the intersection of large language models, content moderation, and computational social science, with a focus on understanding and auditing political and ideological biases in AI systems. His work explores how prompt-level interventions, such as persona-based conditioning, influence downstream model behavior and fairness outcomes. Stefano is supervised by Prof. Gianluca Demartini and holds a Master’s degree in Computer Science and Engineering from the Polytechnic University of Milan.

About Data Science Seminar

This seminar series is hosted by EECS Data Science.

Venue

Room: 78 - 632 (MM Lab)