CVPR Workshop 2026 on 'From Perception to Persuasion: Challenges and Advances in Misinformation Detection in Society (PP-MisDet)'

Misinformation in visual and multimodal form is no longer just an image forensics problem or a generic “detect the fake” benchmark. It is a systemic risk to civic trust, public safety, democratic process, and crisis response. The proposed workshop explicitly targets how manipulated visuals and misleading captions are weaponised to steer attention, escalate outrage, justify policy action, or erode institutional legitimacy.

Advances in generative vision–language systems have made it easy to fabricate persuasive visual “evidence” and pair it with misleading narratives, accelerating large-scale misinformation campaigns. PP-MisDet focuses on perception and persuasion: how falsified or staged visual content is created, circulated, trusted, and operationalised in high-stakes settings such as elections, health, and conflict. The workshop brings together computer vision, multimedia forensics, human factors, policy, and platform integrity, addressing both algorithmic innovation and interdisciplinary perspectives. We emphasise robustness under coordinated abuse, human–AI collaboration in verification, and trustworthy evaluation benchmarks for real-world harm, beyond binary “deepfake vs. real.”

Topics to be covered

Multimodal misinformation and persuasion analysis
Detection, attribution, and provenance of AI-generated or manipulated media
Media authenticity
Cross-platform misinformation
Human perception, cognitive bias, and trust calibration
Safety-critical verification systems
Adversarial pressure against verification pipelines
Explainability, accountability, and auditability of fact-checking AI
Human–AI collaboration for verification
Benchmark datasets, evaluation protocols, and stress tests
Responsible synthetic data generation
Societal, legal, and policy implications
Transparency, safety, fairness, accountability, and abuse prevention in multimodal vision–language systems
Vision for societal good and civic resilience
Future directions

Submission

All submissions will be handled electronically via the workshop's CMT paper submission website. Papers are limited to eight pages, including figures and tables, in the CVPR style. Additional pages containing only cited references are allowed. A complete paper should be submitted using the CVPR templates, which are formatted for blind review.

Please refer to the CVPR 2026 Author Guidelines for detailed formatting instructions.

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

Link for Paper Submissions

Organisation

Organising Co-Chairs

Dr Priyanka Singh
Senior Lecturer of Cybersecurity
School of Electrical Engineering and Computer Science at The University of Queensland

Dr Singh holds a PhD in image forensics and has more than 11 years of experience in multimedia forensics, homomorphic encryption, privacy-preserving frameworks, and cloud security. Her recent work spans provenance and tracking of media manipulation, explainable models for digital forensics and cybersecurity, encrypted-domain perceptual hashing for privacy-preserving workflows, and the associated security and privacy issues.

Professor Xue Li
Professor
School of Electrical Engineering and Computer Science at The University of Queensland

Professor Li's major areas of research interest and expertise include data mining, social computing, health data analytics, and intelligent web information systems. His work on large-scale behavioural pattern discovery and risk monitoring directly supports the analysis of media manipulation and real-time harm assessment.

Associate Professor Pradeep K. Atrey
Co-Director of the Albany Lab for Privacy and Security (ALPS)
University at Albany, SUNY

Associate Professor Atrey's group works on multimedia intelligence in encrypted spaces, media-rich disinformation detection and mitigation, privacy, and secure large-scale analytics. He has served in senior organisational roles for ACM Multimedia and related venues and has an extensive publication record in multimedia forensics, provenance, and content authentication.

Technical Program Committee

Yanjun Zhang, University of Technology Sydney, Australia
Guanfeng Liu, Macquarie University, Australia
Jia Wu, Macquarie University, Australia
Lin Yue, University of Adelaide, Australia
Manoranjan Mohanty, Carnegie Mellon University, Qatar
Vivek K. Singh, Rutgers University, USA
Sharif Abuadbba, CSIRO’s Data61, Australia
Shahroz Tariq, CSIRO’s Data61, Australia
Rajat Subhra Chakraborty, IIT Kharagpur, India
Ruchira Naskar, IIEST Shibpur, India
Parag Rughani, National Forensic Sciences University, India
Ling Chen, National Yang Ming Chiao Tung University, Taiwan
Marimuthu Palaniswami, The University of Melbourne
Shantanu Pal, Deakin University
Mashhuda Glencross, The University of Queensland
Anne Kruger, The University of Queensland
Katie Williams, The University of Queensland
Saeed Akhlaghpour, The University of Queensland

Web and Publicity Chair

Gagandeep Singh, The University of Queensland, Australia

Keynotes

Mohan S. Kankanhalli
National University of Singapore

Talk title:

Rethinking Misinformation Defense Through the Lens of Human Behaviour

Brief summary:

This keynote explores misinformation defence beyond classifier-based detection, focusing on how human cognition and social motivation shape responses to misleading content. The talk discusses credibility indicators, user sharing behaviour, and explainable AI systems for anticipating human responses to misinformation.

Speaker note:

Prof. Mohan S. Kankanhalli is Provost’s Chair Professor of Computer Science at the National University of Singapore and Founding Director of the NUS AI Institute. His research spans multimodal computing, computer vision, and trustworthy AI.

Gianluca Demartini
The University of Queensland

Talk title:

How Bias in LLMs can Influence Human Decisions

Brief summary:

This keynote explores how bias in large language models can shape human decision-making and online information exposure. The talk discusses political bias in AI systems and how AI can also be leveraged to detect persuasive and manipulative content online.

Speaker note:

Prof. Gianluca Demartini is Professor of Data Science and ARC Future Fellow at The University of Queensland. His research focuses on Information Retrieval, Responsible AI, and human-centred data science.

Vimala Balakrishnan
Universiti Malaya

Talk title:

The Psychology of Digital Deception in the Age of AI

Brief summary:

This keynote examines how misinformation, scams, social engineering, and AI-generated content exploit human behaviour and decision-making. The session also discusses deepfakes, multimodal deception, and emerging AI-driven approaches for misinformation detection.

Speaker note:

Prof. Ts. Dr. Vimala Balakrishnan is a Professor at Universiti Malaya whose research focuses on cyberpsychology, social media behaviour, digital deception, and human-centred cybersecurity.

Dataset Challenge

For Submissions, please fill out this form

DGM⁴+ Challenge on Global Scene Inconsistency Detection

PP-MisDet Workshop @ CVPR 2026

Overview

The DGM⁴+ Challenge on Global Scene Inconsistency Detection aims to advance research on detecting and explaining semantic inconsistencies between visual content and accompanying text.

Modern multimodal misinformation increasingly relies on contextual manipulation, where fabricated foregrounds, misleading backgrounds, and deceptive captions are combined to produce persuasive but false narratives. Unlike traditional deepfake benchmarks that focus on low-level artifacts, DGM⁴+ emphasizes scene-level and narrative-level inconsistency.

This challenge provides a standardized benchmark for evaluating models that jointly reason over images and text to identify misleading content.

Task Description

Given an image–caption pair, participants must jointly perform:

Authenticity Classification: Predict whether the pair is semantically consistent (authentic) or contains contextual manipulation (manipulated).

Manipulation-Type Classification: For manipulated samples, predict whether the inconsistency corresponds to Text Swap or Text Attribute Modification.

Textual Grounding of Inconsistency: Identify caption tokens or spans that introduce misleading, fabricated, or contextually inconsistent information.

All three outputs are mandatory and are jointly evaluated.

Dataset

The DGM⁴+ dataset contains approximately 5,000 news-style image–caption pairs, including:

Authentic scenes
Text swap manipulations
Text attribute manipulations
Narrative reframing

Dataset Splits

Split	Availability	Purpose
Training	Public	Model development
Validation	Public	Hyperparameter tuning
Test	Hidden	Leaderboard evaluation

Public splits are released under a CC BY-NC 4.0 license.

Dataset is available to view and download at: https://drive.google.com/file/d/1kXNqljyJ7EHmHnRn3ORPgLjawnJ63YHL/view?usp=sharing

For more information about the dataset: https://arxiv.org/abs/2509.26047

Required Outputs

Each submission must provide predictions for all test samples.

(a) Authenticity Prediction

A CSV file containing:

id,label,confidence,manipulation_type

Where:

label ∈ {authentic, manipulated}
confidence ∈ [0,1]
manipulation_type ∈ {origin, text_swap, text_attribute}

For authentic samples, manipulation_type must be origin.
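The constraints above can be checked locally before uploading. The following is a minimal sketch, not an official validator: the helper names `validate_rows` and `to_csv` are hypothetical, and the confidence values are placeholders, since this page does not state whether confidence refers to the predicted label or to the manipulated class.

```python
import csv
import io

# Allowed values, taken from the field definitions above.
LABELS = {"authentic", "manipulated"}
TYPES = {"origin", "text_swap", "text_attribute"}
HEADER = ["id", "label", "confidence", "manipulation_type"]


def validate_rows(rows):
    """Return a list of human-readable problems found in the prediction rows."""
    problems = []
    for row in rows:
        sid = row["id"]
        if row["label"] not in LABELS:
            problems.append(f"{sid}: unknown label {row['label']!r}")
        if not 0.0 <= float(row["confidence"]) <= 1.0:
            problems.append(f"{sid}: confidence outside [0, 1]")
        if row["manipulation_type"] not in TYPES:
            problems.append(f"{sid}: unknown manipulation_type")
        if row["label"] == "authentic" and row["manipulation_type"] != "origin":
            problems.append(f"{sid}: authentic rows must use 'origin'")
    return problems


def to_csv(rows):
    """Serialise rows to CSV text with the header in the required column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADER, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder predictions for illustration only.
rows = [
    {"id": "0001", "label": "manipulated", "confidence": 0.91,
     "manipulation_type": "text_swap"},
    {"id": "0002", "label": "authentic", "confidence": 0.12,
     "manipulation_type": "origin"},
]
csv_text = to_csv(rows)
```

Writing `csv_text` to `classification.csv` only after `validate_rows(rows)` returns an empty list catches format errors before submission.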

(b) Text Grounding Output

A JSON file containing token indices corresponding to misleading or manipulated text:

{
  "0001": [3,4,5,6]
}

Tokenization follows the official tokenizer released with the dataset.

For authentic samples, an empty list must be submitted.
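One way to produce this file is sketched below. The helper `build_grounding` is hypothetical, and sorting and de-duplicating indices is an assumption rather than a stated requirement. Token indices must come from the official tokenizer, which is not reproduced here.

```python
import json


def build_grounding(predicted_spans, all_ids):
    """Map every test id to a list of token indices.

    Ids with no flagged tokens (for example, samples predicted authentic)
    receive an empty list, as the rules above require.
    """
    return {sid: sorted(set(predicted_spans.get(sid, []))) for sid in all_ids}


# Placeholder predictions: only sample "0001" has flagged tokens.
grounding = build_grounding({"0001": [5, 3, 4, 6, 4]}, ["0001", "0002"])
payload = json.dumps(grounding, indent=2)
```

The resulting `payload` string can be written directly to `text_grounding.json`.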

Optional Cross-Modal Grounding and Explanation Track

In addition to required outputs, participants are encouraged to submit cross-modal explanations linking misleading textual elements to relevant visual evidence.

For each manipulated sample, teams may optionally provide:

Alignments between identified misleading tokens and corresponding image regions, and/or
Visualizations illustrating how textual claims are supported or contradicted by visual content.

These explanations are intended to demonstrate how models reason about semantic inconsistency across modalities.

Optional submissions are not included in the official leaderboard evaluation and do not affect ranking.

However, high-quality cross-modal grounding and explanation outputs will be highlighted during the poster session and invited presentations.

Selected teams may receive a Best Explainability Award.

Optional Submission Format

Optional explanations may be submitted as:

Visualization images (PNG/JPEG), and/or
JSON files linking token indices to bounding boxes.

Example format:

{
  "0001": {
    "tokens": [3,4,5],
    "regions": [0.12,0.34,0.56,0.78]
  }
}

Submission Format

Each submission must be packaged as:

submission.zip
├── classification.csv
└── text_grounding.json
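The archive layout above can be produced with Python's standard `zipfile` module. This is a minimal sketch: the function name `package_submission` and its parameters are illustrative, and it assumes both files belong at the top level of the archive, as the tree suggests.

```python
import zipfile
from pathlib import Path


def package_submission(csv_path, json_path, out_path="submission.zip"):
    """Bundle the two required files at the top level of the archive."""
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname fixes the name inside the archive regardless of the local path.
        zf.write(csv_path, arcname="classification.csv")
        zf.write(json_path, arcname="text_grounding.json")
    return Path(out_path)
```

Using `arcname` keeps local directory names out of the archive, so the two files sit at the root of `submission.zip` wherever they were generated.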

1. Classification Evaluation

Authenticity classification is evaluated using:

Accuracy
F1-score
ROC–AUC

Confidence scores are used for AUC computation.
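For local validation, the three metrics can be approximated as in the sketch below. It is not the official evaluation script. It assumes that "manipulated" is the positive class and that a higher confidence means "more likely manipulated"; neither convention is confirmed on this page.

```python
def binary_metrics(y_true, y_pred, scores):
    """Return (accuracy, F1, ROC-AUC) with 1 = manipulated, 0 = authentic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # ROC-AUC as the probability that a random manipulated sample receives a
    # higher score than a random authentic one; ties count as half a win.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if pos and neg:
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present
    return accuracy, f1, auc


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1], [0.9, 0.4, 0.2, 0.6]))
# (0.5, 0.5, 0.75)
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a validation split of a few thousand pairs.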

2. Manipulation-Type Evaluation

For manipulated samples, prediction of text swap vs. text attribute is evaluated using:

Accuracy
Macro F1-score

This evaluates the system’s ability to distinguish different forms of semantic inconsistency.
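Macro F1 averages the per-class F1 scores with equal weight, so the rarer manipulation type counts as much as the more frequent one. The sketch below assumes the metric is computed only over samples whose ground truth is manipulated; the function name `macro_f1` is illustrative.

```python
def macro_f1(y_true, y_pred, classes=("text_swap", "text_attribute")):
    """Unweighted mean of per-class F1 scores over the given classes."""
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Placeholder labels: one text_swap sample is mistaken for text_attribute.
truth = ["text_swap", "text_swap", "text_attribute", "text_attribute"]
preds = ["text_swap", "text_attribute", "text_attribute", "text_attribute"]
type_f1 = macro_f1(truth, preds)  # per-class F1 is 2/3 and 4/5
```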

Evaluation Protocol

Evaluation is conducted independently for each component and combined into a final score.

Final rankings are computed using:

Score = 0.5 × F1_binary + 0.3 × F1_text + 0.2 × F1_type

This weighting prioritizes reliable detection of semantic inconsistency while rewarding accurate explanation and manipulation-type recognition.
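As a worked example of the weighting, the sketch below combines three component F1 scores into the final score. The input values are made up for illustration and are not baseline results.

```python
def final_score(f1_binary, f1_text, f1_type):
    """Weighted sum used for ranking: 0.5, 0.3 and 0.2 as stated above."""
    return 0.5 * f1_binary + 0.3 * f1_text + 0.2 * f1_type


# Hypothetical component scores: 0.40 + 0.18 + 0.14
score = round(final_score(0.80, 0.60, 0.70), 4)
print(score)  # 0.72
```

Because the binary term carries half the weight, a 0.10 gain in F1_binary moves the final score by 0.05, compared with 0.03 for the same gain in F1_text and 0.02 in F1_type.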

Challenge Timeline

Date	Milestone
Jan 10, 2026	Challenge launch
Mar 05, 2026	Submission portal opens
May 10, 2026	Test set release
May 20, 2026	Final submissions due
May 30, 2026	Results announced
Jun 03, 2026	On-site presentations and awards

Participation

To participate:

Download the training and validation sets

Develop and evaluate models

Submit predictions on the hidden test set

Registration links and instructions will be provided on this page.

Awards

Outstanding contributions will be recognised through:

Best Overall Performance
Best Explainability and Grounding
Best Student Team

Selected teams will be invited to present short papers and demos at PP-MisDet.

Ethical Use Policy

All manipulated samples are synthetically generated and do not depict real individuals or events.

Participants agree to:

Use the dataset exclusively for research purposes
Not generate or disseminate misinformation
Not redistribute the dataset without permission

Violations may result in disqualification.

Transparency and Reproducibility

After the challenge concludes, evaluation scripts, annotation guidelines, and benchmark statistics will be publicly released to support long-term community use.

Updates

Additional information regarding baseline models, leaderboard access, and submission procedures will be posted on this page.

Please check regularly for updates.