Tackling Data Quality Challenges for Causal Effect Estimation
The School of EECS is hosting the following PhD Thesis Review Seminar:
Tackling Data Quality Challenges for Causal Effect Estimation
Speaker: Hechuan Wen
Host: Dr. Rocky Chen
Abstract: The potential outcome framework provides a theoretical justification for causal effect estimation, which aims to discover the potential gain or loss to an entity under a specific intervention, e.g., a policy, medicine, or service. Causal effect estimation plays a crucial role in supporting decision-making in various high-stakes domains, as real-world action can be taken if an intervention is expected to have a positive influence on the target, e.g., sales or health status. Earlier literature focused extensively on statistical tools for calculating the average treatment effect; more recently, however, causal effect estimation at the individual level has become increasingly popular. In addition, with advances in deep learning, downstream estimators are gaining substantial capacity to model non-linear mappings.
As obtaining large-scale datasets from randomized controlled trials is costly, time-consuming, and sometimes unethical, leveraging observational datasets for data-driven causal effect estimation has become a more affordable alternative. Our research on causal effect estimation spans three dimensions: 1) upstream strategies for addressing data scarcity, 2) downstream treatment effect estimator design, and 3) model scalability to more realistic scenarios. This thesis centers on these three themes and proposes more generalizable methods to address the corresponding challenges as follows:
- Improving sub-optimal solutions for label acquisition under a limited budget: We first propose MACAL, a simple yet effective model-independent method that jointly reduces the model variance and distributional discrepancy during data acquisition. Further, we complement the theoretical analysis with an improved framework whose risk upper bound can be well quantified by measurable terms. Then, we propose FCCM, a generalizable algorithm that works for more general data distributions and flexible acquisition schemes.
- Facilitating generalizability of the trained estimator to the target domain: Existing neural methods assume by default that the distribution and availability of variables are the same at the training and inference (i.e., runtime) stages. However, domain corruption can occur. To counter runtime domain corruption, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme that reduces the latent distribution disparity first between treated and control groups, and then between training and runtime variables.
- Unleashing uncertainty modeling and scalability for graph data: Leveraging the rich relational information in networked data for causal effect estimation has proven beneficial for deconfounding at the population scale. However, the potential risk of individual-level treatment effect estimation on network data has been largely underexplored. To create a more trustworthy causal effect estimator, we propose an uncertainty-aware graph deep kernel learning framework that models the prediction uncertainty with a Gaussian process and identifies unreliable estimations. Furthermore, by establishing a sparse variational optimization scheme, our proposed framework is scalable to large graphs.
Bio: Hechuan Wen is a Ph.D. candidate at the School of EECS, The University of Queensland, under the supervision of Dr. Rocky Chen, Prof. Hongzhi Yin, and Prof. Shazia Sadiq. He completed his B.Eng. in Aircraft Engineering at Nanjing University of Aeronautics and Astronautics in 2019, and his M.Com. in Business Analytics at The University of Sydney in 2021. His current research interests include causal inference (under the potential outcome framework), active learning, and synthetic data augmentation.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.
Venue
Zoom Link: https://uqz.zoom.us/j/83331840440