The School of ITEE is hosting the following PhD thesis review seminar:
Insight Recommendation for Visual Data Exploration
Speaker: Rischan Mafrur
Host: Assoc Prof Guido Zuccon
Abstract: Visual data exploration is pervasive across industries and organizations, facilitating the discovery of actionable, data-driven insights. However, uncovering these insights requires analysts to manually construct a large number of aggregate queries and visually examine their results in search of insightful visualizations. Manually generating and scrutinizing every potential visualization is impractical for data analysts. As a result, much research has been devoted to visual insight recommendation systems, which automatically recommend data visualizations that reveal important data-driven insights. These systems generate a large set of candidate visualizations, rank them according to some metric of importance (e.g., a deviation-based metric), and recommend the top-k most important visualizations to users.
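The generate, rank, and recommend loop described above can be illustrated with a minimal sketch. It is not the speaker's implementation. It assumes one common reading of a "deviation-based metric": each candidate view is a (dimension, measure) pair, and its importance is the Euclidean distance between the normalized aggregate over a target subset and over the whole dataset. The function names, the SUM aggregate, and the row layout are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def aggregate(rows, dim, measure):
    """SUM(measure) GROUP BY dim, normalized so the values sum to 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim]] += row[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {key: 0.0 for key in totals}
    return {key: value / grand_total for key, value in totals.items()}


def deviation(target_dist, reference_dist):
    """Euclidean distance between two distributions (assumed importance metric)."""
    keys = set(target_dist) | set(reference_dist)
    return sqrt(sum(
        (target_dist.get(key, 0.0) - reference_dist.get(key, 0.0)) ** 2
        for key in keys
    ))


def top_k_views(rows, is_target, dims, measures, k):
    """Score every (dimension, measure) view and return the k highest-scoring."""
    target_rows = [row for row in rows if is_target(row)]
    scored = []
    for dim in dims:
        for measure in measures:
            score = deviation(
                aggregate(target_rows, dim, measure),  # view over the target subset
                aggregate(rows, dim, measure),         # same view over all data
            )
            scored.append(((dim, measure), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Every candidate view costs two aggregate queries in this sketch, which is why exhaustive generation becomes expensive as the number of dimensions and measures grows.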
Current visual insight recommendation systems are promising because they can automatically recommend the most important visualizations from multidimensional data, but they often recommend similar visualizations, which limits the number of distinct insights obtained. To address this limitation, (i) this thesis posits that incorporating diversification techniques into the top-k insight recommendation process eliminates redundancy and offers a concise yet comprehensive overview of potential insights. However, integrating diversification into insight recommendation leads to a "process-first-diversify-next" approach, in which all potential data visualizations are first generated by executing a large number of aggregate queries. To reduce the associated query processing cost, we propose a strategy that exploits both the importance and diversity properties to prune a significant number of query executions. Furthermore, (ii) to enhance the efficiency of the proposed strategy, this thesis introduces several optimization techniques for incorporating diversification into visual insight recommendation: 1) adaptive pruning with rectifying; 2) sharing-based optimization; and 3) hybrid optimization, which combines adaptive pruning with rectifying and sharing-based optimization.
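As a rough illustration of the "process-first-diversify-next" baseline, the sketch below takes importance scores that have already been computed for every candidate view and greedily selects k views that balance importance against dissimilarity to the views already chosen. It is a generic greedy heuristic in the spirit of maximal marginal relevance, not the pruning strategy proposed in the thesis. The Jaccard distance over view attributes, the `trade_off` parameter, and the assumption that importance scores lie in [0, 1] are all hypothetical choices.

```python
def view_distance(view_a, view_b):
    """Jaccard distance between the attribute sets of two views (assumed measure)."""
    attrs_a, attrs_b = set(view_a), set(view_b)
    union = attrs_a | attrs_b
    if not union:
        return 0.0
    return 1.0 - len(attrs_a & attrs_b) / len(union)


def diversify_top_k(scored_views, k, trade_off=0.5):
    """Greedily pick k views, trading importance against diversity.

    scored_views: iterable of (view, importance) pairs, importance assumed in [0, 1].
    trade_off: 0 ranks purely by importance, 1 purely by diversity.
    """
    remaining = dict(scored_views)
    selected = []
    while remaining and len(selected) < k:
        def gain(view):
            importance = remaining[view]
            if not selected:
                return importance  # first pick: most important view
            # Diversity is the distance to the closest already-selected view.
            diversity = min(view_distance(view, chosen) for chosen in selected)
            return (1 - trade_off) * importance + trade_off * diversity

        best = max(remaining, key=gain)
        selected.append(best)
        del remaining[best]
    return selected


scored = [
    (("region", "sales"), 0.90),
    (("region", "profit"), 0.85),
    (("channel", "units"), 0.60),
]
print(diversify_top_k(scored, k=2))               # picks a view on different attributes second
print(diversify_top_k(scored, k=2, trade_off=0))  # plain top-k by importance
```

Because the input already contains a score for every view, all aggregate queries have been executed before diversification starts. That up-front cost is what the abstract's pruning and sharing optimizations aim to reduce.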
Existing visual insight recommendation systems typically assume that the data being analyzed is clean, disregarding data quality issues that may impede the recommendation process. This work examines one prevalent data quality problem: incomplete data. Incomplete data can skew analyses and diminish the advantages of data-driven approaches, resulting in subpar and misleading recommendations. Although numerous data imputation methods have been proposed to address incomplete data, it is questionable whether they fully resolve the issue. It is therefore important to understand how incomplete data affects the visual analytics that are recommended. In this thesis, (iii) we investigate the relationship between incomplete data and recommended visual analytics under a combination of conditions: the distribution of incomplete data, the data imputation methods employed, the types of insights revealed by visualizations, and the quality measures used to evaluate recommendations. Additionally, given the ubiquity of incomplete data, it is crucial to develop efficient methods for addressing it. In this thesis, (iv) we propose an efficient, insight-aware approach for recovering incomplete data in visualization recommendations. Our approach concentrates on identifying which data to impute rather than on the imputation method itself. The objective of our insight-aware recovery is to prioritize the imputation of the missing cells that maximize the effectiveness of recommendations.
Bio: Rischan Mafrur is a Ph.D. candidate in the School of ITEE at the University of Queensland under the supervision of A/Prof Mohamed Sharaf and A/Prof Guido Zuccon. He received his M.Eng. in Computer Engineering from Chonnam National University, South Korea, and his B.Eng. in Computer Engineering from Sunan Kalijaga State Islamic University, Indonesia. His research interests include data exploration and data visualization.
About Data Science Seminar
This seminar series is hosted by EECS Data Science.