A Visualization and Data Derivation Technique to Support the Diagnosis of Missing Data Mechanisms
Visual-Interactive Idiom, Missing Data Mechanisms, Visual Analysis
Missing values are a pervasive problem in most data collection processes. Several methods can deal with missing values, and choosing among them depends on diagnosing the missing data mechanism, that is, the way missingness correlates with variables. One way of diagnosing the mechanism is to compare pairs of variables using data visualizations. However, the visualizations commonly used for this task rely on visual encodings that were not specifically designed for it, so users must actively search for cues to support their reasoning instead of being shown those cues explicitly. This dissertation project therefore proposes a visual-interactive idiom for diagnosing missing data mechanisms. The approach consists of design choices for visual encodings and interactions that support the diagnosis process, together with a data derivation algorithm that quantifies two metrics to assist reasoning: MCAR similarity (how much the missing data distribution differs from a perfect uniform sample) and MCAR plausibility (how much the missing data distribution resembles a random shape). This project presents the rationale behind the components of the idiom, showing how it supports the whole diagnosis task. The current results use synthetic data to validate the idiom's ability to depict the missing data mechanisms, and real data to demonstrate how it can assist in practical analysis scenarios.
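
To make the pairwise comparison concrete, the following is a minimal sketch of one way to quantify how far the rows with a missing value depart from a uniform random sample of all rows. It is not the data derivation algorithm of this project: the function name `missingness_distance`, the equal-width binning, the choice of total variation distance, and the synthetic data are all assumptions made for illustration. The idea is that if values of one variable are missing completely at random (MCAR), the rows where it is missing should be spread over a second variable in roughly the same proportions as all rows.

```python
import random


def missingness_distance(x, y_missing, n_bins=5):
    """Hypothetical helper: total variation distance between the binned
    distribution of x over all rows and over rows where the paired
    variable is missing. Values near 0 are consistent with MCAR."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if x is constant

    def hist(values):
        # Equal-width histogram normalized to proportions.
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        total = sum(counts)
        return [c / total for c in counts]

    missing_x = [v for v, m in zip(x, y_missing) if m]
    if not missing_x:
        return 0.0  # nothing is missing, so there is nothing to compare
    all_hist = hist(x)
    miss_hist = hist(missing_x)
    return 0.5 * sum(abs(a - b) for a, b in zip(all_hist, miss_hist))


# Synthetic illustration (assumed data, not results from this project).
rng = random.Random(0)
x = [rng.random() for _ in range(2000)]

# Missingness independent of x: behaves like a uniform sample of rows.
mcar_mask = [rng.random() < 0.3 for _ in x]
# Missingness that depends on x: only rows with large x are missing.
dependent_mask = [v > 0.7 for v in x]

d_mcar = missingness_distance(x, mcar_mask)
d_dependent = missingness_distance(x, dependent_mask)
```

In this sketch `d_mcar` stays close to zero while `d_dependent` is much larger, which is the kind of contrast a pairwise diagnosis looks for. A single distance like this only compares against the observed variable's overall distribution; it does not reproduce the MCAR similarity or MCAR plausibility metrics described above.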