Our approach was to leverage a relatively new technique called “storyline visualization” to convey patterns and trends in the dataset. Storyline visualization techniques have previously been used to communicate interactions between individuals over time, which can come, for example, from scene colocation in a movie or novel. In our solution, we used storylines in a setting that would typically call for “parallel sets.” We chose not to use parallel sets because, while they are good for conveying categorical information, they make it difficult to trace individual entities through the visualization, and tracing entities is useful for understanding data that change over time.
We use storylines to convey what we call “common paths” in the visualization. These are combinations of particular attribute values that many people share. For example, one fairly common path was a male who scored in the highest ACT bracket and who declared and majored in engineering in his first and second years. Our philosophy was to use storyline visualizations to understand the most common paths, which means that uncommon individuals are not represented visually. In other words, this technique emphasizes the major trends at the expense of outliers.
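The notion of a common path can be made concrete with a short sketch. The Python below treats each person as a record, takes the tuple of that person's values across an ordered list of columns as their path, and keeps only paths shared by at least a threshold number of people. The column names, the `common_paths` function, the `min_count` threshold, and the sample records are hypothetical illustrations, not the dataset's actual schema or the authors' code.

```python
from collections import Counter

# Hypothetical column order; the real dataset's column names may differ.
COLUMNS = ["gender", "act_bracket", "major_year1", "major_year2"]

# Made-up sample records for illustration only (not drawn from the dataset).
SAMPLE_RECORDS = (
    [{"gender": "M", "act_bracket": "top",
      "major_year1": "Engineering", "major_year2": "Engineering"}] * 3
    + [{"gender": "F", "act_bracket": "mid",
        "major_year1": "Undeclared", "major_year2": "Biology"}] * 2
    + [{"gender": "F", "act_bracket": "top",
        "major_year1": "Physics", "major_year2": "Art"}]
)


def common_paths(records, columns, min_count):
    """Return (path, count) pairs for paths shared by at least `min_count`
    records, most frequent first (ties broken by the path itself).

    A path is the tuple of a record's values across `columns`. Paths below
    the threshold are dropped, which is how outliers go unrepresented.
    """
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    kept = [(path, n) for path, n in counts.items() if n >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept
```

With `min_count=2`, `common_paths(SAMPLE_RECORDS, COLUMNS, 2)` keeps the engineering path (3 records) and the undeclared-to-biology path (2 records) and drops the single physics-to-art record as an outlier.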
In these visualizations, the horizontal axis encodes different variables (e.g., columns from the dataset such as gender and ACT score). The vertical axis does not directly encode any information. Instead, it is used to group storylines together when they share a value for a particular column. This makes it easier to see the relationships between the common paths of interest and the flow between groups over time. Furthermore, the vertical arrangement of storylines is carefully adjusted for readability: our custom multi-stage optimization algorithm reduces line crossings, line wiggles, and unnecessary whitespace. Finding such an arrangement is a non-trivial combinatorial optimization problem.
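The multi-stage optimizer is not specified in detail here, so the following is only a minimal sketch of one related, well-known idea: the barycenter heuristic from layered graph drawing, applied in a single left-to-right sweep. It is a stand-in technique, not the authors' algorithm. At each column, lines sharing a value stay contiguous (the vertical grouping described above), groups are ordered by the mean position of their members in the previous column, and crossings are counted as pairs of lines whose relative order flips between adjacent columns. The function names and the path-tuple input format are assumptions, and the sketch does not address wiggles or whitespace.

```python
from itertools import combinations


def order_column(paths, col, prev_pos):
    """Order line indices at column `col`.

    Lines sharing a value at `col` stay contiguous; groups are sorted by
    the barycenter (mean) of their members' previous positions, with the
    value's string form as a tie-breaker; lines within a group keep their
    previous relative order.
    """
    groups = {}
    for i, path in enumerate(paths):
        groups.setdefault(path[col], []).append(i)

    def barycenter(members):
        return sum(prev_pos[i] for i in members) / len(members)

    ordered = []
    for value in sorted(groups, key=lambda v: (barycenter(groups[v]), str(v))):
        ordered.extend(sorted(groups[value], key=lambda i: prev_pos[i]))
    return ordered


def layout(paths):
    """Return one dict per column mapping line index -> vertical slot."""
    prev_pos = {i: i for i in range(len(paths))}  # seed with input order
    positions = []
    for col in range(len(paths[0])):
        order = order_column(paths, col, prev_pos)
        prev_pos = {line: slot for slot, line in enumerate(order)}
        positions.append(prev_pos)
    return positions


def count_crossings(positions):
    """Count pairs of lines whose relative order flips between adjacent columns."""
    total = 0
    for left, right in zip(positions, positions[1:]):
        for a, b in combinations(left, 2):
            if (left[a] - left[b]) * (right[a] - right[b]) < 0:
                total += 1
    return total
```

For example, `layout([("M", "A", "X"), ("F", "B", "X"), ("M", "B", "Y"), ("F", "A", "Y")])` returns one slot mapping per column, and `count_crossings` scores the result. A single greedy sweep is cheap but generally not optimal; layered-layout methods commonly repeat sweeps in both directions or add further stages for other criteria.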
Dustin Arendt Pacific Northwest National Laboratory 902 Battelle Blvd. Richland, WA 99354 email: email@example.com