Choosing a Visualization Method for Memory Profiles

A Tale of JavaScript Performance, Part 2

This article continues my series chronicling my investigation into JavaScript performance by creating HeapViz, a visualization tool for Chrome memory profiles. Today I am going to talk about my choice of visualization methods. If you missed it, catch part 1 here.

(source: Wikimedia Commons)

The perfect drawing of an imperfect heap

The value delivered by my tool will be determined by its ability to quickly diagnose issues with a particular memory profile. Thinking about ways that I could leverage intuition engineering to enhance the visualization, I came up with three criteria for success:

It needed to be easy to form a baseline. This will allow quick visual diffing between different heap profiles or time samples.
It needed to communicate problem areas quickly and effectively.
It needed to effectively display many nodes. Many, many, many nodes.

To establish a baseline effectively, I needed something that would, at a glance, represent a lot of related data. My two tools for representing nodes would be size and color. By drawing nodes according to size, I would be able to quickly highlight areas of an app that have exceptionally large footprints. Similarly, color-coding nodes would allow at-a-glance analysis of the state of a heap.

With this general idea, I tackled the more specific problem of communicating problem areas. Taking some cues from the output of the Chrome heap profile tool and my own experience, I knew that a node's self size and retained size were of critical importance. I also knew that I wanted some way of representing retainers because of their critical role in figuring out a fix for memory issues.
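As a rough sketch of the size-and-color encoding described above, the snippet below maps a heap node to a circle whose area is proportional to one of its size metrics and whose color depends on its type. The `HeapNode` shape, the palette, and the `bytesPerSquarePixel` parameter are all assumptions made for illustration; they are not the Chrome snapshot format or HeapViz's actual model.

```typescript
// Hypothetical node shape; field names are illustrative assumptions.
interface HeapNode {
  id: number;
  type: string;
  selfSize: number;     // bytes held directly by the node
  retainedSize: number; // bytes that would be freed if the node were collected
}

type SizeMetric = "selfSize" | "retainedSize";

interface Circle {
  id: number;
  radius: number;
  color: string;
}

// Illustrative palette keyed by node type (assumed values).
const PALETTE: Record<string, string> = {
  object: "#1f77b4",
  string: "#ff7f0e",
  closure: "#2ca02c",
};
const FALLBACK_COLOR = "#7f7f7f";

// Radius grows with the square root of the size so that the circle's
// *area*, not its radius, is proportional to the chosen metric.
function toCircle(
  node: HeapNode,
  metric: SizeMetric,
  bytesPerSquarePixel: number
): Circle {
  const area = node[metric] / bytesPerSquarePixel;
  return {
    id: node.id,
    radius: Math.sqrt(area / Math.PI),
    color: PALETTE[node.type] || FALLBACK_COLOR,
  };
}
```

Scaling by area rather than radius matters for honest comparison: a node four times as large gets a circle with twice the radius, so it covers four times the space on screen.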

First guess? Force-directed graphs

Looking for a format that allowed for separately sized, color-coded entities with an indication of the relationships between them led me to the force-directed graph.

(source: Martin Grandjean)

Force-directed graphs are great! They check all the boxes for communicating importance: they efficiently represent nodes of varying sizes, support color coding, and show you the relationships between nodes. D3 even provides a force layout module that makes it simple to implement one of these suckers.

Unfortunately, they do not satisfy the pesky performance requirement. Force-directed layouts are expensive to compute: the simulation runs iteratively, and every iteration has to account for the forces acting between nodes. Browser implementations can take minutes to lay out even a few thousand nodes. Furthermore, as the graphs get large, they become extremely visually congested.
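To get a feel for why the cost climbs so quickly, here is a back-of-the-envelope sketch that counts how many node pairs a naive repulsion pass would visit. It is only an upper-bound intuition: real implementations, such as D3's many-body force, use approximations like Barnes-Hut to avoid the full pairwise pass. The function names and the tick count are assumptions chosen for illustration.

```typescript
// Number of distinct node pairs a naive all-pairs repulsion pass
// visits in a single simulation tick: n * (n - 1) / 2.
function pairsPerTick(nodeCount: number): number {
  return (nodeCount * (nodeCount - 1)) / 2;
}

// Rough total for a whole layout run. `ticks` is an assumed number of
// iterations before the simulation settles, not a measured value.
function totalPairChecks(nodeCount: number, ticks: number): number {
  return pairsPerTick(nodeCount) * ticks;
}

console.log(pairsPerTick(2_000));         // pairs per tick for 2,000 nodes
console.log(pairsPerTick(200_000));       // pairs per tick for 200,000 nodes
console.log(totalPairChecks(2_000, 300)); // 300 ticks is an illustrative figure
```

Under these assumptions, 2,000 nodes already mean 1,999,000 pairs per tick, and 200,000 nodes mean roughly 20 billion. Growing the node count by 100x grows the pairwise work by about 10,000x.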

A force-directed graph with 200,000 nodes (source: graphmap.net)

If my tool takes many minutes to lay out a heap, or if it is difficult to get relevant diagnostic information about a single node at a glance, it will not be more useful than just parsing the data by hand. In the end, I decided to pass on the force-directed graph.

Let’s pack some circles

One thing I did like about force-directed graphs was their circular representation of nodes. It was visually attractive and easy to reason about. If only it wasn’t so darn expensive to compute!

A lot of the complexity in rendering a force layout comes from drawing the relationships between nodes. If I could find a layout that was similar but did not explicitly draw edges, I might be able to render the volume of nodes that I needed to.

Enter the circle pack layout:

(source: Mike Bostock and Jeff Heer)

I saw some potential here. It has many of the advantages of a force-directed graph (circular nodes, colored nodes, and an at-a-glance sense of relative size) without the computational overhead of laying out a bunch of lines between objects.

I also saw a couple of downsides:

For deeply nested hierarchies, it is extremely space inefficient.
Representing non-hierarchical relationships between nodes is difficult.

To address the first point, I decided that I needed to flatten my data as much as possible. Remember that memory is generally represented as a graph, and sometimes as a dominator tree — it is not stratified by default, though it can be grouped by type or other qualifiers if desired.
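One way to picture this flattening is a two-level hierarchy: a root, one group per node type, and the nodes themselves as leaves. The sketch below is a minimal illustration under that assumption; the `HeapNode` fields, the `groupByType` helper, and the `"heap"` root name are hypothetical and not taken from HeapViz.

```typescript
// Hypothetical node shape; field names are illustrative assumptions.
interface HeapNode {
  id: number;
  type: string;
  selfSize: number;
}

interface TypeGroup {
  type: string;
  totalSelfSize: number;
  children: HeapNode[];
}

interface FlatHierarchy {
  name: string;
  children: TypeGroup[];
}

// Collapse an arbitrary list of heap nodes into root -> type -> node,
// so the layout never has to nest circles more than two levels deep.
function groupByType(nodes: HeapNode[]): FlatHierarchy {
  const groups = new Map<string, TypeGroup>();
  for (const node of nodes) {
    let group = groups.get(node.type);
    if (group === undefined) {
      group = { type: node.type, totalSelfSize: 0, children: [] };
      groups.set(node.type, group);
    }
    group.children.push(node);
    group.totalSelfSize += node.selfSize;
  }
  // Map preserves insertion order, so groups appear in first-seen order.
  return { name: "heap", children: Array.from(groups.values()) };
}
```

A structure like this keeps the nesting shallow, which is what limits the wasted space that deeply nested circle packs suffer from.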

The second point I decided to take a mulligan on. I liked how a circle packing layout looked, and decided that the only indicators I would display for retainers would be a text list of them and a count on the node. The value in knowing retainers tends to come after a problem has been identified, so I decided to simplify the initial visualization to include only those elements that highlight problem areas.

Honorable mention: Treemap

You might be wondering — if performance on large data sets is such a concern, why not use a treemap?

(source: MDN)

If I am being honest, the reasons I steered away from a treemap originally went something like:

Treemaps don’t look as visually appealing as circle packing layouts.
It would be too easy! A treemap is, compared to other chart types, fairly cheap to compute.
Firefox already did it.

I might add a couple of extra points now that I have adopted circle packing as my visualization method of choice.

I don’t care about hierarchies beyond a node’s type. Treemaps excel at quickly visualizing weight in a hierarchy, but for a relatively flat tree they can make it more difficult to pick out outliers.
Anecdotally, circle-packed layouts are generally considered easier to consume visually than equivalent treemaps. I believe they tell a better story: something about the space between nodes makes it easier to identify patterns between groups.

So, that’s settled! I decided to use the circle-packed layout and consider it a fine choice for visualizing a memory heap.

Up next — Part 3: Renderers in all shapes and sizes