>>Good morning, everyone, so I’m just going
to start right off the bat here. “Power Tools for Powerful Visualization.”
By: Jana Beck.>>Interactive visualization on the Web is
a big area, it’s a super exciting and collaborative community, so if you’re interested in it, you know, get involved, because the community is really great. But that’s not to say working
in this area is rainbows and sparkles 100 percent of the time. When you’re building
it on top of a large real world data set, there can be a lot of challenges.
So what I’m going to talk to you today about is sort of all the things that I wish somebody
had told me before I embarked on the project of writing a large data visualization library.
And so this is basically the tool set that I’ve gathered for myself over the last couple
of years of work, and I hope it will be useful to you.
So we’re going to talk about the most widely used data visualization tool set, and that’s D3, data-driven documents, plus the SVG image format. So you commonly use D3 to build things in SVG.
If you’re not familiar with SVG, it’s a vector image format that’s XML based, so that means
all the components of the visualization are XML elements that sit in the DOM just like
HTML elements. And most properties can be styled through CSS style sheets, although
not all of them can be. And just to give you a little overview of where we’re going: I’m going to give you a little bit of an example of SVG basics and best practice, in my opinion, and then that’s going to lead us into a discussion of performance problems, especially with SVG, and we’re going to talk about how to profile those. And then I’m going to follow up with a few tips for how to solve performance problems, and good tools for working with large data sets in particular.
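To make the SVG-as-XML point concrete, here is a hypothetical minimal example (my own, not the speaker's diagram), written out as a plain string so the structure is visible: elements like circle and rect are just XML nodes that sit in the DOM, and class attributes let CSS style them.

```javascript
// A minimal SVG document as a string, to show that SVG is just XML:
// <circle> and <rect> sit in the DOM alongside HTML elements and can
// (mostly) be styled from CSS, e.g. `circle { fill: teal; }`.
const svg = [
  '<svg width="100" height="100" xmlns="http://www.w3.org/2000/svg">',
  '  <rect x="10" y="10" width="80" height="80" class="frame"/>',
  '  <circle cx="50" cy="50" r="20" class="dot"/>',
  '</svg>',
].join('\n');

console.log(svg);
```

In a real page, D3 would generate markup like this for you via its selection and append methods rather than string concatenation.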
So I say that D3 plus SVG is the most common tool set. Why is that? It’s because what D3 does is bind data (these little miniature JSON objects that I created) to elements in the DOM, so those are commonly elements like this circle element.
So you can use D3 with something like canvas, but, in my opinion, that should be sort of the option of last resort, because you’re actually losing a lot of the power of D3 if you use canvas. Canvas is a raster image format, and the only element you end up with in the DOM is the canvas itself, so you’re losing this whole power of binding data to DOM elements, and basically all the functionality in D3 is around manipulating those bindings in various
ways. So once you’ve learned those interfaces in D3 (it’s basically three selections that you’ll hear about, the enter, update, and exit selections, and programming around those selections), once you’ve learned that core functionality, there’s not much more to learn with D3. D3 provides a lot of useful functionality for manipulating
data, but in a lot of ways, it’s actually a really elegant concise project. And there
just isn’t a whole lot more there. The real challenge in my experience is that
if you really want to become proficient at data visualization, you have to become proficient
at SVG. You have to really understand SVG. So the thing that I want to talk a little
bit about that’s really important with SVG is using group elements. So this is an element
in SVG that’s pretty much parallel to a DIV in HTML except for the difference that you
can get away with not using it. I think it would be pretty strange if you had a lot of HTML that had zero divs. But there are basically three good reasons to use this group element, and that’s organization, just organizing your code; providing hooks for interaction; and then also there are performance implications if you don’t use them.
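As an aside, the enter/update/exit idea mentioned a moment ago can be sketched in plain JavaScript. This is emphatically not D3's API, just a toy illustration (with made-up names like `dataJoin`) of what a data join computes: which bound elements to update, which to create, and which to remove.

```javascript
// Toy sketch of a D3-style data join (not D3's actual API).
// `existing` are mock DOM nodes keyed by id; `data` is the new data set.
function dataJoin(existing, data, key) {
  const byKey = new Map(existing.map((el) => [el.key, el]));
  const update = [];
  const enter = [];
  for (const d of data) {
    const k = key(d);
    if (byKey.has(k)) {
      const el = byKey.get(k);
      el.datum = d; // rebind the new datum to the existing element
      update.push(el);
      byKey.delete(k);
    } else {
      enter.push({ key: k, datum: d }); // would become a new DOM element
    }
  }
  const exit = [...byKey.values()]; // elements with no matching datum
  return { enter, update, exit };
}

const existing = [{ key: 'a', datum: { id: 'a', v: 1 } }];
const next = [{ id: 'a', v: 2 }, { id: 'b', v: 3 }];
const join = dataJoin(existing, next, (d) => d.id);
console.log(join.enter.length, join.update.length, join.exit.length); // 1 1 0
```

In D3 itself, the same bookkeeping happens when you call `selection.data(...)` and then operate on the enter, update, and exit selections.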
So this is probably pretty tiny, but all you really need to see here is that what we’ve got is an SVG with a totally flat structure. You know, we’re just dumping stuff into this SVG element, and this is actually the SVG that I hand-coded for that diagram for you about DOM elements. So this is just everything in that diagram all at one level. And here it’s not too hard to understand, but you can see that in a much bigger project, much bigger than that little diagram, debugging something like this in your console, inspecting it to examine how things are layered, would be a challenge. One thing with SVG is that order matters a lot, so you do have to inspect the console a lot; there’s no z-index in SVG, so you have to put things into the SVG element in the order that you want them layered. So here’s an example where we’re actually
using groups to organize things. And, in fact, each row of that diagram, binding one datum to a DOM element, is a group here. And what’s nice about this is that the groups actually give us hooks for interactions pretty easily, in lots of different ways. Here I actually did a native SVG animation, but you can also apply a class to each group and do it that way; like a lot of things, there are probably 100 ways to accomplish a single task.
And now here’s the third reason to use groups: there are actually performance implications if you don’t. This is a little bit of a silly example; all I’m doing is plotting 5,000 random data points in a scatter plot, and then I’m shifting them all by a certain number of pixels, and here’s a GIF of the example where I did this. If you’ve plotted all those circles inside a group element and you’re shifting them by applying a transform to the group, the performance cost is basically negligible; you know, it happens in zero milliseconds, one millisecond. If you do it the dumb way, by applying a transform to every one of the circles individually, that actually takes a little bit of time. Now, it’s still not noticeable to the user because it’s, like, 12 milliseconds. But the point here is the more complex you’re getting with all these things, all that stuff is going to add up.
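The arithmetic behind that difference is easy to sketch. The mock below (my own illustration, not real DOM code) just counts attribute writes: shifting the enclosing group touches one attribute, while shifting each circle touches one per point.

```javascript
// Mock "elements" that count attribute writes, to show why one transform
// on a <g> is cheaper than a transform on each of 5,000 <circle>s.
function makeEl() {
  return { writes: 0, setAttribute() { this.writes += 1; } };
}

const N = 5000;
const circles = Array.from({ length: N }, makeEl);
const group = makeEl();

// Strategy 1: one transform on the enclosing group element.
group.setAttribute('transform', 'translate(40,0)');

// Strategy 2: a transform applied to every circle individually.
for (const c of circles) c.setAttribute('transform', 'translate(40,0)');

const perCircleWrites = circles.reduce((sum, c) => sum + c.writes, 0);
console.log(group.writes, perCircleWrites); // 1 5000
```

Real DOM attribute writes also trigger style and layout work, so the gap in practice is even larger than the raw count suggests.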
And so that leads me straight into this discussion of profiling performance in data visualization. So just like any experience of a website or a Web application, when you’re doing something with interactive data viz, obviously you want it to be snappy and crisp and smooth; you want your users to have a good experience and not have any noticeable lag or desynchronization or stuttering. But sometimes that’s just not what happens, especially, in my experience, in the first iteration of a new project. So I’m going to talk about how in the past I’ve figured out what’s causing performance problems from the point of view of the user and how to solve them.
So these are kind of from the most obvious to maybe the least obvious. There are three
things that I do a lot to profile the performance of a data visualization. And that’s just putting
timers in my code, using the frame rate meter in Chrome, and recording a more detailed timeline profile. I use Chrome’s developer tools for that, too, and I do believe the frame rate meter is unique to Chrome. So timers are a pretty basic beginning technique for profiling performance. But that’s not to say that it’s not useful to talk about them a little bit, I think. I almost always now wrap a timer around the main render method
in any visualization that I’m working on, because the render method, which is where you’re actually manipulating the DOM with respect to the data, tends to be the most expensive part. And especially if you are starting by working with maybe a prototype data set, a data set that’s not actually the real data set, and then you’re going to scale it up later, you should have a timer around that render method so you can see what happens when you scale. I’ve benefited from that before. I also prefer to use the console.time and console.timeEnd methods. I don’t know if everybody knows about these. They’re very useful. You don’t have to do the calculation of the elapsed time yourself. Not every browser has them. But the only other thing to say about timers is you have to know where to put them: if you don’t know where your performance problem is, then this isn’t going to help you.
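For example, wrapping a render method might look something like this; `render` here is just a stand-in for whatever your visualization's main render function is, not code from the talk.

```javascript
// console.time / console.timeEnd pair around a (stand-in) render method.
// The label passed to console.timeEnd must match the one given to console.time.
function render(data) {
  // pretend DOM manipulation: just burn a little CPU for the demo
  let checksum = 0;
  for (const d of data) checksum += d;
  return checksum;
}

const data = Array.from({ length: 100000 }, (_, i) => i);

console.time('render');
const result = render(data);
console.timeEnd('render'); // prints something like "render: 1.2ms"

// Fallback for environments without console.time:
// compute the elapsed time yourself.
const start = Date.now();
render(data);
const elapsedMs = Date.now() - start;
console.log(elapsedMs >= 0);
```

The fallback is exactly the "calculation of the elapsed time" that console.time saves you from writing by hand.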
So one way just to get an idea of where you might have performance problems is spying on the frame rate of your visualization, or of any Web application; this is something that applies a little bit more broadly. So I’m not going to walk you through the steps
here, but this might be a little bit more useful: I made a GIF of it for you. But an important thing to note is you actually have to have the developer tools open for the frame rate meter to stay there. It will be in the upper right of the viewport. So I often pop the developer tools out and minimize them if I don’t actually need to inspect stuff, and then leave the meter open. I leave it open a lot while I’m developing. It’s a good thing to keep an eye on, and it can be fun to surf the Web with the frame rate meter on and see what other people’s websites are doing. So this is an example of, you know, having the frame rate meter open to profile a certain interaction. So in this case, again, a little bit silly: I put this button in there to invert the Y scale of this scatter plot, and there’s actually a little gradually increasing delay on which data point moves, to stagger the animation.
is the frames per second that you’re seeing in this interaction. And this 30 to 60 here
is the range. And so that’s showing us that throughout this, none of these dips on the
timeline are going below 30 frames per second. And then here’s an example of the same interaction
but this time I plotted 5,000 data points in the scatter plot, and that has a considerable
negative impact on the performance as you can see. The frame rate has dropped into the
20s, and the range we have now is five to 60 instead of 30 to 60. And another thing that I can just point out here is that performance in data visualization is often a really simple numbers game. If you’re plotting more things, if you have more elements in the DOM, things
are going to be slower. So this is straight from Chrome’s developer
tools. It’s just showing you what’s in that frame rate meter display. So there’s the current frame rate; the range, which as I said before is the minimum to the maximum; and this cute little histogram that’s hard to read, but it’s useful, especially if you leave the frame rate meter open for a long time, because you can see over time where it’s weighted. And
then there’s the timeline as you’re doing things. So this is my biggest caveat about using Chrome’s frame rate meter: the reported number is a running average. So it’s not quite the right tool for really precise identification of problems. Basically it’s good for ballparking and getting a general idea. You know, if you leave it open for a long time while you’re working, you get a really good idea of how well your website is performing sort of between refreshes. Or you can set up tests and get a general idea of whether this interaction is doing well or not. But it’s not going to pinpoint anything for you. It’s just not good for that surgical precision, as you might call it. So when I first started using the frame rate meter, it was really good for me. It gave me a quantitative measure of performance that I’d only had sort of an intuitive grasp of before, and that can be helpful for really finding problems. But I didn’t know what the numbers meant, so here’s my guide for you on what the numbers really kind of mean. And basically I would say anything above
40 is just pretty good. Your users probably aren’t noticing any lag or stuttering, because any of those things are rare or subtle. 30 to 40 is where a really discerning user might be noticing something, like a little bit of lag, or that it’s not quite as snappy or smooth as it could be. And under 30 is where people are really going to start to notice. No one’s going to say to you “the frame rate of this is slow” unless they have a background in animation or something. But they might notice. They’ll describe it as sluggish or laggy. Not smooth.
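To make those numbers concrete: frame rate is just frames divided by elapsed time. Here is a hypothetical sketch of computing an average FPS from a list of frame timestamps; in a browser you would collect the timestamps with requestAnimationFrame, but here they are synthetic.

```javascript
// Average frames-per-second over a list of frame timestamps (milliseconds).
function averageFps(timestamps) {
  if (timestamps.length < 2) return 0;
  const elapsedMs = timestamps[timestamps.length - 1] - timestamps[0];
  return ((timestamps.length - 1) / elapsedMs) * 1000;
}

// Synthetic timestamps: 61 frames spaced ~16.67ms apart is 60fps.
const smooth = Array.from({ length: 61 }, (_, i) => i * (1000 / 60));
// 61 frames spaced 40ms apart is 25fps: users will notice this one.
const janky = Array.from({ length: 61 }, (_, i) => i * 40);

console.log(Math.round(averageFps(smooth))); // 60
console.log(Math.round(averageFps(janky)));  // 25
```

Note that an average like this has the same caveat as Chrome's meter: one terrible 200ms frame can hide inside a decent-looking average.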
All right. So now moving on to profiling with the timeline tool.
So this is for the case where timers and the frame rate tool are not really good enough: finding the code that’s responsible for your performance problem. That’s where you really need this tool, the timeline. So in this scatter plot that we’ve been looking at, this is the function I’ve been using to generate data, and I’m just going to show you another version of it for comparison. So here’s the original version. Now, this one, the generate-data-slow function, is going to be our white whale. I put this example in here so we can find a bottleneck using the timeline tool. And this is inspired by real mistakes that I’ve made. One of the difficulties with dealing with
a lot of time series data is that different parts of your code might expect dates in different formats, and if your code base is complex, with different modules having different expectations, you end up parsing and reparsing datetimes in many places, and datetime parsing is usually pretty expensive because it’s pretty complex. And so that can really end up causing problems. So this particular example obviously is really silly because this is not a complex code base at all. And the problem here, just to really point it out, is I’m creating a new date and then immediately formatting it out to an ISO format string, and then, as I’m pushing the data into the array, I’m parsing it again with Date.parse, which produces an integer timestamp, and getting the day of the week out. And this is actually a really slow way to get the day of the week: by formatting it into a day of the week with Moment.
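The same mistake can be shown without Moment. The slow path below round-trips through a string exactly as described; the fast path just asks the Date object directly. The function names are mine, not from the talk's code.

```javascript
// Slow: serialize the date to an ISO string, reparse it to an integer,
// build yet another Date, and only then ask for the day of the week.
function dayOfWeekSlow(date) {
  const iso = date.toISOString();      // format out to a string...
  const millis = Date.parse(iso);      // ...reparse it to an integer timestamp...
  return new Date(millis).getUTCDay(); // ...and construct yet another Date
}

// Fast: the Date object already knows its day of the week.
function dayOfWeekFast(date) {
  return date.getUTCDay();
}

const d = new Date(Date.UTC(2015, 0, 1)); // Jan 1, 2015 was a Thursday
console.log(dayOfWeekSlow(d), dayOfWeekFast(d)); // 4 4
```

Both return the same answer; the slow one just pays for a format, a parse, and an extra allocation on every data point, which is exactly the kind of cost that shows up when you scale to thousands of points.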
So, again, here’s just a little demonstration of the workflow that I use to set up a timeline
profile in Chrome’s developer tools. I’m not going to walk you through it step by step. But I’ll point out that I’ve set it up so you get it all in a flame chart view, which is these vertical bars, and there’s color coding on those vertical bars: blue for loading, yellow for scripting, purple for rendering, and green for painting.
And when you’re looking for bottlenecks in this timeline in the flame chart view, which is the view I prefer, and we’ll look at one in detail here in a minute, you’ll want to remember that width represents time, and that’s the dimension you’re really interested in. The vertical dimension of that flame chart view represents your call stack, but it doesn’t really matter at all how deep that is; you can have a huge call stack as long as it’s not taking up very much space in terms of width. So really you don’t want to get too concerned with that. Basically I look for things that don’t make sense in terms of width. And another thing to note here, and this is actually something that took me quite a while to track down when I was first doing this, and I’ve linked the Stack Overflow answer here (these slides are online): sometimes you’ll see these horizontal blocks split up. That confused me at first because I thought it was one function that was getting called more than once, but that splitting up is not significant. It has to do with how Chrome or another tool is, like, statistically sampling your call stack to produce these charts, and sometimes things get split up, and it’s not significant.
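A rough sketch of why that happens, assuming a sampling profiler (this is a simplification of what Chrome actually does): the profiler records the call stack at fixed intervals, and the chart is rebuilt from those samples, so one long call can come out as several adjacent blocks whenever anything interrupts the run of identical samples.

```javascript
// Simplified sampling-profiler model: each sample is the call stack seen
// at one tick. Consecutive identical stacks merge into one block; any
// interruption splits what was really a single call into several blocks.
function buildBlocks(samples) {
  const blocks = [];
  for (const stack of samples) {
    const last = blocks[blocks.length - 1];
    if (last && last.stack === stack) last.ticks += 1;
    else blocks.push({ stack, ticks: 1 });
  }
  return blocks;
}

// One long call to generateData interrupted by a GC tick in the middle:
const samples = ['generateData', 'generateData', '(gc)', 'generateData'];
const blocks = buildBlocks(samples);
console.log(blocks.length); // 3: generateData appears as two blocks, not one
```

So two adjacent blocks with the same label usually mean one call that got split in reconstruction, not two separate calls.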
So here’s one example of one of these timeline profiles, and this is for the original data generation function, the one that’s not slow. And this is probably a little small, sorry for that. But this part here (I’ll try to do both sides) is the amount of time that it’s taking to generate the data, and it’s less than half of the blue box, which is for drawing the data. Now, that makes sense. That’s not strange, because as I said before, usually the part where you’re actually rendering and manipulating the DOM is the most expensive part of any data visualization. So that makes sense. That’s fine. Now, here’s what happens with the white whale that I put in for us to find, this generate-data-slow function. Now the data generation, again the black box, is taking, like, three to four times longer than the drawing. So what you would want to do here is look into the call stack, into these functions that are being called below, inside that generate-data-slow function. And that can tell you what’s going on: if you’re in the Chrome developer tools, you can hover over things there, and it will tell you the name of the function. Or it will at least tell you what module it’s from. These are all calls from Moment.js because of some of the silly ways that I handled the datetimes and asked it to reparse this datetime.
All right. So those are the three tools for profiling performance. And now I’ll just briefly talk about a couple of general strategies and tools for getting over some of these performance problems, other than, you know, being intelligent about how you deal with your time series data. If I had to distill everything that I’ve learned about making performant interactive data visualizations into a couple of statements, these would be it. The first one is: if you have a choice between your own implementation and a browser-native version, use the browser-native version. And I have a specific example for you here in a second. And the second thing is: try to change the DOM as little as possible, just because that’s the most expensive part of this whole process.
So I’ve spent a lot of time over the past couple of years working on horizontally scrolling timeline data visualizations, and there are two strategies for implementing this kind of thing. One is to render a really wide SVG that you’re only seeing part of at any given time, and then you’re using the browser’s native scrolling capability. Or you can render an SVG exactly the size of the portion of the timeline that you want to view at any given time, and attach mouse events and touch events and change what you’re actually rendering within that little SVG as the user’s interacting with the display; so that’s your navigation. So at first I thought that this second strategy made more sense, because I thought the render-a-really-wide-SVG strategy was kind of strange and counterintuitive. But after having performance problems with that original implementation I did a lot of experimentation, and browsers put a lot of work into things like scrolling performance, so it’s really not a surprise that using the browser’s native scrolling abilities there is more performant.
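The second strategy boils down to a windowing computation like the following sketch (my own illustration, not the speaker's code): given the scroll offset, work out which slice of the data needs to be in the small SVG right now, and render only that.

```javascript
// Which data points fall inside the currently visible window?
// offsetPx: how far the user has scrolled; viewportPx: visible width;
// pxPerPoint: horizontal space each data point occupies.
function visibleRange(offsetPx, viewportPx, pxPerPoint, totalPoints) {
  const first = Math.max(0, Math.floor(offsetPx / pxPerPoint));
  const last = Math.min(
    totalPoints,
    Math.ceil((offsetPx + viewportPx) / pxPerPoint)
  );
  return { first, last }; // render (and bind data to) only this slice
}

// 10,000 points at 5px each, an 800px viewport, scrolled 1,000px in:
const r = visibleRange(1000, 800, 5, 10000);
console.log(r.first, r.last); // 200 360
```

The catch, as the talk notes, is that every scroll event then triggers this recomputation and a DOM update, which is exactly the work the browser's native scrolling does for free in strategy one.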
And here’s a little bit of evidence for that: the frame rate was in the 30s, and this is a toy example that’s not even remotely as complex as what I was actually working on for my company. And then here’s the browser-native version, which has a frame rate that’s way up in the 50s. And the code for these toy examples is on bl.ocks.org, if you don’t know about this. Blocks is a website that’s maintained by Mike Bostock, the primary author of D3. And it’s built on top of GitHub gists, so you just put your GitHub user name into the URL, and it renders things. So if you have an index.html, it will render it. It’s basically where the D3 community puts all of their code examples, instead of CodePen. So it’s a great place to look for D3 examples, and the URL for my bl.ocks page is later on, if you’re interested in seeing these examples.
So lastly, my final tip is to try to touch the DOM as little as you can. And, again, this is something that’s not particularly innovative; obviously Facebook’s React is an entire front-end UI framework that’s built around minimizing DOM manipulation. But just because it’s common wisdom doesn’t mean it’s always easy to remember. So I just think about it a lot when I’m working and ask myself: how much am I changing the DOM, and can I avoid it? So, for example, for this huge scatter plot, there’s probably a different way to represent the data, by aggregating or smoothing it, that will end up with fewer elements to modify. And if you’re coding an interaction that’s basically constantly removing nodes from the DOM and adding new ones, which is basically exactly what you’re doing when you’re doing a timeline that scrolls, is there a way that you can recycle the nodes that you’re about to remove so that you can reuse them? And basically that means just changing the data that’s bound to them, so that you’re not removing them and then adding new ones; you’re just recycling the ones you already have.
And then my last tip is for dealing with large data sets. There are a couple of great libraries for this, Crossfilter and PourOver; these are both open-source tools. Crossfilter is your first choice for a filtering helper with D3; it’s written by some of the original authors of D3, and it provides a real wealth of functionality for numerical querying.
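To give a flavor of the kind of numerical querying Crossfilter speeds up, here is the naive plain-JavaScript equivalent. This is not Crossfilter's API; Crossfilter maintains sorted indexes over dimensions precisely so that repeated range queries like this one don't rescan the whole array each time.

```javascript
// Naive range query: rescan everything on every filter change. Crossfilter
// exists because doing this repeatedly over large data sets is too slow.
function rangeFilter(rows, field, lo, hi) {
  return rows.filter((row) => row[field] >= lo && row[field] < hi);
}

const rows = [
  { time: 1, value: 10 },
  { time: 2, value: 25 },
  { time: 3, value: 40 },
  { time: 4, value: 15 },
];

const hits = rangeFilter(rows, 'value', 10, 30);
console.log(hits.length); // 3: the rows with value 10, 25, and 15
```

Each UI filter change (a brush on a histogram, say) would re-run a query like this, which is fine at four rows and painful at a few hundred thousand.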
And PourOver is a little bit different, in that the original use case for it was fast client-side faceted search. So basically, if that is your exact use case, it might be the tool to look at first. It’s basically for the sort of tag-filters-in-many-combinations kind of use case. So that would be worth checking out. So just to give you a very brief overview,
my tips for powerful data visualization are: learn SVG as much as you can; profile as you work; and strategize: let the browser do as much work as possible, touch the DOM as little as possible, and use the helper libraries if your data is really big. Thanks.