I have a group of graphs visualizing a bunch of data (here), based on a CSV with approximately 25,000 rows, each with 12 parameters. However, any interaction (such as selecting a range with the brush on any of the graphs) is slow and unwieldy. This is completely unlike the dc.js demo found here, which also deals with thousands of records but maintains smooth animations, or crossfilter's demo here, which has 10 times as many records (flights) as I do.

I know the main resource hogs are the two line charts, since they have data points every 15 minutes for about 8 solid months. Removing either of them makes the charts responsive again, but they're the main feature of the visualizations, so is there any way I can make them show less fine-grained data?

The code for the two line graphs specifically is below:

        var lineZoomGraph = dc.lineChart("#chart-line-zoom")
            .width(1100)
            .height(60)
            .margins({top: 0, right: 50, bottom: 20, left: 40})
            .dimension(dateDim)
            .group(tempGroup)
            .x(d3.time.scale().domain([minDate,maxDate]));

        var tempLineGraph = dc.lineChart("#chart-line-tempPer15Min")
            .width(1100).height(240)
            .dimension(dateDim)
            .group(tempGroup)
            .mouseZoomable(true)
            .rangeChart(lineZoomGraph)
            .brushOn(false)
            .x(d3.time.scale().domain([minDate,maxDate])); 

Separate but related question: how do I modify the y-axis on the line charts? By default it doesn't encompass the highest and lowest values found in the dataset, which seems odd.

Edit: some code I wrote to try to solve the problem:

var graphWidth = 1100;
var dataPerPixel = data.length / graphWidth;

var tempGroup = dateDim.group().reduceSum(function(d) {
    if (d.pointNumber % Math.ceil(dataPerPixel) === 0) {
        return d.warmth;
    }
});

d.pointNumber is a unique point ID for each data point, cumulative from 0 to 22 thousand ish. Now however the line graph shows up blank. I checked the group's data using tempGroup.all() and now every 21st data point has a temperature value, but all the others have NaN. I haven't succeeded in reducing the group size at all; it's still at 22 thousand or so. I wonder if this is the right approach...

Edit 2: found a different approach. I create the tempGroup normally but then create another group which filters the existing tempGroup even more.

var tempGroup = dateDim.group().reduceSum(function(d) { return d.warmth; });
    var filteredTempGroup = {
        all: function () {
            return tempGroup.top(Infinity).filter( function (d) { 
                if (d.pointNumber % Math.ceil(dataPerPixel) === 0) return d.value;
            } );
        }
    };

The problem I have here is that d.pointNumber isn't accessible so I can't tell if it's the Nth data point (or a multiple of that). If I assign it to a var it'll just be a fixed value anyway, so I'm not sure how to get around that...

The "fake group" approach in your second edit seems reasonable. Since your data is probably in date order anyway (?), the index should be pretty much the same as the pointNumber, so adding a parameter to your filter callback function should give you an index you can use: .filter( function (d, i) { return (i % Math.ceil(dataPerPixel) === 0); } ). Also note that the filter callback function should return a boolean not a value. - Gordon
OK, so that works, kind of. I get a much more manageable 1071 results, but the results are also out of order, which confuses me. If you look at the live website now you'll see what I mean. The group's objects start off correctly at the first few data points, then jump ahead a few days and then jump back... so the points are fine, just somehow disordered. - IronWaffleMan
Um, yeah, you probably want to use .all() instead of .top(Infinity), for obvious reasons. Missed that. - Gordon
That did the trick, excellent :) What exactly is the reason behind that? Furthermore, I'd like the resolution change to adapt to the zoom level because right now when I zoom in there's not enough points and it looks a bit blocky... Is there a way to know how many data points are currently being shown in the graph at any time/zoom level? - IronWaffleMan
.all() is sorted on the key, and .top() on the value. You yourself are defining the number of data points that dc.js sees, but you might use chart.x().range() and the number of observations per unit of time to figure out how many data points there are to sample from. - Gordon
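
Pulling the comments above together, here is a minimal sketch of a "fake group" that samples from .all() (sorted by key) and adapts its step to the visible range. The names makeSampledGroup, getDomain and maxPoints are hypothetical, not part of dc.js or crossfilter, and the sketch assumes the chart's current x domain (for example chart.x().domain()) bounds what is visible:

```javascript
// Hypothetical helper: wraps a crossfilter-style group in a "fake group"
// that returns at most maxPoints bins from the currently visible key range.
function makeSampledGroup(sourceGroup, getDomain, maxPoints) {
  return {
    all: function () {
      // Assumption: getDomain() returns [min, max] of the visible x range,
      // e.g. function () { return chart.x().domain(); }
      var domain = getDomain();

      // .all() is sorted by key, so the sampled points stay in date order.
      var visible = sourceGroup.all().filter(function (d) {
        return d.key >= domain[0] && d.key <= domain[1];
      });

      // Keep every Nth bin so that no more than maxPoints remain.
      var step = Math.max(1, Math.ceil(visible.length / maxPoints));
      return visible.filter(function (d, i) {
        return i % step === 0; // filter callbacks must return a boolean
      });
    }
  };
}
```

It could then be passed to the chart with something like tempLineGraph.group(makeSampledGroup(tempGroup, function () { return tempLineGraph.x().domain(); }, 1100)). Because the step is recomputed on every call to all(), zooming in should yield finer-grained points, though this has not been tested against the live page.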

1 Answer

When dealing with performance problems with d3-based charts, the usual culprit is the number of DOM elements, not the size of the data. Notice the crossfilter demo has lots of rows of data, but only a couple hundred bars.

It looks like you might be attempting to plot all the points instead of aggregating them. I guess since you are doing a time series it may be unintuitive to aggregate the points, but consider that your plot can only display 1100 points (the width), so it is pointless to overwork the SVG engine plotting 25,000.

I'd suggest bringing it down to somewhere between 100-1000 bins, e.g. by averaging each day:

// Assumes `data` is the crossfilter instance and d.time is a Date.
var daysDim = data.dimension(function(d) { return d3.time.day(d.time); });

// _.isLegitNumber is not a built-in Underscore/Lodash function; it is assumed
// to be a user-defined helper that returns true for finite numbers.
function reduceAddAvg(attr) {
  return function(p,v) {
    if (_.isLegitNumber(v[attr])) {
      ++p.count;
      p.sums += v[attr];
      p.averages = (p.count === 0) ? 0 : p.sums/p.count; // guard against dividing by zero
    }
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p,v) {
    if (_.isLegitNumber(v[attr])) {
      --p.count;
      p.sums -= v[attr];
      p.averages = (p.count === 0) ? 0 : p.sums/p.count;
    }
    return p;
  };
}
function reduceInitAvg() {
  return {count:0, sums:0, averages:0};
}
...
// average a parameter (column) named "param", one bin per day
var daysGroup = daysDim.group().reduce(reduceAddAvg('param'), reduceRemoveAvg('param'), reduceInitAvg);

(reusable average reduce functions from the FAQ)

Then specify your xUnits to match, and use elasticY to auto-calculate the y axis:

chart.xUnits(d3.time.days)
   .elasticY(true)
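
One detail worth spelling out: with the average reduce, each bin's value is an object ({count, sums, averages}) rather than a number, so the chart needs a value accessor to know which field to plot. A minimal sketch, where averageAccessor and useAverages are hypothetical helper names and chart is assumed to be a dc.js chart such as the line charts in the question:

```javascript
// Reads the running average out of the {count, sums, averages} object
// that the average reduce functions store as each bin's value.
function averageAccessor(d) {
  return d.value.averages;
}

// Hypothetical helper: tells a dc.js-style chart to plot the average and
// to rescale the y axis to the data currently shown.
function useAverages(chart) {
  return chart
    .valueAccessor(averageAccessor)
    .elasticY(true);
}
```

For example, useAverages(tempLineGraph.dimension(daysDim).group(daysGroup)) would apply both settings in one chain.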