1
votes

Summary

I want to compute year-over-year (YoY) stats in a Crossfilter/dc.js-driven dashboard.

Year over Year (YoY) Definition

2017 YoY is the total units in 2017 divided by the total units in 2016.

Details

I'm using DC.js (and therefore D3.js & Crossfilter) to create an interactive Dashboard that can also be used to change the data it's rendering.

My data is sales data. It is wider than shown here (about 6 other attributes besides date and quantity, such as size and color), but it boils down to objects like:

[
 { date: 2017-12-7, quantity: 56,  color: blue  ...},
 { date: 2017-2-17, quantity: 104, color: red   ...},
 { date: 2016-12-7, quantity: 60,  color: red   ...},
 { date: 2016-4-15, quantity: 6,   color: blue  ...},
 { date: 2017-2-17, quantity: 10,  color: green ...},
 { date: 2016-12-7, quantity: 12,  color: green ...}
  ...
]

I'm displaying one row chart per attribute so that you can see the totals by color, size, etc. People use each of these charts to see the totals by that attribute and to drill into the data by filtering on just a color, or a color and a size, or a size, etc. This setup is all (relatively) straightforward and kind of what DC is made for.

However, now I'd like to add some YoY stats such that I can show a barchart with x-axis as the years, and the y-axis as the YoY values (ex. YoY-2019 = Units-2019 / Units-2018). I'd also like to do the same by quarter and month such that I could see YoY Mar-2019 = Units-Mar-2019 / Units-Mar-2018 (and the same for quarter).

I have a year dimension and a group that sums quantity:

var yearDim = crossfilterObject.dimension(_ => _.date.getFullYear());
// group is a method, so it has to be called before chaining reduceSum
var quantityGroup = yearDim.group().reduceSum(_ => _.quantity);
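
For reference, here is a rough plain-JavaScript sketch of what that grouping computes on the six sample rows above. It does not use crossfilter at all: `sumByYear` is a hypothetical helper, and the rows assume that `date` holds real `Date` objects.

```javascript
// Sample rows from the question, assuming `date` is a real Date object.
// (Month arguments to `new Date` are zero-based: 11 = December.)
const rows = [
  { date: new Date(2017, 11, 7), quantity: 56, color: "blue" },
  { date: new Date(2017, 1, 17), quantity: 104, color: "red" },
  { date: new Date(2016, 11, 7), quantity: 60, color: "red" },
  { date: new Date(2016, 3, 15), quantity: 6, color: "blue" },
  { date: new Date(2017, 1, 17), quantity: 10, color: "green" },
  { date: new Date(2016, 11, 7), quantity: 12, color: "green" },
];

// Hypothetical helper: sums quantity per calendar year and returns
// key/value pairs sorted by key, the same shape as group.all().
function sumByYear(data) {
  const totals = new Map();
  for (const row of data) {
    const year = row.date.getFullYear();
    totals.set(year, (totals.get(year) || 0) + row.quantity);
  }
  return Array.from(totals, ([key, value]) => ({ key, value }))
    .sort((a, b) => a.key - b.key);
}

const annualTotals = sumByYear(rows);
console.log(annualTotals);
// [ { key: 2016, value: 78 }, { key: 2017, value: 170 } ]
```

Unlike a real crossfilter group, this sketch does not react to filters; it only shows the shape of the numbers involved.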

I can't figure out how to do the Year over Year calc though in the nice, beautiful DC.js-way.

Attempted Solutions

Year+1

Add another dimension that's year + 1. I didn't really get any further, though, because all I get out of it are two dimensions whose year groups I want to divide, and I'm not sure how.

var yearPlusOneDim = crossfilterObject.dimension(_ => _.date.getFullYear() + 1);

Visually I can graph the two separately, and I know conceptually what I want to do: divide the 2017 number in yearDim by the 2017 number in yearPlusOneDim (which, in reality, is the 2016 number). But "as a concept" is as far as I got on this one.
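
A sketch of how that division might be expressed, under stated assumptions: instead of a second dimension, the "year + 1" shift happens inside a wrapper around the existing group. `yoyByYearNumber` and `yearGroupStub` are hypothetical names; the stub only mimics the `all()` method of a crossfilter group whose keys are integer years, and its values are the yearly totals of the six sample rows.

```javascript
// Stand-in for a crossfilter group keyed by integer year.
// Only all() is mimicked; values are the totals of the six sample rows.
const yearGroupStub = {
  all: () => [
    { key: 2016, value: 78 },
    { key: 2017, value: 170 },
  ],
};

// Hypothetical wrapper: for each year, divide by the previous year's total.
function yoyByYearNumber(group) {
  return {
    all() {
      const pairs = group.all();
      // Index each total under year + 1, so a lookup by the current
      // year returns the previous year's total.
      const previousByYear = new Map(pairs.map(kv => [kv.key + 1, kv.value]));
      return pairs
        // Skip years with no prior year, or a prior total of zero.
        .filter(kv => previousByYear.get(kv.key))
        .map(kv => ({ key: kv.key, value: kv.value / previousByYear.get(kv.key) }));
    },
  };
}

const yearlyYoY = yoyByYearNumber(yearGroupStub).all();
console.log(yearlyYoY); // one entry, for 2017, with value 170 / 78
```

Because the wrapper recomputes from `group.all()` on every call, it would follow whatever filters are applied to the underlying group.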

Abandon DC Graphing

I could always use the yearDim's quantity group to get the array of values, which I could then feed into a normal D3.js graph.

var annualValues = quantityGroup.all();
console.log(annualValues);
// output = [{key: 2016, value: 78}, {key: 2017, value: 170}]
// example data from the limited rows listed above

But this feels like a hacky solution that's bound to fail and not benefit from all the rapid and dynamic DC updating.

I'm a little lost - are you just trying to extract one number, total units for 2017 / total units for 2016? Are you going to draw a chart? If there's no chart, there's definitely no need for dc.js. I mean, you could put that one number in a number display, but that's not all that exciting, is it? If you want to extract a few YoY numbers for a chart, I can help you with that. - Gordon
If you want to extract a few YoY numbers and put them into a chart, then a fake group could help and I'd be glad to help with that. But it seems kind of silly for one number, so I'm trying to understand what you are trying to do. - Gordon
Thanks Gordon. You're right that I missed a few clarifying points. I currently have a DC bar graph for each attribute in my data set (color, size, etc.). The primary purpose is to show how our sales forecast breaks out by each of these. (I also have the annual numbers displayed so people can see how it breaks by year.) - Matt
The reason I'm using DC is so people can quickly answer questions like "...and now how's it look just for Black...and now what about Black in 2019 only...etc." All the things DC is great for. Now that I've accomplished that, I wanted to add some YoY stats so that people could see that the forecast is increasing X% in 2019, but if you zoom into Black-8GB, it's actually 3% down YoY. - Matt
Sounds good, but specifically what do you want to display? An extra annotation on e.g. every bar in a bar chart? A single number that displays YoY for whatever is currently selected / filtered? - Gordon

2 Answers

1
votes

I'd use a fake group, in order to solve this in one pass.

As @Ethan says, you could also use a value accessor, but then you'd have to look up the previous year each time a value is accessed - so you'd probably have to keep an extra table around. With a fake group, you only need this table in the body of your .all() function.

Here's a quick sketch of what the fake group might look like:

function yoy_group(group) {
    return {
        all: function() {
            // index all values by date
            var bydate = group.all().reduce(function(p, kv) {
                p[kv.key.getTime()] = kv.value;
                return p;
            }, {});
            // for any key/value pair which had a value one year earlier,
            // produce a new pair with the ratio between this year and last
            return group.all().reduce(function(p, kv) {
                var date = d3.timeYear.offset(kv.key, -1);
                if(bydate[date.getTime()])
                    p.push({key: kv.key, value: kv.value / bydate[date.getTime()]});
                return p;
            }, []);
        }
    };
}

The idea is simple: first index all the values by date. Then when producing the array of key/value pairs, look each one up to see if it had a value one year earlier. If so, push a pair to the result (otherwise drop it).

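This should work for any date-keyed group where the dates have been rounded to the start of an interval (a year, quarter, or month, for example), so that a key shifted back one year lands exactly on the earlier bin's key. The keys need to be Date objects; a group keyed by plain year numbers (as with getFullYear()) would need kv.key - 1 in place of the date offset.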
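
To make the monthly case concrete, here is a sketch that runs without D3 or crossfilter. It assumes a hypothetical `oneYearEarlier` helper in place of `d3.timeYear.offset(date, -1)`, and `monthlyStub` is a made-up stand-in (with made-up values) for a group keyed by month-start dates.

```javascript
// Hypothetical stand-in for d3.timeYear.offset(date, -1), so the
// snippet runs without D3: same month, day, and time, one year back.
function oneYearEarlier(date) {
  const copy = new Date(date.getTime());
  copy.setFullYear(copy.getFullYear() - 1);
  return copy;
}

// Same approach as the fake group above, using the stand-in helper.
function yoyGroup(group) {
  return {
    all() {
      const byDate = {};
      group.all().forEach(kv => { byDate[kv.key.getTime()] = kv.value; });
      return group.all().reduce((out, kv) => {
        const previous = byDate[oneYearEarlier(kv.key).getTime()];
        if (previous) out.push({ key: kv.key, value: kv.value / previous });
        return out;
      }, []);
    },
  };
}

// Stub for a group keyed by month-start dates (made-up values).
const monthlyStub = {
  all: () => [
    { key: new Date(2016, 2, 1), value: 50 }, // Mar 2016
    { key: new Date(2016, 3, 1), value: 40 }, // Apr 2016
    { key: new Date(2017, 2, 1), value: 75 }, // Mar 2017
    { key: new Date(2017, 4, 1), value: 30 }, // May 2017: no May 2016 bin
  ],
};

const monthlyYoY = yoyGroup(monthlyStub).all();
console.log(monthlyYoY); // a single entry: Mar 2017 with value 1.5
```

Only March 2017 survives, because it is the only bin with a non-zero bin exactly one year earlier.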

Note the use of Array.reduce in a couple of places. This is the spiritual ancestor of crossfilter's group.reduce - it takes a function which has the same signature as the reduce-add function, and an initial value (not a function) and produces a single value. Instead of reacting to changes like the crossfilter one does, it just loops over the array once. It's useful when you want to produce an object from an array, or produce an array of different size from the original.
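
A small standalone illustration of those two uses of Array.reduce, with made-up data (the `pairs` array is not from the question):

```javascript
const pairs = [
  { key: "a", value: 3 },
  { key: "b", value: 0 },
  { key: "c", value: 5 },
];

// Array -> object: build a lookup table from key to value.
// The second argument ({}) is the initial value, not a function.
const lookup = pairs.reduce((acc, kv) => {
  acc[kv.key] = kv.value;
  return acc;
}, {});

// Array -> shorter array: keep only the keys with non-zero values.
const nonZeroKeys = pairs.reduce((acc, kv) => {
  if (kv.value !== 0) acc.push(kv.key);
  return acc;
}, []);

console.log(lookup);      // { a: 3, b: 0, c: 5 }
console.log(nonZeroKeys); // [ 'a', 'c' ]
```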

Also, when indexing an object by a date, I use Date.getTime() to fetch the numeric representation of the date. Otherwise the date coerces to a string representation which may not be exact. Probably for this application it would be okay to skip .getTime() but I'm in the habit of always comparing dates exactly.
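
A short sketch of why the numeric form is the safer key. The dates are arbitrary; the point is that a Date used as an object key is coerced to its string form, which only has one-second resolution.

```javascript
// Two distinct Date objects for the same instant.
const first = new Date(2017, 0, 1);
const second = new Date(2017, 0, 1);

console.log(first === second);                     // false: different objects
console.log(first.getTime() === second.getTime()); // true: same instant

// Two instants half a second apart.
const midnight = new Date(2017, 0, 1, 0, 0, 0, 0);
const halfSecondLater = new Date(2017, 0, 1, 0, 0, 0, 500);

// Keyed by the Date itself: both coerce to the same string.
const byString = {};
byString[midnight] = "midnight";
byString[halfSecondLater] = "half a second later"; // overwrites the entry above

// Keyed by getTime(): the two instants stay distinct.
const byTime = {};
byTime[midnight.getTime()] = "midnight";
byTime[halfSecondLater.getTime()] = "half a second later";

console.log(Object.keys(byString).length); // 1
console.log(Object.keys(byTime).length);   // 2
```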

Demo fiddle of YOY trade volume in the data set used by the stock example on the main dc.js page.

0
votes

I've rewritten @Gordon's code below. All the credit for the solution (answered above) is his. I've just written down my own, much more verbose version of the code and the explanation, which is likely only useful for beginners like me, to retrace how I got from my near-nothing starting point to @Gordon's really clever answer.

var yoyGroup = function(group) {
  return {
    all: function() {
      // Iterate over every key-value pair in the group, indexing each value by its key's time value.
      // A plain object serves as the lookup table (timestamp -> value).
      var valuesByDate = group.all().reduce(function(lookupTable, thisKeyValuePair) {
        lookupTable[thisKeyValuePair.key.getTime()] = thisKeyValuePair.value;
        return lookupTable;
      }, {});
      // Build the new all() output: one pair per key that has a value one year earlier.
      return group.all().reduce(function(newAllArray, thisKeyValuePair) {
        var dateLastYear = d3.timeYear.offset(thisKeyValuePair.key, -1);
        if (valuesByDate[dateLastYear.getTime()]) {
          newAllArray.push({
            key: thisKeyValuePair.key,
            // ratio minus 1, so 110 vs. 100 gives +10%
            value: thisKeyValuePair.value / valuesByDate[dateLastYear.getTime()] - 1
          });
        }
        return newAllArray;
      }, []); // closing reduce() and its callback
    }
  }; // closing the returned object
};

Why are we overriding the all() function? When DC.js creates a chart from a group, essentially the only Crossfilter function it calls on that group is all(). So if we want to customize what a DC chart sees, we only have to override that one function: all().

What does the all() function need to return? A group's all() function must return an array of objects, and each object must have two properties: key and value.
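
A minimal sketch of that contract in plain JavaScript: any object with an all() method that returns such an array can stand in for a group. `removeZeroBins` and `colorGroupStub` are illustrative names, not part of dc.js or crossfilter; the blue, green, and red totals come from the six sample rows in the question, and the purple bin is made up.

```javascript
// Stand-in for a crossfilter group; only all() is mimicked.
const colorGroupStub = {
  all: () => [
    { key: "blue", value: 62 },
    { key: "green", value: 22 },
    { key: "purple", value: 0 }, // made-up empty bin
    { key: "red", value: 164 },
  ],
};

// Illustrative fake group: same contract as a real group's all(),
// but bins whose value is zero are left out.
function removeZeroBins(group) {
  return {
    all() {
      return group.all().filter(kv => kv.value !== 0);
    },
  };
}

const visibleBins = removeZeroBins(colorGroupStub).all();
console.log(visibleBins.map(kv => kv.key)); // [ 'blue', 'green', 'red' ]
```

The wrapper never stores results; it recomputes from the wrapped group's all() on each call, which is what lets a chart pick up filter changes.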

So what exactly are we doing here? We start with an existing group that shows some values over time (important assumption: the keys are Date objects) and wrap it, so that we can take advantage of the work crossfilter has already done to aggregate at a certain level (e.g. year, month, etc.).

We start by using reduce to turn the array of objects into a simpler lookup table in which each key maps directly to its value. We do this to make it easier to look up values by key.

before / output structure of group.all()
[ {key: k1, value: v1},
  {key: k2, value: v2},
  {key: k3, value: v3}
]

after (a lookup table keyed by each key's time value)
{ k1: v1,
  k2: v2,
  k3: v3
}

Then we move on to creating the correct all() structure again: an array of objects, each of which has a key and a value property. We start with the existing group's all() array once again, but this time we have the advantage of our valuesByDate lookup table, which makes it easy to look up other dates.

So we iterate (via reduce) over the original group.all() output and check the lookup table we generated earlier (valuesByDate) for an entry from one year ago: valuesByDate[dateLastYear.getTime()]. (We use getTime() so that we're indexing by simple integers rather than objects.) If there is an entry from one year ago, we add a key-value object to our soon-to-be-returned array, with the current key (date) as the key; for the value, we divide the "now" value (thisKeyValuePair.value) by the value from one year ago, valuesByDate[dateLastYear.getTime()]. Lastly, we subtract 1 so that it matches the most traditional definition of YoY. Example: this year = 110 and last year = 100, so YoY = 110/100 - 1 = +10%.
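
The arithmetic in that last step, as a small sketch. `yoyGrowth` and `formatPercent` are hypothetical helpers, not part of dc.js or D3. Formatting matters because the raw floating-point result of 110 / 100 - 1 is very close to, but not exactly, 0.1.

```javascript
// Hypothetical helper: year-over-year growth as a fraction.
function yoyGrowth(current, previous) {
  return current / previous - 1;
}

// Hypothetical helper: format a fraction as a signed percentage.
function formatPercent(fraction) {
  const sign = fraction >= 0 ? "+" : "";
  return sign + (fraction * 100).toFixed(1) + "%";
}

console.log(formatPercent(yoyGrowth(110, 100))); // "+10.0%"
console.log(formatPercent(yoyGrowth(97, 100)));  // "-3.0%"
```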