This question arises from some difficulties in creating a crossfilter
dataset, in particular how to group the different dimensions and compute derived values. The final aim is to have a number of dc.js
graphs using the dimensions and groups.
(Fiddle example https://jsfiddle.net/raino01r/0vjtqsjL/)
Question
Before going into the details of the setting, the key question is the following:
How can I create custom `add`, `remove`, and `init` functions to pass to `.reduce` so that the first two do not sum the same feature multiple times?
Data
Let's say I want to monitor the failure rate of a number of machines (just an example). I do this using different dimensions: month, machine's location, and type of failure.
For example I have the data in the following form:
| month | room | failureType | failCount | machineCount |
|---------|------|-------------|-----------|--------------|
| 2015-01 | 1 | A | 10 | 5 |
| 2015-01 | 1 | B | 2 | 5 |
| 2015-01 | 2 | A | 0 | 3 |
| 2015-01 | 2 | B | 1 | 3 |
| 2015-02 | . | . | . | . |
Expected
For the three given dimensions, I should have:
- month_1_rate = $\frac{10+2+0+1}{5+3}$;
- room_1_rate = $\frac{10+2}{5}$;
- type_A_rate = $\frac{10+0}{5+3}$.
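The three expected values can be checked with a short sketch in plain JavaScript, without crossfilter. It assumes the four sample rows of the table above and a hypothetical helper `expectedRate` that sums `failCount` over every row but counts `machineCount` only once per (month, room) pair.

```javascript
// Sample rows copied from the table above (month 2015-01 only).
var rows = [
  {month: '2015-01', room: 1, failureType: 'A', failCount: 10, machineCount: 5},
  {month: '2015-01', room: 1, failureType: 'B', failCount: 2,  machineCount: 5},
  {month: '2015-01', room: 2, failureType: 'A', failCount: 0,  machineCount: 3},
  {month: '2015-01', room: 2, failureType: 'B', failCount: 1,  machineCount: 3}
];

// Hypothetical helper: failures are summed over every row,
// machines are summed once per (month, room) pair.
function expectedRate(subset) {
  var seen = {};
  var fails = 0;
  var machines = 0;
  subset.forEach(function (d) {
    fails += d.failCount;
    var key = d.month + '_' + d.room;
    if (!seen[key]) {
      seen[key] = true;
      machines += d.machineCount;
    }
  });
  return machines === 0 ? 0 : fails / machines;
}

var month1Rate = expectedRate(rows);                                                      // (10+2+0+1) / (5+3)
var room1Rate = expectedRate(rows.filter(function (d) { return d.room === 1; }));         // (10+2) / 5
var typeARate = expectedRate(rows.filter(function (d) { return d.failureType === 'A'; })); // (10+0) / (5+3)
```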
Idea
Essentially, what counts in this setting is the couple `(month, room)`, i.e. given a month and a room there should be a rate attached to them (crossfilter should then take the other filters into account).
Therefore, a way to go could be to store the couples that have already been used and not sum `machineCount` for them, while still updating the `failCount` value.
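A minimal sketch of this idea, under the assumption that each group keeps a counter per (month, room) pair rather than a plain list of used pairs, so that a removal can tell when the last record of a pair has left the filter. The names `pairKey`, `addRate`, `removeRate`, `initialRate` and `seen` are hypothetical, `month` is assumed to be a string already, and the rate is kept as a plain ratio. The group is simulated with `Array.prototype.reduce`; with crossfilter the three functions would be passed to `.reduce` in the usual add, remove, init order.

```javascript
// Hypothetical key for a (month, room) pair; assumes month is a string such as '2015-01'.
function pairKey(v) {
  return v.month + '_' + v.room;
}

function addRate(p, v) {
  var key = pairKey(v);
  p.seen[key] = (p.seen[key] || 0) + 1;
  // Only the first record of a pair contributes its machine count.
  if (p.seen[key] === 1) p.machCount += v.machineCount;
  p.failCount += v.failCount;
  p.rate = p.machCount === 0 ? 0 : p.failCount / p.machCount;
  return p;
}

function removeRate(p, v) {
  var key = pairKey(v);
  p.seen[key] -= 1;
  // Only when the last record of a pair is removed is its machine count subtracted.
  if (p.seen[key] === 0) {
    p.machCount -= v.machineCount;
    delete p.seen[key];
  }
  p.failCount -= v.failCount;
  p.rate = p.machCount === 0 ? 0 : p.failCount / p.machCount;
  return p;
}

function initialRate() {
  return {rate: 0, machCount: 0, failCount: 0, seen: {}};
}

// Simulation of a single group: add every record, then remove two of them.
var records = [
  {month: '2015-01', room: 1, failureType: 'A', failCount: 10, machineCount: 5},
  {month: '2015-01', room: 1, failureType: 'B', failCount: 2,  machineCount: 5},
  {month: '2015-01', room: 2, failureType: 'A', failCount: 0,  machineCount: 3},
  {month: '2015-01', room: 2, failureType: 'B', failCount: 1,  machineCount: 3}
];
var p = records.reduce(addRate, initialRate());
var afterAdd = {machCount: p.machCount, failCount: p.failCount};

removeRate(p, records[1]); // room 1 still has its type A record
var afterFirstRemove = {machCount: p.machCount, failCount: p.failCount};

removeRate(p, records[0]); // room 1 is now completely filtered out
var afterSecondRemove = {machCount: p.machCount, failCount: p.failCount};
```

Counting records per pair, instead of only remembering that a pair was seen, is what lets the remove function undo exactly what the add function did.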
Attempt (failing)
My attempt was to create custom reduce functions that do not sum `machineCount` values that were already taken into account.
However, there are some unexpected behaviours. I'm sure this is not the way to go, so I hope to get some suggestions on this.
// A dimension is one of:
// ndx = crossfilter(data);
// ndx.dimension(function(d){return d.month;})
// ndx.dimension(function(d){return d.room;})
// ndx.dimension(function(d){return d.failureType;})
// Goal: have a general way to get the group given the dimension:
function get_group(dim){
  return dim.group().reduce(add_rate, remove_rate, initial_rate);
}
// month is given as datetime object
var monthNameFormat = d3.time.format("%Y-%m");
// Returns true when the (room, month) pair has not been recorded in p.done yet.
function check_done(p, v){
  return p.done.indexOf(v.room+'_'+monthNameFormat(v.month))==-1;
}
// The three functions needed for the custom `.reduce` block.
function add_rate(p, v){
  var index = check_done(p, v);
  if (index) p.done.push(v.room+'_'+monthNameFormat(v.month));
  var count_to_sum = (index)? v.machineCount:0;
  p.mach_count += count_to_sum;
  p.fail_count += v.failCount;
  p.rate = (p.mach_count==0) ? 0 : p.fail_count*1000/p.mach_count;
  return p;
}
function remove_rate(p, v){
  var index = check_done(p, v);
  var count_to_subtract = (index)? v.machineCount:0;
  if (index) p.done.push(v.room+'_'+monthNameFormat(v.month));
  p.mach_count -= count_to_subtract;
  p.fail_count -= v.failCount;
  p.rate = (p.mach_count==0) ? 0 : p.fail_count*1000/p.mach_count;
  return p;
}
function initial_rate(){
  return {rate: 0, mach_count:0, fail_count:0, done: new Array()};
}
Connection with dc.js
As mentioned, the previous code is needed to create the dimension and group to be passed to three different bar graphs using dc.js.
Each graph will have `.valueAccessor(function(d){return d.value.rate;})`.
See the jsfiddle (https://jsfiddle.net/raino01r/0vjtqsjL/) for an implementation. The numbers are different, but the data structure is the same. Notice that in the fiddle you would expect `Machine count` to be 18 (in both months), but you always get double that (because of the 2 different locations).
Edit
Reductio + dc.js
Following Ethan Jewett's answer, I used `reductio` to take care of the grouping. The updated fiddle is here: https://jsfiddle.net/raino01r/dpa3vv69/
My `reducer` object needs two exceptions `(month, room)` when summing the `machineCount` values. Hence it is built as follows:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room; })
.exception(function(d) { return d.month; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
This seems to fix the numbers when the graphs are rendered.
However, I do see strange behaviour when filtering a single month and looking at the numbers in the `type` graph.
Rather than creating two exceptions, I could merge the two fields when processing the data, i.e. as soon as the data is defined I could do:
data.forEach(function(x){
  x['room_month'] = x['room'] + '_' + x['month'];
});
Then the above reduction code should become:
var reducer = reductio()
reducer.value('mach_count')
.exception(function(d) { return d.room_month; })
.exceptionSum(function(d) { return d.machineCount; })
reducer.value('fail_count')
.sum(function(d) { return d.failCount; })
This solution seems to work. However, I am not sure whether this is a sensible thing to do: if the dataset is large, adding a new feature could slow things down quite a lot!
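One possible variation, given here only as a sketch: since the exception accessor is already a function, the combined key could be built inside it instead of being stored on every record. The helper name `roomMonthOf` is hypothetical and `month` is assumed to be a string. The key is then recomputed on every reducer call rather than kept in memory; the sketch does not measure which of the two variants is faster.

```javascript
// Hypothetical accessor: builds the same key as the stored room_month field, on demand.
function roomMonthOf(d) {
  return d.room + '_' + d.month;
}

var sample = [
  {month: '2015-01', room: 1, failureType: 'A', failCount: 10, machineCount: 5},
  {month: '2015-01', room: 2, failureType: 'B', failCount: 1,  machineCount: 3}
];

// Stored variant from the question, applied to copies so that sample stays untouched.
var stored = sample.map(function (x) {
  return {room_month: x['room'] + '_' + x['month']};
});

// Computed variant: no extra field is added to the records.
var computed = sample.map(roomMonthOf);

// With reductio the accessor would replace the field lookup, for example:
//   reducer.value('mach_count').exception(roomMonthOf)
```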