
I am trying to compute a derived metric for a dc.js chart using a custom reduction (reduceAdd, reduceRemove, etc.) and am having trouble figuring out how to code it.

I originally wrote the calculation outside of the reduce functions, and now need to replicate it inside them so that the charts can use it. The logic and the original code are as follows.

Logic: for each unique CONTACT_WEEK (date), find the maximum WEEK_NUMBER; then, over the rows with that maximum, sum TOTCOUNT and DECAY_CNT and calculate the percentage (DECAY_CNT / TOTCOUNT).

Here is the original code without using crossfilter:

 //Decay % logic
  var dates = d3.map(filter1, function(d) { return d.CONTACT_WEEK; }).keys();
  console.log(dates);

  for (var i = 0; i < dates.length; i++) {
      // Rows belonging to this contact week
      var data1 = filter1.filter(function(d) { return d.CONTACT_WEEK == dates[i]; });
      // Latest week number recorded for this contact week
      var max = d3.max(data1, function(d) { return +d.WEEK_NUMBER; });
      // Keep only the rows from that latest week
      var data2 = data1.filter(function(d) { return +d.WEEK_NUMBER === max; });

      var sum1 = d3.sum(data2, function(d) { return +d.TOTCOUNT; });
      var sum2 = d3.sum(data2, function(d) { return +d.DECAY_CNT; });
      console.log(sum1);
      var decay = sum2 / sum1 * 100;
      console.log(decay);
  }

The first step is to identify the unique dates (CONTACT_WEEK values). How do I do this inside the reduce functions, given that the reduction itself is already a loop that traverses the data?

I guess reductio or some other logic mentioned in the comments could handle the max, but I don't really understand the approach or design to follow here.

Any help with an approach or solution would be highly appreciated.

UPDATE 2:

Trying a new approach using reductio.

Data explanation:

A few columns in my data: contact_week (dates); week_number (numbers from -4 to 6); decay_cnt (integers); totcount (integers); duration (ordinal values: pre, during and post).

Now, I need to calculate a percentage called decay %, as follows: for each unique contact_week, find the max of week_number; then, for the rows with that max, calculate sum(decay_cnt) / sum(totcount).

This has to be plotted in a bar chart where the x-axis is duration and the y-axis is the metric, decay %.

As a step towards calculating the max week_number for each date, I've plotted a bar chart with contact_week on the x-axis and the max of week_number on the y-axis. How do I get from this to the chart that I need?

Code :

dateDimension2  = ndx.dimension(function(d) {return d.CONTACT_WEEK ;});
decayGroup = reductio().max(function (d) { return d.WEEK_NUMBER; })(dateDimension2.group());


chart2
    .width(500)
    .height(200)
    .x(d3.scale.ordinal())
    //.x(d3.scale.ordinal().domain(["DURING","POST1"]))
    .xUnits(dc.units.ordinal)
    //.xUnits(function(){return 10;})
    //.brushOn(false)
    .yAxisLabel("Decay (in %)")
    .dimension(dateDimension2)
    .group(decayGroup)
    .gap(10)
    .elasticY(true)
    //.yAxis().tickValues([0, 5, 10, 15])
    //.title(function(d) { return d.key + ": " + d3.round(d.value.new_count,2); })
    /*.valueAccessor(function (p) {
    //return p.value.count > 0 ? (p.value.dec_total / p.value.new_count) * 100  : 0;
    return p.value.decay ;
    })*/
    .valueAccessor(function(d) { return d.value.max; })
    .on('renderlet', function(chart) {
        chart.selectAll('rect').on("click", function(d) {
            console.log("click!", d);
        });
    })
    .yAxis().ticks(5);

Any approach or suggestions would be highly appreciated.

I think the solution mostly lies in combining fake groups/dimensions with reductio. Any alternatives are most welcome!

Please show the reduceAdd/reduceRemove code that is causing the error. The best thing to do would be to create a working example of the issue at jsfiddle or a similar site. - Ethan Jewett
The good news is the main loop and the first filter should be handled by the crossfilter groups automatically. The bad news is that crossfilter doesn't have min/max type stuff built in, and they're difficult to do efficiently. You end up storing some reference to the rows in each bin. If you search around you'll find various code examples for this, e.g. stackoverflow.com/a/32925852/676195. IIUC your problem is just a more complicated reduce on the same kind of data. - Gordon
reductio also has min/max stuff but I don't know if it supports filtering the rows within the bin and reducing on them. Interesting design challenge, @Ethan! - Gordon
@Gordon You can almost do this with Reductio, but I think not quite. It supports the filter predicates and min/max calculation, but filtering on min/max is the problem. It's just not clear to me exactly what the filtering on min/max means semantically. A working example would probably clarify. - Ethan Jewett
@EthanJewett: I have now used reductio and updated the question. Please have a look when you are free. - Pravin Singh

1 Answer


I've just added a FAQ and an example for this kind of problem.

As explained there, the idea is to maintain an array of rows which fall into each bin, since crossfilter doesn't provide access to that yet. Once we've got the actual rows, your calculations are almost the same as you are doing now, except that crossfilter keeps track of the list of weeks for you.

So you can use these functions from the example:

  function groupArrayAdd(keyfn) {
      var bisect = d3.bisector(keyfn);
      return function(elements, item) {
          var pos = bisect.right(elements, keyfn(item));
          elements.splice(pos, 0, item);
          return elements;
      };
  }

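  function groupArrayRemove(keyfn) {
      var bisect = d3.bisector(keyfn);
      return function(elements, item) {
          // Find where the item's key sits in the sorted array.
          var pos = bisect.left(elements, keyfn(item));
          // Guard against running off the end before comparing keys.
          if(pos < elements.length && keyfn(elements[pos])===keyfn(item))
              elements.splice(pos, 1);
          return elements;
      };
  }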

  function groupArrayInit() {
      return [];
  }

You need to have a unique key in your records so that they can be added and removed reliably. I'll assume that your records have an ID field.
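
If your records don't have a natural unique field, one option is to assign a sequential index before loading them into crossfilter. This is only a sketch: `addRowIds` is a hypothetical helper, the sample rows are made up, and `ID` is simply the field name assumed in the rest of this answer.

```javascript
// Hypothetical helper: tag each record with a unique, sequential ID
// so it can be located again when crossfilter removes it from a bin.
function addRowIds(rows) {
    rows.forEach(function(row, i) {
        row.ID = i;
    });
    return rows;
}

// Made-up sample rows, shaped like the data described in the question.
var sampleRows = addRowIds([
    {CONTACT_WEEK: "2016-01-04", WEEK_NUMBER: "1", TOTCOUNT: "10", DECAY_CNT: "2"},
    {CONTACT_WEEK: "2016-01-04", WEEK_NUMBER: "2", TOTCOUNT: "20", DECAY_CNT: "5"}
]);
```

The tagged array can then be passed to `crossfilter(...)` as usual.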

Define your week dimension and group like so:

var weekDimension = ndx.dimension(function(d) {return d.CONTACT_WEEK ;}),
    id_function = function(r) { return r.ID; },
    weekGroup = weekDimension.group().reduce(groupArrayAdd(id_function), groupArrayRemove(id_function), groupArrayInit);

Then the most efficient time to calculate your metric is when it's needed, in the value accessor. So you can define your value accessor with the heart of the code you posted in your question.

(Of course, this code is untested because I don't know your data.)

var calculateDecay = function(kv) {
    // kv.value has the array produced by the reduce functions.
    var data1 = kv.value;
    // Latest week present in this bin (+ coerces string fields to numbers).
    var max = d3.max(data1, function(d) { return +d.WEEK_NUMBER; });
    var data2 = data1.filter(function(d) { return +d.WEEK_NUMBER === max; });

    var sum1 = d3.sum(data2, function(d) { return +d.TOTCOUNT; });
    var sum2 = d3.sum(data2, function(d) { return +d.DECAY_CNT; });

    // An empty bin (everything filtered out) would otherwise give NaN.
    return sum1 > 0 ? sum2 / sum1 * 100 : 0;
};

chart.valueAccessor(calculateDecay);
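
Since the accessor above is untested, it may help to check the per-bin arithmetic on its own first. The following is a dependency-free sketch of the same calculation; `decayForRows` is a hypothetical name and the rows are made up, so treat it as a way to sanity-check the logic rather than as part of dc.js or crossfilter.

```javascript
// Plain-JavaScript version of the per-bin calculation, for checking the
// logic outside of dc.js. decayForRows is a hypothetical name.
function decayForRows(rows) {
    if (rows.length === 0) return 0; // empty bin: nothing to report
    // Highest WEEK_NUMBER in this bin (fields may arrive as strings).
    var maxWeek = Math.max.apply(null, rows.map(function(d) { return +d.WEEK_NUMBER; }));
    // Keep only the rows from that latest week.
    var latest = rows.filter(function(d) { return +d.WEEK_NUMBER === maxWeek; });
    var total = latest.reduce(function(s, d) { return s + (+d.TOTCOUNT); }, 0);
    var decayed = latest.reduce(function(s, d) { return s + (+d.DECAY_CNT); }, 0);
    return total > 0 ? decayed / total * 100 : 0;
}

// Made-up rows for a single CONTACT_WEEK bin.
var bin = [
    {WEEK_NUMBER: "1", TOTCOUNT: "10", DECAY_CNT: "9"},
    {WEEK_NUMBER: "3", TOTCOUNT: "30", DECAY_CNT: "6"},
    {WEEK_NUMBER: "3", TOTCOUNT: "10", DECAY_CNT: "4"}
];
var result = decayForRows(bin);
```

Here only the two week-3 rows contribute, so the week-1 row's counts are ignored even though its DECAY_CNT is large.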