
This is quite an oddly specific question, but it's something I've been having a lot of trouble with over the past day or so. Broadly, I'm trying to calculate the maximum of an array using Crossfilter and then use that maximum to look up another value from the same record.

For example, I have a series of Timestamps, each with an associated X Value and Y Value. I want to aggregate the Timestamps by day, find the maximum X Value for each day, and then report the Y Value belonging to the same Timestamp as that maximum. As I understand it, this effectively involves two dimensions.

I'm able to do the first stage, finding the maximum values, simply enough, but I'm having a lot of difficulty getting through to the second value.

Here is working code for the first stage (using Crossfilter and Reductio), assuming that each row has the following four values:

[(Timestamp,           Date,       XValue, YValue),
 (2015-05-15 16:00:00, 2015-05-15, 30,      15),
 (2015-05-15 16:45:00, 2015-05-15, 25,      33)
 ... (many thousands of rows)]

First Dimension

var ndx = crossfilter(data);
var dailyDimension = ndx.dimension(function(d) { return d.date; });

Get the max of the X Value using Reductio:

var maxXValue = reductio().max(function(d) { return d.XValue; });
var XValues = maxXValue(dailyDimension.group());

XValues now contains the maximum X Value for each day.

I would now like to use these X Values to identify the corresponding Y Values for each date.

Using the same data as above, the value returned should be:

[(date,          YValue),
  ('2015-05-15', 15)] 
// Note that it is 15 because it belongs to the row with the max X Value; it is not the max Y Value.

In Python/Pandas I would set the index of a DataFrame to X and then do an index match to find the Y Values.

(Note: it can safely be assumed that the X Values are unique in this case, but in reality we should identify the Timestamp linked to each daily maximum and match on that instead, since Timestamps are strictly guaranteed to be unique, not just loosely.)
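Outside Crossfilter, the Pandas-style lookup boils down to an "argmax per group": for each day, keep the whole row with the largest XValue and read its YValue from that same row. Keeping the row itself (and so its Timestamp) avoids matching on X afterwards, which sidesteps the uniqueness concern. This is a minimal plain-JavaScript sketch, assuming rows are objects with the `date`, `XValue` and `YValue` properties used by the accessors above; the `timestamp` property and the `maxRowPerDay` helper are hypothetical names, not part of Crossfilter or Reductio.

```javascript
// Hypothetical sample rows; property names assumed to match the accessors above.
const data = [
  { timestamp: "2015-05-15 16:00:00", date: "2015-05-15", XValue: 30, YValue: 15 },
  { timestamp: "2015-05-15 16:45:00", date: "2015-05-15", XValue: 25, YValue: 33 },
];

// For each date, keep the row with the largest XValue (an "argmax" per group).
// On ties the first row seen wins, because the comparison is strict.
function maxRowPerDay(rows) {
  const best = new Map(); // date -> row with the largest XValue seen so far
  for (const row of rows) {
    const current = best.get(row.date);
    if (current === undefined || row.XValue > current.XValue) {
      best.set(row.date, row);
    }
  }
  // Report the Y Value of the winning row, not the max Y Value.
  return Array.from(best, ([date, row]) => ({ date, XValue: row.XValue, YValue: row.YValue }));
}

const result = maxRowPerDay(data);
console.log(result); // [ { date: '2015-05-15', XValue: 30, YValue: 15 } ]
```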

I believe this can be accomplished by modifying the Reductio maximum code, which I don't fully understand. The source code is from here:

var reductio_max = {
    add: function (prior, path) {
        return function (p, v) {
            if(prior) prior(p, v);

            // valueList is assumed to be kept in ascending order,
            // so its last entry is the current maximum.
            path(p).max = path(p).valueList[path(p).valueList.length - 1];

            return p;
        };
    },
    remove: function (prior, path) {
        return function (p, v) {
            if(prior) prior(p, v);

            // Check for undefined.
            if(path(p).valueList.length === 0) {
                path(p).max = undefined;
                return p;
            }

            path(p).max = path(p).valueList[path(p).valueList.length - 1];

            return p;
        };
    },
    initial: function (prior, path) {
        return function (p) {
            p = prior(p);
            path(p).max = undefined;
            return p;
        };
    }
};

Perhaps this could be modified to keep a second valueList of Y Values that maps 1:1 onto the X Values used by the max function. Then the same index lookup would work on both lists, and the Y Value could be assigned in the same simple way as the max.
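The parallel-list idea could look roughly like this: keep the X Values sorted ascending and insert or delete each record's Y Value at the same index in a second array, so the last X is the max and the last Y is its partner. This is a hand-rolled sketch, not Reductio's internal `valueList` code; `upperBound`, `insertPair`, `removePair` and the `xs`/`ys` fields are hypothetical names, and the binary search stands in for whatever bisection Reductio actually uses.

```javascript
// Index of the first element in `sorted` that is greater than `x` (upper bound).
function upperBound(sorted, x) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Insert x into the sorted list xs and y into ys at the same index,
// so xs[i] and ys[i] always come from the same record.
function insertPair(state, x, y) {
  const i = upperBound(state.xs, x);
  state.xs.splice(i, 0, x);
  state.ys.splice(i, 0, y);
  state.max = state.xs[state.xs.length - 1];
  state.yValue = state.ys[state.ys.length - 1];
}

// Remove one record with this (x, y) pair, keeping both lists aligned.
function removePair(state, x, y) {
  // Walk back through the run of equal x values to find the matching y.
  let i = upperBound(state.xs, x) - 1;
  while (i >= 0 && state.xs[i] === x && state.ys[i] !== y) i--;
  if (i >= 0 && state.xs[i] === x) {
    state.xs.splice(i, 1);
    state.ys.splice(i, 1);
  }
  const n = state.xs.length;
  state.max = n ? state.xs[n - 1] : undefined;
  state.yValue = n ? state.ys[n - 1] : undefined;
}

const state = { xs: [], ys: [], max: undefined, yValue: undefined };
insertPair(state, 30, 15);
insertPair(state, 25, 33);
console.log(state.max, state.yValue); // 30 15
removePair(state, 30, 15);
console.log(state.max, state.yValue); // 25 33
```

Because both arrays are spliced at the same index, the 1:1 mapping survives additions and removals, which is what a Crossfilter group needs when filters change.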

My apologies that I don't have any more working code.

An alternative approach would be to use some form of filtering function to remove entries that don't satisfy the X criterion and then group by day (there should be only one value per day in that case, so a simple reduceSum, for example, would still return the correct value).

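// Untested sketch; relies on the X Values being unique across all rows.
// Filter on a separate XValue dimension: a group ignores its own dimension's filters.
var maxXSet = new Set(XValues.all().map(function(kv) { return kv.value.max; }));
var xDimension = ndx.dimension(function(d) { return d.XValue; });
xDimension.filterFunction(function(x) { return maxXSet.has(x); });
var dailyYValues = dailyDimension.group().reduceSum(function(d) { return d.YValue; });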
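Stripped of the Crossfilter API, the filtering approach is two passes: collect each day's maximum XValue, keep only the rows whose XValue equals the maximum for their own day, then sum YValue by date. Comparing against the row's own day rather than a global set of maxima means the result still holds if X Values repeat across days, although a tie for the maximum within one day would add both Y Values together. A plain-JavaScript sketch; `yAtDailyMaxX` is a hypothetical helper and the property names are assumed to match the accessors above.

```javascript
const data = [
  { date: "2015-05-15", XValue: 30, YValue: 15 },
  { date: "2015-05-15", XValue: 25, YValue: 33 },
  { date: "2015-05-16", XValue: 12, YValue: 40 },
];

function yAtDailyMaxX(rows) {
  // Pass 1: maximum XValue per date (the same quantity the Reductio max group holds).
  const maxX = new Map();
  for (const r of rows) {
    if (!maxX.has(r.date) || r.XValue > maxX.get(r.date)) maxX.set(r.date, r.XValue);
  }
  // Pass 2: keep rows at their day's maximum, then sum YValue by date.
  // With a unique max X per day, the "sum" is just the single matching YValue.
  const sums = new Map();
  for (const r of rows) {
    if (r.XValue !== maxX.get(r.date)) continue;
    sums.set(r.date, (sums.get(r.date) || 0) + r.YValue);
  }
  return sums;
}

const sums = yAtDailyMaxX(data);
console.log(sums.get("2015-05-15"), sums.get("2015-05-16")); // 15 40
```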

The eventual results will be plotted in dc.js.


1 Answer


Not sure if this will work, but maybe give it a try:

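var maxXValue = reductio()
  .valueList(function(d) {
    // Zero-pad X to 10 characters so string order matches numeric order
    // (non-negative integers only), then append Y after a comma.
    return ("0000000000" + d.XValue).slice(-10) + ',' + d.YValue;
  })
  .aliasProp({
    // The last entry has the largest X; guard against an empty list,
    // which can happen when filters remove every row for a day.
    max: function(g) {
      var last = g.valueList[g.valueList.length - 1];
      return last === undefined ? undefined : +last.split(',')[0];
    },
    // The Y Value from the same record as the max X.
    yValue: function(g) {
      var last = g.valueList[g.valueList.length - 1];
      return last === undefined ? undefined : +last.split(',')[1];
    }
  });
var XValues = maxXValue(dailyDimension.group());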
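The `valueList` accessor above returns strings, so, assuming the list is ordered with ordinary `<` comparison, the order is lexicographic; left-padding X to a fixed width of 10 characters is what makes string order agree with numeric order. That only holds for non-negative integers of at most 10 digits. A small plain-JavaScript check of the encoding; `encode` and `decode` are hypothetical helpers mirroring the accessor and the two `aliasProp` functions, and `Array.prototype.sort` with no comparator stands in for the list's ordering.

```javascript
// Mirror of the valueList accessor: zero-padded X, a comma, then Y.
function encode(d) {
  return ("0000000000" + d.XValue).slice(-10) + "," + d.YValue;
}

// Mirror of the aliasProp functions: split the string back into numbers.
function decode(s) {
  const parts = s.split(",");
  return { max: +parts[0], yValue: +parts[1] };
}

const rows = [
  { XValue: 30, YValue: 15 },
  { XValue: 25, YValue: 33 },
  { XValue: 100, YValue: 7 },
];

// The default sort compares strings, like a lexicographic bisect would.
const padded = rows.map(encode).sort();
console.log(padded[padded.length - 1]); // 0000000100,7
console.log(decode(padded[padded.length - 1])); // { max: 100, yValue: 7 }

// Without padding, "30,15" sorts after "100,7", so the "max" would be wrong.
const unpadded = rows.map((d) => d.XValue + "," + d.YValue).sort();
console.log(unpadded[unpadded.length - 1]); // 30,15
```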

This is a somewhat less efficient and less safe re-implementation of the maximum calculation using the aliasProp option, which lets you do pretty much whatever you want to a group on every record addition and removal.

My untested assumption here is that the undocumented valueList function used internally by max/min/median keeps the list properly ordered. It might be easier or better to write a plain Crossfilter maximum aggregation and then modify it to also add the y-value to the group.
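A plain Crossfilter version would pass three functions to `group.reduce(add, remove, initial)`. One simple, not especially efficient, way to support removal (which happens whenever filters change) is to keep the day's records in the group value and rescan them only when the current winner is removed; the winning record then supplies both the max X and its Y Value. This is a sketch: `reduceAdd`, `reduceRemove`, `reduceInitial` and the `rows`/`best` fields are hypothetical names, and it assumes Crossfilter passes the same record object to remove that it passed to add. The functions are called directly below so the sketch runs without Crossfilter.

```javascript
// Initial group value: the day's records plus the current winner.
function reduceInitial() {
  return { rows: [], best: null };
}

// Add a record and update the winner if its XValue is larger.
function reduceAdd(p, v) {
  p.rows.push(v);
  if (p.best === null || v.XValue > p.best.XValue) p.best = v;
  return p;
}

// Remove a record (matched by object identity); rescan only if it was the winner.
function reduceRemove(p, v) {
  const i = p.rows.indexOf(v);
  if (i !== -1) p.rows.splice(i, 1);
  if (p.best === v) {
    p.best = null;
    for (const r of p.rows) {
      if (p.best === null || r.XValue > p.best.XValue) p.best = r;
    }
  }
  return p;
}

// With Crossfilter this would be wired up roughly as
//   dailyDimension.group().reduce(reduceAdd, reduceRemove, reduceInitial);
// reading best.YValue from each group value (best is null for an empty day).
const a = { date: "2015-05-15", XValue: 30, YValue: 15 };
const b = { date: "2015-05-15", XValue: 25, YValue: 33 };
let p = reduceInitial();
p = reduceAdd(p, a);
p = reduceAdd(p, b);
console.log(p.best.YValue); // 15
p = reduceRemove(p, a);
console.log(p.best.YValue); // 33
```

Keeping references to whole records also carries the Timestamp along, so nothing has to be matched on X afterwards.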

If you want to work through this with Reductio, I'm happy to do that with you here, but it will be easier if we have a working example on something like JSFiddle.