CrossFilter/JS newbie here.

This question pretty much describes exactly what I'm trying to do but there doesn't seem to be a solution using CrossFilter:

How to return the number of unique values by category using crossfilter?

I have data like this:

var va = [{
date: "2014-10-01",
id: "1"},
{
date: "2014-10-02",
id: "1"},
{
date: "2014-10-03",
id: "1"},
{
date: "2014-10-04",
id: "1"},
{
date: "2014-10-05",
id: "1"},
{
date: "2014-10-01",
id: "2"},
{
date: "2014-10-02",
id: "2"},
{
date: "2014-10-03",
id: "2"},
{
date: "2014-10-04",
id: "1"},
{
date: "2014-10-01",
id: "3"},
{
date: "2014-10-02",
id: "3"},
{
date: "2014-10-03",
id: "1"},
{
date: "2014-10-01",
id: "4"},
{
date: "2014-10-02",
id: "1"},
{
date: "2014-10-01",
id: "5"}
];

I am trying to get the number of unique id's per date from this. I would like to group by date and basically have a count of unique id's for that particular date:

"2014-10-01" - 5
"2014-10-02" - 3
"2014-10-03" - 2
"2014-10-04" - 1
"2014-10-05" - 1
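
As a sanity check on the expected counts above, here is a minimal plain-JavaScript sketch that does not use Crossfilter at all. The helper name `uniqueIdsPerDate` and the compact `pairs` array are made up for illustration; `pairs` is meant to hold the same 15 date/id records as `va`, in the same order.

```javascript
// Same records as `va`, written compactly as [date, id] pairs.
const pairs = [
  ["2014-10-01", "1"], ["2014-10-02", "1"], ["2014-10-03", "1"],
  ["2014-10-04", "1"], ["2014-10-05", "1"], ["2014-10-01", "2"],
  ["2014-10-02", "2"], ["2014-10-03", "2"], ["2014-10-04", "1"],
  ["2014-10-01", "3"], ["2014-10-02", "3"], ["2014-10-03", "1"],
  ["2014-10-01", "4"], ["2014-10-02", "1"], ["2014-10-01", "5"],
];
const records = pairs.map(([date, id]) => ({ date, id }));

// Hypothetical helper: collect a Set of ids per date, then report its size.
function uniqueIdsPerDate(rows) {
  const idsByDate = new Map();
  for (const { date, id } of rows) {
    if (!idsByDate.has(date)) idsByDate.set(date, new Set());
    idsByDate.get(date).add(id);
  }
  const counts = {};
  for (const [date, ids] of idsByDate) counts[date] = ids.size;
  return counts;
}

const uniqueCounts = uniqueIdsPerDate(records);
console.log(uniqueCounts);
// { '2014-10-01': 5, '2014-10-02': 3, '2014-10-03': 2,
//   '2014-10-04': 1, '2014-10-05': 1 }
```

This only confirms the target numbers. It does not react to filters the way a Crossfilter group does, which is why the custom reducers are still needed.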

Currently, I'm trying to follow the answer given in this question

Crossfilter reduce :: find number of uniques

to do the following:

//Create a Crossfilter instance
var ndx = crossfilter(va);

//Define dimensions
var date_dim = ndx.dimension(function(d) {
    return d["date"]; });

//total number of ids per date
var num_ids_by_date = date_dim.group();

//unique number of ids per date
var num_uniq_ids_by_date = date_dim
    .group()
    .reduce(
        function (p, d) {
            if(d.id in p.ids){
            }
            else{
                p.ids[d.id] = 1;
            }
            return p;
        },

        function (p, d) {
            p.ids[d.id]--;
            if(p.ids[d.id] === 0){
                delete p.ids[d.id];
            }
            return p;
        },

        function () {
            return {ids: {}};
        })

When I look in the num_uniq_ids_by_date object and call num_uniq_ids_by_date.reduceCount().top(1), it seems to be the same output as num_ids_by_date.top(1).

So, I still don't seem to be getting what I'm looking for and have been stumped for a while.

Any suggestions? Thanks in advance!

Seems like you aren't incrementing the counter on add, which will cause you problems. If you put together a working example, it will be easier to diagnose the issue. You could also use a library like Reductio, which supports this: github.com/esjewett/… (plugging my own library, sorry) - Ethan Jewett
Thanks for the response Ethan. The reason I don't increment the counter on add is because I don't entirely care about the amount of each particular id, I just would like the number of unique ids. Also, thanks for the library suggestion, I'll definitely check it out. If possible, I would like to keep it to just the CrossFilter library for now while I'm still learning :) - archeezee
If you don't increment on add, but you decrement on remove (which you're doing), you're going to get into an inconsistent state pretty fast. I didn't see your actual question though. Calling num_uniq_ids_by_date.reduceCount() wipes out all your custom group reducers. Just call num_uniq_ids_by_date.top(1). - Ethan Jewett
Oops, that was a mistake on my part - thanks for pointing it out. Thanks for the suggestions! I was actually able to get it. I'll be sure to add my answer. - archeezee
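
The inconsistent state described in the comments can be reproduced without Crossfilter. The following is a rough sketch that calls reducers shaped like the ones in the question by hand. The function names `addNoIncrement` and `removeDecrement` and the two sample records are made up for illustration, and the "filter" is simulated by calling the remove reducer directly.

```javascript
// Reducers shaped like the question's: add never increments,
// remove always decrements.
function addNoIncrement(p, d) {
  if (!(d.id in p.ids)) {
    p.ids[d.id] = 1;
  }
  return p;
}

function removeDecrement(p, d) {
  p.ids[d.id]--;
  if (p.ids[d.id] === 0) {
    delete p.ids[d.id];
  }
  return p;
}

// Two hypothetical records with the same id fall into one date group.
const state = { ids: {} };
const recA = { date: "2014-10-04", id: "1" };
const recB = { date: "2014-10-04", id: "1" };
addNoIncrement(state, recA);
addNoIncrement(state, recB); // ids["1"] is still 1, not 2

// Simulate a filter that removes only one of the two records.
removeDecrement(state, recA); // ids["1"] drops to 0 and is deleted

// recB is still in the group, but its id is gone from the tally.
const remainingUnique = Object.keys(state.ids).length;
console.log(remainingUnique); // 0
```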

1 Answer

Okay, I was able to get it working.

What I ended up doing is the following:

//Create a Crossfilter instance
var ndx = crossfilter(va);

//Define dimensions
var date_dim = ndx.dimension(function(d) {
    return d["date"]; });

var num_unique_ids_by_date = date_dim
    .group()
    .reduce(
        function (p, d) {
            if(d.id in p.ids){
                p.ids[d.id] += 1;
            }
            else{
                p.ids[d.id] = 1;
                p.id_count++;
            }
            return p;
        },

        function (p, d) {
            p.ids[d.id]--;
            if(p.ids[d.id] === 0){
                delete p.ids[d.id];
                p.id_count--;
            }
            return p;
        },

        function () {
            return {ids: {}, id_count: 0};
        });

This gives me the number of unique ids for each date (id_count) as well as the number of occurrences of each id (ids).
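
To see why incrementing on add matters, here is a small sketch that runs the same three reducers by hand on one date's records, outside of Crossfilter. The standalone names `reduceAdd`, `reduceRemove` and `reduceInitial` and the `dayRecords` sample are just for illustration. In the real group, Crossfilter calls these functions itself as filters change.

```javascript
// The three reducers from the answer, as standalone functions.
function reduceAdd(p, d) {
  if (d.id in p.ids) {
    p.ids[d.id] += 1;
  } else {
    p.ids[d.id] = 1;
    p.id_count++;
  }
  return p;
}

function reduceRemove(p, d) {
  p.ids[d.id]--;
  if (p.ids[d.id] === 0) {
    delete p.ids[d.id];
    p.id_count--;
  }
  return p;
}

function reduceInitial() {
  return { ids: {}, id_count: 0 };
}

// Hypothetical records for a single date group; id "1" appears twice.
const dayRecords = [
  { date: "2014-10-02", id: "1" },
  { date: "2014-10-02", id: "2" },
  { date: "2014-10-02", id: "3" },
  { date: "2014-10-02", id: "1" },
];

// Add every record by hand, starting from the initial value.
const group = reduceInitial();
for (const rec of dayRecords) reduceAdd(group, rec);
const afterAdd = { count: group.id_count, onesSeen: group.ids["1"] };
console.log(afterAdd); // { count: 3, onesSeen: 2 }

// Removing one of the two id "1" records keeps the id in the tally.
reduceRemove(group, dayRecords[0]);
const afterFirstRemove = group.id_count;
console.log(afterFirstRemove); // 3

// Removing the second one finally drops the id.
reduceRemove(group, dayRecords[3]);
const afterSecondRemove = group.id_count;
console.log(afterSecondRemove); // 2
```

Because each id's occurrence count is tracked, id_count only changes when the first record with that id arrives or the last one leaves.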

Then when I want to display this in my bar graph using dc.js, I go ahead and use the following code.

var minDate = date_dim.bottom(1)[0]["date"];
var maxDate = date_dim.top(1)[0]["date"];

var timeChart = dc.barChart("#time-chart");

timeChart
    .width(1500)
    .height(400)
    .margins({top: 10, right: 50, bottom: 30, left: 50})
    .dimension(date_dim)
    .group(num_unique_ids_by_date)
    .valueAccessor(function (d) {
        return d.value.id_count;
    })
    .transitionDuration(500)
    .x(d3.time.scale().domain([minDate, maxDate]))
    .elasticY(true)
    .elasticX(true)
    .xAxisLabel("Year")
    .yAxis();

dc.renderAll();