I'm refactoring a web application that makes heavy use of facet queries from a Solr back-end. In particular, these are multi-level nested facets, e.g. faceting people on eye color and then on hair color (as in the code example below). The facet results should be counts, not the actual records and their fields.

I'm interested in using GraphQL to decouple the front and back-ends, but I don't think I can get around the field-based resolution execution model. So what would be a single query in Solr, returning counts of people by eye color and then (within each eye color facet) by hair color, seems to have to be done by multiple queries in GraphQL.
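A rough sketch of what such a single Solr request could look like, assuming Solr's JSON Facet API (facet.pivot would be another option) and assuming the fields are indexed under the names eyeColor and hairColor. The variable name nestedFacetRequest is made up for illustration; the object would be sent as the JSON body of a search request.

```javascript
// Hypothetical request body for Solr's JSON Facet API.
// Field names are assumptions taken from the demo data further down.
const nestedFacetRequest = {
  query: '*:*',
  limit: 0, // return facet counts only, no documents
  facet: {
    eyeColor: {
      type: 'terms',
      field: 'eyeColor',
      // sub-facet: computed within each eyeColor bucket
      facet: {
        hairColor: { type: 'terms', field: 'hairColor' }
      }
    }
  }
};

console.log(JSON.stringify(nestedFacetRequest, null, 2));
```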

I made a self-contained Node.js example with just two dependencies (graphql and graphql-tools) which demonstrates what I want to achieve. There's a facetize(data, fields) function that simulates what Solr can do: it does the facet counting on data for the fields given. If multiple fields are given, it does nested faceting as described above.

I plugged this function into the resolver so that it can also be called with just one field at each required level. It works fine, but of course with multiple calls to facetize().

My question is, could the resolver somehow be set up so that only one query is made to facetize but still return the same nested facet data? I don't mind if the query is flattened (e.g. like facets(field1: "eyeColor", field2: "hairColor") { ... }) but I played a bit with this and the result only contains the first level of counts.

And... bonus question for the Solr people, would it perhaps be just as efficient to make multiple Solr queries for the sub-facets? Or perhaps even more efficient, if it can be parallelized by the GraphQL execution engine? Is it worth mocking something up at least to benchmark?

var { graphql } = require('graphql');
var { makeExecutableSchema } = require('graphql-tools');


// GraphQL schema
const typeDefs = `

type Query {
  facets(field: String!) : [Facet!]!
}

type Facet {
  value : String!
  field : String!
  count : Int!
  facets(field: String!) : [Facet!]!
}
`;


// demo data
const data = [
  { name: 'John',
    eyeColor: 'blue',
    hairColor: 'brown'
  },
  { name: 'Jane',
    eyeColor: 'blue',
    hairColor: 'blonde'
  },
  { name: 'Jay',
    eyeColor: 'green',
    hairColor: 'black'
  },
  { name: 'Julie',
    eyeColor: 'blue',
    hairColor: 'brown'
  },
  { name: 'Jamal',
    eyeColor: 'brown',
    hairColor: 'black'
  },
  { name: 'Jack',
    eyeColor: 'green',
    hairColor: 'blonde'
  },
  { name: 'Jill',
    eyeColor: 'blue',
    hairColor: 'brown'
  }
];


// this facetize() function is a simple recreation of
// the functionality that Solr can perform in just one request
// (keeping all the data Solr-server-side)
//
//   var facets = facetize(data, ['eyeColor', 'hairColor']);
//   console.log(JSON.stringify(facets, null, 2));
//   // though it would display cleaner with the data fields stripped out

function facetize(data, flds, message) {
  // log first-time entry to this recursive function
  if (message) console.log("entering facetize from "+message+" using fields "+flds);

  let fields = flds.slice();  // shallow copy, so shift() does not mutate the caller's array
  let field = fields.shift(); // deal with the first field first

  // set up the empty facet objects for each unique value of field 'field'
  let uniqueValues = [...new Set(data.map(elem => elem[field]))];
  let facets = {};
  uniqueValues.forEach(value =>
               facets[value] =
               { value: value, field: field, count: 0, data: [] });

  // now iterate through data counting occurrences and
  // storing the data items in facet.data
  data.forEach( item => {
    let value = item[field];
    facets[value].count++;
    facets[value].data.push(item);
  });

  // if there are fields left to facet on, do so recursively
  if (fields.length) {
    uniqueValues.forEach(value =>
             facets[value].facets =
             facetize(facets[value].data, fields));
  }
  return uniqueValues.map( value => facets[value] );
}


// Resolver map: one resolver function per schema field that needs custom logic
const resolvers = {
  Query : {
    facets: (_, { field }) => facetize(data, [field], "Query.facets")
  },
  Facet : {
    facets: (obj, { field }) => facetize(obj.data, [field], "Facet.facets")
  }
};


// do some schema magic
const schema = makeExecutableSchema({ typeDefs, resolvers })


// Run the GraphQL query and print out the response
graphql(
  schema, `
 {
   facets(field: "eyeColor") { 
     value
     field
     count
     facets(field: "hairColor") {
       value
       field
       count
     }
   }
 }`
).then((response) => {
  console.log(JSON.stringify(response,null,2));
});

And a snippet of the output:

entering facetize from Query.facets using fields eyeColor
entering facetize from Facet.facets using fields hairColor
entering facetize from Facet.facets using fields hairColor
entering facetize from Facet.facets using fields hairColor
{
  "data": {
    "facets": [
      {
        "value": "blue",
        "field": "eyeColor",
        "count": 4,
        "facets": [
          {
            "value": "brown",
            "field": "hairColor",
            "count": 3
          },
          {
            "value": "blonde",
            "field": "hairColor",
            "count": 1
          }
        ]
      },
      {
        "value": "green",
        "field": "eyeColor",
        "count": 2,
        "facets": [
          {
            "value": "black",
            "field": "hairColor",
            "count": 1
          },
          {
            "value": "blonde",
            "field": "hairColor",
            "count": 1
          }
        ]
      },
...
1 Answer

What you're encountering is the N+1 problem: the nested facets resolver runs once for every parent facet returned by the top-level query. You should be able to solve this by using DataLoader. Ben has a great tutorial here that you can check out. He also covers other ways to solve the problem.
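To illustrate the batching idea, here is a minimal sketch that does not use the dataloader package itself. createBatchLoader is a hand-rolled, hypothetical stand-in that only mimics its core behaviour: keys requested during the same tick are collected and passed to one batch function. The names people, subFacetBatch and batchCalls are also made up for the demo, and the in-memory loop stands in for what would be a single back-end request.

```javascript
// Demo data, same shape as in the question.
const people = [
  { name: 'John', eyeColor: 'blue', hairColor: 'brown' },
  { name: 'Jane', eyeColor: 'blue', hairColor: 'blonde' },
  { name: 'Jay', eyeColor: 'green', hairColor: 'black' },
  { name: 'Julie', eyeColor: 'blue', hairColor: 'brown' },
  { name: 'Jamal', eyeColor: 'brown', hairColor: 'black' },
  { name: 'Jack', eyeColor: 'green', hairColor: 'blonde' },
  { name: 'Jill', eyeColor: 'blue', hairColor: 'brown' }
];

// Hypothetical stand-in for a DataLoader: queues keys requested in the
// same tick and calls batchFn once with all of them.
function createBatchLoader(batchFn) {
  let queue = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        queue.push({ key, resolve, reject });
        if (queue.length === 1) {
          // first key of this tick: schedule a single flush for the batch
          process.nextTick(() => {
            const batch = queue;
            queue = [];
            Promise.resolve(batch.map(entry => entry.key))
              .then(batchFn)
              .then(results => batch.forEach((entry, i) => entry.resolve(results[i])))
              .catch(err => batch.forEach(entry => entry.reject(err)));
          });
        }
      });
    }
  };
}

let batchCalls = 0; // counts how often the "back-end" is hit

// Batch function: receives every requested sub-facet key in one call and
// returns one result per key, in the same order as the keys.
function subFacetBatch(keys) {
  batchCalls++;
  return keys.map(({ parentField, parentValue, field }) => {
    const counts = new Map();
    for (const item of people) {
      if (item[parentField] !== parentValue) continue;
      counts.set(item[field], (counts.get(item[field]) || 0) + 1);
    }
    return [...counts].map(([value, count]) => ({ value, field, count }));
  });
}

// Three "resolver" calls, one per eye color, end up in a single batch.
const loader = createBatchLoader(subFacetBatch);
const demo = Promise.all(
  ['blue', 'green', 'brown'].map(parentValue =>
    loader.load({ parentField: 'eyeColor', parentValue, field: 'hairColor' }))
);

demo.then(result => {
  console.log('batch calls:', batchCalls);
  console.log(JSON.stringify(result, null, 2));
});
```

In a real schema, the Facet.facets resolver would call loader.load(...) instead of facetize(), and the batch function would translate the collected keys into one Solr request. A fresh loader per GraphQL request is the usual choice, so that results are not shared between requests.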