
For example, say I have the following facet:

Colors

  • Red (7825)
  • Orange (2343)
  • Green (843)
  • Blue (5412)

In my database, colors would be a table and each color would have a primary key and a name/value.

When indexing with Solr/Lucene, in all of the examples I've seen, the value is indexed and not the primary key. So if I filter by the color red, I would get something like the following:

http://www.example.com/search?color=Red

I'm wondering, is it wise to instead index the primary key and retrieve the values from the database when displaying the facet values? So I would instead get something like this:

http://www.example.com/search?color=1

Here "1" is the primary key of the color red. I'm considering this approach because the values of many of my facets change frequently, while the primary keys stay the same. Also, the index is required to stay in sync with the database.
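A minimal sketch of what the ID-based URL could translate to on the Solr side. It assumes a Solr core named `products` on localhost and an integer field named `color_id`; both names are hypothetical. The standard Solr request parameters `q`, `fq`, `facet`, `facet.field` and `facet.mincount` do the filtering and faceting, so the facet counts come back keyed by primary key rather than by name.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL and field name; adjust to your own schema.
SOLR_SELECT = "http://localhost:8983/solr/products/select"
COLOR_FIELD = "color_id"


def build_solr_query(color_param=None):
    """Translate the site's ?color=<id> parameter into a Solr select URL.

    The filter uses the color's primary key, so it can be validated as an
    integer and never needs the escaping a display name might.
    """
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", COLOR_FIELD),  # facet counts come back keyed by ID
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]
    if color_param is not None:
        color_id = int(color_param)  # raises ValueError for input like "Red"
        params.append(("fq", f"{COLOR_FIELD}:{color_id}"))
    return SOLR_SELECT + "?" + urlencode(params)
```

For example, `build_solr_query("1")` produces a URL whose `fq` parameter decodes to `color_id:1`.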

Does anyone have any experience with this? How do you think it will affect performance?

Thanks in advance!

Short answer: yes, it's OK. Long answer: when I get home :-) - Mauricio Scheffer

1 Answer


If you expect your entities to change frequently, it's easier to index the IDs and, when you get your facet results, look up the color names in the database. That way, renaming a color doesn't require the affected documents to be re-indexed.
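A minimal sketch of that lookup step, assuming Solr's default JSON layout for `facet_fields` (`json.nl=flat`, where terms and counts alternate in one array) and a hypothetical `colors(id, name)` table, with an in-memory SQLite database standing in for the real one. The demo at the bottom also shows the point above: renaming a color changes only the database row, and the next facet display picks up the new name.

```python
import sqlite3


def facet_labels(conn, facet_values):
    """Turn a Solr facet list such as ["1", 7825, "4", 5412, ...] into
    (name, count) pairs by looking the names up by primary key.

    Assumes terms and counts alternate (Solr's json.nl=flat default) and a
    hypothetical colors(id, name) table.
    """
    pairs = [(int(facet_values[i]), facet_values[i + 1])
             for i in range(0, len(facet_values), 2)]
    if not pairs:
        return []
    ids = [color_id for color_id, _ in pairs]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, name FROM colors WHERE id IN ({placeholders})", ids)
    names = dict(rows.fetchall())
    # Keep Solr's order (by count by default); skip IDs no longer in the DB.
    return [(names[cid], count) for cid, count in pairs if cid in names]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO colors VALUES (?, ?)",
                     [(1, "Red"), (2, "Orange"), (3, "Green"), (4, "Blue")])
    facet = ["1", 7825, "4", 5412, "2", 2343, "3", 843]
    print(facet_labels(conn, facet))
    # -> [('Red', 7825), ('Blue', 5412), ('Orange', 2343), ('Green', 843)]
    # Renaming a color only touches the database; the index still holds "1".
    conn.execute("UPDATE colors SET name = 'Crimson' WHERE id = 1")
    print(facet_labels(conn, facet)[0])
    # -> ('Crimson', 7825)
```

All names for one facet are fetched in a single `IN (...)` query rather than one query per value.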

In our system, we index the entities' IDs in Lucene instead of their names, for exactly the reasons you stated. Also, our entities have a number of properties that aren't indexed, so we would have to hit the database to get them anyway.

As for performance, faceting on IDs won't be noticeably slower or faster than faceting on names. The database lookups shouldn't be a big deal either, especially if you're only pulling down a few dozen facet values at a time. If they do become an issue, you can always cache them.
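If the lookups ever need caching, one option is a small in-process cache like the hypothetical `NameCache` below: it fetches only the IDs it doesn't already hold, in one batched query, and lets entries expire after a TTL so renames in the database eventually show up. The 300-second default TTL is an arbitrary assumption, and `invalidate()` is there for when you'd rather clear the cache as soon as a color is renamed. The table name is interpolated into the SQL, so it must be a trusted constant, never user input.

```python
import sqlite3
import time


class NameCache:
    """Hypothetical id -> name cache for one facet table, with a TTL."""

    def __init__(self, conn, table, ttl=300.0):
        self.conn = conn
        self.table = table      # trusted, hard-coded table name
        self.ttl = ttl
        self._entries = {}      # id -> (name, expiry time)
        self.queries = 0        # database round trips, for illustration

    def names(self, ids):
        now = time.monotonic()
        missing = [i for i in ids
                   if i not in self._entries or self._entries[i][1] <= now]
        if missing:
            # Drop stale entries first so IDs deleted from the DB disappear.
            for i in missing:
                self._entries.pop(i, None)
            placeholders = ",".join("?" * len(missing))
            rows = self.conn.execute(
                f"SELECT id, name FROM {self.table} WHERE id IN ({placeholders})",
                missing)
            self.queries += 1
            for row_id, name in rows:
                self._entries[row_id] = (name, now + self.ttl)
        return {i: self._entries[i][0] for i in ids if i in self._entries}

    def invalidate(self):
        """Forget everything, e.g. right after a color is renamed."""
        self._entries.clear()
```

Repeated calls for the same facet values then cost no database round trips until the entries expire or the cache is invalidated.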