
The Scenario:

I have the following (simplified) database table scenario:

ID   ProductName          ProductCategory   Colour   Price
----------------------------------------------------------
1    BatmanTShirt         T-Shirt           Black    22
2    BatmanTShirt         T-Shirt           Blue     20
3    SupermanTShirt       T-Shirt           Blue     19
4    SpidermanTrousers    Trousers          Red      28
5    SpidermanTrousers    Trousers          Black    30

My Wish:

In the SOLR index, I would like this data to be mapped in a normalized way, so that only 3 SOLR documents (shown below) are created instead of 5.

<doc1>
  <ID>1</ID>
  <ProductName>BatmanTShirt</ProductName>
  <ProductCategory>T-Shirt</ProductCategory>
  <OtherDetails>{ {1, Black, 22}, {2, Blue, 20} }</OtherDetails>
</doc1>
<doc2>
  <ID>3</ID>
  <ProductName>SupermanTShirt</ProductName>
  <ProductCategory>T-Shirt</ProductCategory>
  <OtherDetails>{ {3, Blue, 19} }</OtherDetails>
</doc2>
<doc3>
  <ID>4</ID>
  <ProductName>SpidermanTrousers</ProductName>
  <ProductCategory>Trousers</ProductCategory>
  <OtherDetails>{ {4, Red, 28}, {5, Black, 30} }</OtherDetails>
</doc3>

Some Notes:

  • <ID> will contain the minimum ID from the group
  • <OtherDetails> will contain the unique ID plus the other details that are left out when grouping. This would be a multi-valued field in which each value is itself a list of details {ID, Colour, Price}.
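
To make the intended mapping concrete, here is a minimal sketch in Python of the grouping described in the notes above. It is not SOLR code; the function name `group_rows` and the comma-delimited encoding of each detail are assumptions made for illustration. SOLR multi-valued fields hold flat values, so each {ID, Colour, Price} triple is flattened into one string here.

```python
from itertools import groupby
from operator import itemgetter

# Rows exactly as in the table above: (ID, ProductName, ProductCategory, Colour, Price)
rows = [
    (1, "BatmanTShirt", "T-Shirt", "Black", 22),
    (2, "BatmanTShirt", "T-Shirt", "Blue", 20),
    (3, "SupermanTShirt", "T-Shirt", "Blue", 19),
    (4, "SpidermanTrousers", "Trousers", "Red", 28),
    (5, "SpidermanTrousers", "Trousers", "Black", 30),
]


def group_rows(rows):
    """Collapse rows sharing ProductName and ProductCategory into one document each."""
    key = itemgetter(1, 2)  # (ProductName, ProductCategory)
    docs = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    for (name, category), members in groupby(sorted(rows, key=key), key=key):
        members = sorted(members, key=itemgetter(0))  # order each group by ID
        docs.append({
            "ID": members[0][0],  # minimum ID of the group
            "ProductName": name,
            "ProductCategory": category,
            # Assumed encoding: one "ID,Colour,Price" string per original row.
            "OtherDetails": [f"{i},{colour},{price}" for i, _, _, colour, price in members],
        })
    return sorted(docs, key=itemgetter("ID"))


docs = group_rows(rows)
for doc in docs:
    print(doc)
```

Each resulting dictionary corresponds to one of the three documents sketched above.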

Question:

Does anyone know whether this is possible, and if so, how?

P.S.

The reason for this 'grouping' is that I want to facet on ProductCategory. If I facet on ProductCategory now, the counts generated are:

T-Shirt (3)
Trousers (2)

What I want instead is to facet on ProductCategory while ignoring the Colour and Price data, so that there are only 2 T-Shirts (one Batman and one Superman) and only 1 Trousers (Spiderman's). In other words, I want to show this:

T-Shirt (2)
Trousers (1)
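
The desired counts amount to counting distinct ProductName values per category rather than rows. A small Python sketch of that arithmetic, using the table data from the question (the function name `grouped_facet_counts` is hypothetical, and this only illustrates the counting, not how SOLR computes it):

```python
from collections import defaultdict

# (ID, ProductName, ProductCategory) taken from the table in the question
rows = [
    (1, "BatmanTShirt", "T-Shirt"),
    (2, "BatmanTShirt", "T-Shirt"),
    (3, "SupermanTShirt", "T-Shirt"),
    (4, "SpidermanTrousers", "Trousers"),
    (5, "SpidermanTrousers", "Trousers"),
]


def grouped_facet_counts(rows):
    """Count distinct product names per category instead of counting rows."""
    names_by_category = defaultdict(set)
    for _id, name, category in rows:
        names_by_category[category].add(name)
    return {category: len(names) for category, names in names_by_category.items()}


counts = grouped_facet_counts(rows)
for category, count in counts.items():
    print(f"{category} ({count})")
```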

I did some research and found that this feature (called Post-Group Faceting or Matrix counts) is currently a work in progress, as noted in this SOLR patch. Since it may take a while to finish, I am looking for a temporary workaround.

1 Answer


The patch works fine for single-valued fields, so applying it and using grouping is the best way to go.

Just index the data as it is in the database, so you don't need to use multi-valued fields.
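
As a rough sketch of what such a query could look like, the Python snippet below builds a request URL that groups on ProductName and facets on ProductCategory. The parameter names `group`, `group.field` and `group.facet` are the ones used in later released Solr versions; whether the patched build accepts exactly these names is an assumption, and the host, port and path are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and path to your own installation.
base_url = "http://localhost:8983/solr/select"

params = [
    ("q", "*:*"),                        # match every indexed row
    ("group", "true"),                   # enable result grouping
    ("group.field", "ProductName"),      # one group per product name
    ("facet", "true"),
    ("facet.field", "ProductCategory"),
    # Assumption: the patched build uses the same name as released Solr
    # for counting facets per group instead of per document.
    ("group.facet", "true"),
]

query_url = base_url + "?" + urlencode(params)
print(query_url)
```

With the rows indexed one per database record, a query of this shape is meant to yield the per-group counts asked for in the question.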

You can download the latest code with TortoiseSVN and apply the patch. Building the WAR (or JARs) is very easy in Eclipse: start a new project with the code you just downloaded and run the Ant scripts in the build.xml files in the root and solr directories.