The Scenario:
I have the following (simplified) database table scenario:
ID  ProductName        ProductCategory  Colour  Price
-----------------------------------------------------
1   BatmanTShirt       T-Shirt          Black   22
2   BatmanTShirt       T-Shirt          Blue    20
3   SupermanTShirt     T-Shirt          Blue    19
4   SpidermanTrousers  Trousers         Red     28
5   SpidermanTrousers  Trousers         Black   30
My Wish:
In the SOLR index, I would like this data to be mapped in a normalized way, so that only 3 SOLR documents (shown below) are created instead of 5.
<doc1>
<ID>1</ID>
<ProductName>BatmanTShirt</ProductName>
<ProductCategory>T-Shirt</ProductCategory>
<OtherDetails>{ {1, Black, 22}, {2, Blue, 20} }</OtherDetails>
</doc1>
<doc2>
<ID>3</ID>
<ProductName>SupermanTShirt</ProductName>
<ProductCategory>T-Shirt</ProductCategory>
<OtherDetails>{ {3, Blue, 19} }</OtherDetails>
</doc2>
<doc3>
<ID>4</ID>
<ProductName>SpidermanTrousers</ProductName>
<ProductCategory>Trousers</ProductCategory>
<OtherDetails>{ {4, Red, 28}, {5, Black, 30} }</OtherDetails>
</doc3>
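The grouping itself could be done before the data reaches SOLR, in whatever code feeds the index. Below is a minimal Python sketch that assumes the rows have already been fetched from the database as tuples. The helper name `group_rows` is hypothetical. It groups on (ProductName, ProductCategory), keeps the minimum ID, and collects {ID, Colour, Price} for every row in the group.

```python
# Rows from the example table: (ID, ProductName, ProductCategory, Colour, Price)
ROWS = [
    (1, "BatmanTShirt", "T-Shirt", "Black", 22),
    (2, "BatmanTShirt", "T-Shirt", "Blue", 20),
    (3, "SupermanTShirt", "T-Shirt", "Blue", 19),
    (4, "SpidermanTrousers", "Trousers", "Red", 28),
    (5, "SpidermanTrousers", "Trousers", "Black", 30),
]


def group_rows(rows):
    """Collapse rows sharing (ProductName, ProductCategory) into one document.

    Hypothetical helper: dict insertion order (Python 3.7+) keeps the groups
    in the order their first row appears.
    """
    groups = {}
    for row_id, name, category, colour, price in rows:
        key = (name, category)
        if key not in groups:
            groups[key] = {
                "ID": row_id,
                "ProductName": name,
                "ProductCategory": category,
                "OtherDetails": [],
            }
        doc = groups[key]
        # ID holds the minimum ID seen so far for this group
        doc["ID"] = min(doc["ID"], row_id)
        # Keep the per-row details that the grouping would otherwise lose
        doc["OtherDetails"].append((row_id, colour, price))
    return list(groups.values())


docs = group_rows(ROWS)
for doc in docs:
    print(doc)
```

Doing the grouping outside SOLR keeps the index schema simple; the trade-off is that the feeding code must re-group whenever a product's rows change.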
Some Notes:
<ID> will contain the minimum ID of the group.
<OtherDetails> will contain the unique ID of each row plus the other details that are left out by the grouping. This would be a multi-valued field whose data type is a List holding another List of details {ID, Colour, Price}.
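As far as I know, a list of lists is not something a standard SOLR field type stores directly: a multi-valued field holds a flat list of values. One possible workaround, sketched below in Python, is to encode each {ID, Colour, Price} tuple as a single delimited string and store those strings in a multi-valued string field. The `|` delimiter and the helper names are hypothetical, and the output is plain JSON documents, assuming a schema where `OtherDetails` is declared as a multi-valued string field.

```python
import json

# Grouped documents as described in the notes (ID = minimum ID of the group)
GROUPED = [
    {"ID": 1, "ProductName": "BatmanTShirt", "ProductCategory": "T-Shirt",
     "OtherDetails": [(1, "Black", 22), (2, "Blue", 20)]},
    {"ID": 3, "ProductName": "SupermanTShirt", "ProductCategory": "T-Shirt",
     "OtherDetails": [(3, "Blue", 19)]},
    {"ID": 4, "ProductName": "SpidermanTrousers", "ProductCategory": "Trousers",
     "OtherDetails": [(4, "Red", 28), (5, "Black", 30)]},
]

SEP = "|"  # hypothetical delimiter; must never occur inside a Colour value


def encode_details(details):
    """Flatten each (ID, Colour, Price) tuple into one string value."""
    return [SEP.join(str(part) for part in detail) for detail in details]


def decode_detail(value):
    """Reverse of encode_details for a single stored value."""
    row_id, colour, price = value.split(SEP)
    return int(row_id), colour, int(price)


def to_solr_docs(grouped):
    """Copy each document, replacing OtherDetails with its encoded strings."""
    return [{**doc, "OtherDetails": encode_details(doc["OtherDetails"])}
            for doc in grouped]


payload = json.dumps(to_solr_docs(GROUPED), indent=2)
print(payload)
```

The application would then call something like `decode_detail` on each stored value when rendering search results.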
Question:
Does anyone know how this can be done?
P.S.
The reason for this 'grouping' step is that I want to facet on ProductCategory. If I facet on ProductCategory now, the generated counts are:
T-Shirt (3)
Trousers (2)
What I want instead is to facet on ProductCategory while ignoring the Colour and Price variations, so that there are only 2 T-Shirts (one Batman and one Superman) and only 1 pair of Trousers (Spiderman's). So the result I want to show is:
T-Shirt (2)
Trousers (1)
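The difference between the two sets of counts can be reproduced outside SOLR. This small Python sketch counts categories once per row (what faceting over the current one-document-per-row index does) and then once per distinct (ProductName, ProductCategory) pair (what faceting over grouped documents would do). It only illustrates the arithmetic; it does not query SOLR.

```python
from collections import Counter

# Rows from the example table: (ID, ProductName, ProductCategory, Colour, Price)
ROWS = [
    (1, "BatmanTShirt", "T-Shirt", "Black", 22),
    (2, "BatmanTShirt", "T-Shirt", "Blue", 20),
    (3, "SupermanTShirt", "T-Shirt", "Blue", 19),
    (4, "SpidermanTrousers", "Trousers", "Red", 28),
    (5, "SpidermanTrousers", "Trousers", "Black", 30),
]

# Current behaviour: one document per row, so every colour variant is counted
per_row = Counter(category for _, _, category, _, _ in ROWS)

# Desired behaviour: each distinct product is counted once in its category
distinct_products = {(name, category) for _, name, category, _, _ in ROWS}
per_product = Counter(category for _, category in distinct_products)

print(sorted(per_row.items()))      # [('T-Shirt', 3), ('Trousers', 2)]
print(sorted(per_product.items()))  # [('T-Shirt', 2), ('Trousers', 1)]
```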
I did some research and found out that this feature (called Post-Group Faceting or Matrix counts) is currently a work in progress, as noted in this SOLR patch. Since it may take a while to be finished, I am looking for a temporary workaround.
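If the grouped documents are indexed as a workaround, an ordinary facet request on ProductCategory should count one document per product instead of one per colour variant. The Python sketch below only builds the request URL from standard SOLR facet parameters (`q`, `rows`, `facet`, `facet.field`, `wt`); the host, port and core name `products` are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical SOLR location and core name; adjust to the real installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def facet_query_url(base_url, facet_field):
    """Build a select URL that returns only facet counts for one field."""
    params = [
        ("q", "*:*"),                  # match every (grouped) document
        ("rows", "0"),                 # facet counts only, no documents
        ("facet", "true"),
        ("facet.field", facet_field),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


print(facet_query_url(SOLR_SELECT_URL, "ProductCategory"))
```

Because each grouped document represents one product, the facet counts on such an index would be T-Shirt (2) and Trousers (1).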