7
votes

I have a Solr document like this, where all the fields are mapped as a single document.

<doc>
    <int name="Id">7</int>
    <str name="Name">PersonName</str>
    <str name="Address">Address Line 1, Address Line 2, City</str>
    <str name="Country">India</str>
    <str name="ImageURL">0000028415.jpeg</str>
    <arr name="Category">
      <str>Student</str>
      <str>Group A</str>
    </arr>
</doc>

We would like to normalize it and have separate doc types for Person, Country and Category.

<doc>
    <int name="PId">7</int>
    <str name="Name">PersonName</str>
    <str name="Address">Address Line 1, Address Line 2, City</str>
    <str name="CountryId">91</str>
    <str name="ImageURL">0000028415.jpeg</str>
    <arr name="CategoryId">
      <str>2</str>
      <str>5</str>
    </arr>
</doc>



<doc>
    <int name="CId">91</int>
    <str name="CountryName">India</str>
</doc>

<doc>
    <int name="CatId">2</int>
    <str name="CategoryName">Student</str>
</doc>

Note that I am simplifying the example; the actual documents I work with are much more complex than this, and we have millions of documents in the index.

I would like to understand how to join and apply filter queries with this kind of document structure, and how it affects performance compared to the previous case, where all details are stored in a single document.

Update

Sample queries against the current structure; I hope they give some idea of how it is done currently.

Here is the sample query for a search with certain facets applied:

/select?indent=on&wt=json&facet.field={!ex%3DCategory}Category&facet.field=Manufacturer&facet.field=Vendor&facet.field=f_Hardrive&facet.field=f_Operating%2BSystem&facet.field=f_Memory&facet.field=f_CPU%2BType&facet.field=f_Screensize&facet.field=pa_OS&bf=&start=0&fq={!tag%3DCategory}Category:Notebooks&fq=Price:[0+TO+9999999999999]&rows=6&version=2.2&bq=&facet.query=AverageRating:[4+TO+5]&facet.query=AverageRating:[3+TO+5]&facet.query=AverageRating:[2+TO+5]&facet.query=AverageRating:[1+TO+5]&q=(laptop)&defType=edismax&spellcheck.q=(laptop)&qf=Name^7++ShortDescription^6++FullDescription^4+CategoryCopy^2+ManufacturerCopy^2+Sku^3+ChildSku^3+nGramContent+Attributes+ProductAttributes+Tag+ManufacturerPartNumber+CustomProperties&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price

This is the filter query with facets:

select?indent=on&wt=json&facet.field=f_Hardrive&facet.field=f_Operating%2BSystem&facet.field=f_Memory&facet.field=f_CPU%2BType&facet.field={!ex%3Df_Screensize}f_Screensize&facet.field=pa_HDD&facet.field=pa_OS&facet.field={!ex%3Dpa_OS}pa_OS&facet.field=pa_OS&facet.field=pa_Processor&facet.field=pa_RAM&facet.field=pa_Software&facet.field=Vendor&facet.field={!ex%3DManufacturer}Manufacturer&facet.field=Category&start=0&fq=StockAvailability:(true)&fq={!tag%3Df_Screensize}f_Screensize:15.0%2527%2527\!!4!!&fq={!tag%3Dpa_OS}pa_OS:Apple\!!0!!&fq={!tag%3DPrice}Price:[594+TO+1800]&sort=CDO_1+asc&rows=6&version=2.2&facet.query=AverageRating:[4+TO+5]&facet.query=AverageRating:[3+TO+5]&facet.query=AverageRating:[2+TO+5]&facet.query=AverageRating:[1+TO+5]&q=CategoryID:(1+OR+2+OR+3+OR+4)&defType=edismax&spellcheck=true&stats=true&facet.mincount=1&facet=true&spellcheck.collate=true&stats.field=Price
1
I bet you have good reasons to do this; would you share them? Why are you trying this? - cheffe
Yes, currently we index data that is prepared by performing the joins at indexing time. If we index it this way instead, our indexing and updates will be much faster. - Krunal
@Krunal, a couple of questions: 1. What version of Solr are you planning to use? 2. Can you share a sample query, one that you are running now with the current schema (the non-normalized doc)? - jay
I am planning to use the latest version of Solr, 6+. I will share the query tomorrow. - Krunal
@jay Hi, I have added the queries to the question. Please review and suggest. - Krunal

1 Answer

0
votes

The only thing that comes to mind is using the XSLTResponseWriter to modify the query response with an XSLT file that transforms the response into a more suitable one.

Don't know if that's what you wanted.

EDIT: I will add more info about this.

So XSLT allows you to transform an XML file into another one (or several others). You can swap the places of your tags, create new ones, combine them, take info from other XML files and use it in the file you want to transform, etc. You can find more info about this here: https://www.w3schools.com/xml/xsl_intro.asp

Solr allows you to apply an XSLT transformation to your query result at query time. You just have to create your .xsl file and place it in the mySolrCollection/conf/xslt/ directory (create xslt/ if it doesn't exist). For example: mySolrCollection/conf/xslt/transformation.xsl

This file (transformation.xsl) will contain all the transformations you want to apply to the query response. I'm not going to go into how to write these transformations; it's not that hard to learn, so you can just check the web for examples and tutorials ;)
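Just to give a rough idea of the shape of such a file, here is a minimal sketch. It assumes the response contains documents like the single-document example in the question (fields Id, Name, Country and Category) and that they sit under response/result/doc, as in Solr's standard XML response. The output element names (persons, person, name, country, category) are made up for illustration, so adapt them to whatever structure you actually need.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical transformation.xsl: reshapes Solr's XML response
     into a simpler <persons> document. Output names are illustrative. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <persons>
      <!-- One <person> per <doc> in the result list -->
      <xsl:for-each select="response/result/doc">
        <!-- Match on the name attribute only, so the field type
             (int, str, long...) does not matter -->
        <person id="{*[@name='Id']}">
          <name><xsl:value-of select="*[@name='Name']"/></name>
          <country><xsl:value-of select="*[@name='Country']"/></country>
          <!-- Multi-valued field: one <category> per value in the <arr> -->
          <xsl:for-each select="arr[@name='Category']/*">
            <category><xsl:value-of select="."/></category>
          </xsl:for-each>
        </person>
      </xsl:for-each>
    </persons>
  </xsl:template>

</xsl:stylesheet>
```

Note that this only reshapes what a query already returns; it does not join separate documents for you.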

The last thing to do is to tell Solr that you want to apply a transformation to the query response, which you do by adding parameters to the query. Add &wt=xslt&tr=transformation.xsl to your query to tell Solr that you want the response transformed and that the transformation is defined in transformation.xsl.

An example query would be:

http://<your_host>:<your_port>/solr/<your_collection>/select?q=*:*&wt=xslt&tr=transformation.xsl&rows=100&...

If your query is correct, you will have your response transformed as you specified in your .xsl file.

Hope this is enough.