I have some questions regarding Solr schema design. I'm setting up a search engine for a product catalogue website, and my table relationships are as follows.

  • Product Belongs to Merchant
  • Product Belongs to Brand
  • Product has and belongs to many Categories
  • Category has many Sub Categories
  • Sub Category has many Types
  • Type has many Sub Types

So far my schema.xml looks like this:

<field name="product_id" type="string" indexed="true" stored="true" required="true" /> 
<field name="name" type="string" indexed="true" stored="true"/>
<field name="merchant" type="string" indexed="true" stored="true"/>
<field name="merchant_id" type="string" indexed="true" stored="true"/>
<field name="brand" type="string" indexed="true" stored="true"/>
<field name="brand_id" type="string" indexed="true" stored="true"/>
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="sub_categories" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="types" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="sub_types" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
<field name="image" type="text" indexed="true" stored="true"/>

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>product_id</uniqueKey>

<defaultSearchField>text</defaultSearchField>

<solrQueryParser defaultOperator="OR"/>

<copyField source="name" dest="text"/>
<copyField source="merchant" dest="text"/>
<copyField source="brand" dest="text"/>
<copyField source="categories" dest="text"/>
<copyField source="sub_categories" dest="text"/>
<copyField source="types" dest="text"/>
<copyField source="sub_types" dest="text"/>

So my questions now:

1) Is the schema correct?

2) Let's assume I need to find products for Category XYZ. My senior programmer doesn't like querying Solr by category name; instead he wants to use the category ID. He is suggesting that we store CategoryID_CategoryName (e.g. 1001_Category XYZ) and send only the ID from the web front end (on the assumption that names with whitespace don't work properly).

So to find the products I would then have to do a partial match on categories and extract the category ID from the string (i.e. fetch 1001 from 1001_Category XYZ). Or what if I keep the names in the categories field and set up another field for category_ids? That seems like a better option to me.

Or is there any Solr multi-valued field type that can store CategoryID and CategoryName together?

Let me know your thoughts, thanks.

1 Answer

Answers to your questions.

  1. Maybe. It depends on how you plan to structure your queries, what you intend to search, and what you intend to retrieve in search results. Your schema both stores and indexes every field, which can be quite inefficient. Index what you intend to query; store what you intend to retrieve or display. If you are looking for optimizations, I would also review the data types used in the schema and keep them as close to the source types as you can.
  2. Querying by CategoryID: your programmer is correct, you want to query by category ID. Your idea of keeping IDs and names in separate fields is sound as well. If your ID fields are integers or longs in the source data, define them with integer/long field types rather than as strings.
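
As a rough sketch of both points, the relevant fields might look something like the fragment below. It assumes an "int" fieldType is defined in your schema.xml (as in the stock Solr example schema), that category_ids is only filtered on and never displayed, and that the image value is only displayed and never searched. The category_ids field name is taken from the question; adjust the indexed/stored flags to match how you actually query and render results.

```xml
<!-- Sketch only: assumes an "int" fieldType exists in this schema.xml -->

<!-- Category IDs: used for filtering, not for display, so indexed but not stored -->
<field name="category_ids" type="int" multiValued="true" indexed="true" stored="false"/>

<!-- Category names: displayed and copied into "text" for free-text search -->
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>

<!-- Image value: assumed display-only, so stored but not indexed -->
<field name="image" type="string" indexed="false" stored="true"/>
```

With fields like these, the front end can send just the ID and filter with something like fq=category_ids:1001, while the names remain available in the results for display.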