1
votes

I'm looking at Solr as the search engine for collections of documents where we don't know the fields, or the data types of those fields, in advance. Is this possible? That's probably clear as mud, so here's an example.

A user can create a document type dynamically. So they might create a document type for people that has fields name (a text string), age (a non-negative number) and gender (a boolean). Another user might create another document type for cars with make (a text string), enginesize (a non-negative number) and neworused (a boolean).

We can handle this using Whoosh (a Python search engine) by creating a separate Whoosh schema for each document type. So we'd have a schema for the first document type specifying the fields that are to be indexed and the corresponding Whoosh data types, and we can destroy the schema later when it is no longer needed.

Can I do something like this with Solr? BTW, changing schema.xml to add new field types is not an option: the document types are completely dynamic, their fields may change after creation, and there may be thousands and thousands of them.

Hope this makes sense! It might be totally trivial, so please accept apologies from a Solr noob.


2 Answers

1
votes

This has been supported for a very long time in Solr through dynamic fields. In fact, if you look at the examples (e.g. techproducts), you will see dynamic field definitions such as *_s, *_ss, etc.

So, you just name your fields with suffixes (or prefixes) to indicate the type and it just works.
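As a rough illustration of the naming approach, here is a minimal Python sketch that renames user-defined fields before indexing. The type names, the `doctype_s` field and the helper `to_solr_doc` are hypothetical, and the suffixes are assumed to match the dynamicField patterns in your own schema (the example configsets define patterns such as *_s, *_i and *_b).

```python
# Hypothetical mapping from the application's own field types to dynamic-field
# suffixes. Assumption: the schema defines matching patterns (*_s, *_i, *_b).
SUFFIX_BY_TYPE = {"string": "_s", "integer": "_i", "boolean": "_b"}


def to_solr_doc(doc_id, doc_type, field_types, values):
    """Rename user-defined fields so they match dynamic field patterns."""
    # "doctype_s" is an illustrative field used to tell document types apart.
    doc = {"id": doc_id, "doctype_s": doc_type}
    for name, value in values.items():
        suffix = SUFFIX_BY_TYPE[field_types[name]]
        doc[name + suffix] = value
    return doc


# The "people" document type from the question.
person_fields = {"name": "string", "age": "integer", "gender": "boolean"}
person = to_solr_doc(
    "person-1", "people", person_fields,
    {"name": "Alice", "age": 42, "gender": True},
)
```

The resulting dictionary can be sent to Solr as a normal document; no schema change is needed as long as each suffix matches an existing dynamic field.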

The next problem is which fields you search. In the example schemas this is done by copying all of those fields into one generic field and searching that, but it is less flexible.

Instead, you may want to use eDisMax and specify the field list explicitly. Or use the (more recent) Config API to save those field lists dynamically.
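For the eDisMax option, a minimal sketch of building the request parameters in Python might look like this. `defType`, `q` and `qf` are standard Solr query parameters; the helper `edismax_params` and the field names `name_t` and `make_t` are made up for the example.

```python
from urllib.parse import urlencode


def edismax_params(user_query, search_fields):
    """Build a query string that searches only the listed fields."""
    params = [
        ("defType", "edismax"),          # select the eDisMax query parser
        ("q", user_query),               # the user's search terms
        ("qf", " ".join(search_fields)), # fields to search, space separated
    ]
    return urlencode(params)


# Hypothetical field names; use whichever fields the document type defines.
query_string = edismax_params("alice", ["name_t", "make_t"])
```

The string would be appended to the collection's select URL, so the list of searched fields can vary per document type without touching the schema.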

0
votes

If I were in your place, I would consider Solr's Schemaless Mode. In this mode you do not need to provide a complete schema for the documents, apart from an id (optional) and a version field. You need to use ManagedIndexSchemaFactory as the schemaFactory in solrconfig.xml. Solr will then keep adding fields to the managed schema as documents are indexed. You also need to include the additional update chains for your /update requestHandler in solrconfig.xml. See the sources below for more information.

Schemaless Mode
Using Solr in a schemaless mode
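With schemaless mode configured, documents can be posted with plain field names. Here is a minimal Python sketch of building such an update request; the base URL, the collection name `mycollection` and the helper `build_update_request` are assumptions for illustration, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_update_request(base_url, collection, docs):
    """Build (but do not send) a JSON update request for a list of documents."""
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The "cars" document type from the question, with unsuffixed field names.
car = {"id": "car-1", "make": "Volvo", "enginesize": 2000, "neworused": True}
req = build_update_request("http://localhost:8983/solr", "mycollection", [car])
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance.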