Suppose that, in addition to simple text terms, I want to retrieve some complex data from text. For example, the text can contain descriptions of graphs in some format. I then want to run queries that put conditions on those graphs (for example, find all documents with planar graphs, or something like this). It seems that the standard Solr index is not sufficient for such a task because, as I understand it, it ultimately treats a document as a set of tokens, which are just strings, whereas I need an additional index with a better-suited format. So the question is: can I somehow customize how data is indexed and retrieved from the index in Solr? I've read a lot of documentation but could not find an answer.
2 Answers
Yes. You are able to define each field in the schema.xml file. Within that file, you can define what type of data is stored, how the document is tokenized, and how the tokenized data is manipulated. In order to meet your need, you will probably need to write a custom tokenizer and possibly custom filters as well.
Your best starting point is the field type definition of text_general in the schema. It chains a tokenizer with several filters that are applied to the text and help you with indexing. You can define different analysis chains for indexing and for querying.
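As a rough sketch of what such a definition might look like, here is a field type with separate index-time and query-time analyzers. The names `text_graph`, `graph_description`, and `com.example.GraphTokenizerFactory` are hypothetical placeholders, not something Solr ships with; the custom factory would be a class you write yourself and put on Solr's classpath. The lowercase filter and the standard tokenizer are stock Solr components.

```xml
<!-- Hypothetical field type: the name "text_graph" is an assumption for illustration. -->
<fieldType name="text_graph" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied when documents are indexed. -->
  <analyzer type="index">
    <!-- Placeholder for a tokenizer you would write yourself; this class does not exist in Solr. -->
    <tokenizer class="com.example.GraphTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to the query string; it may differ from the index-time chain. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the type above. -->
<field name="graph_description" type="text_graph" indexed="true" stored="true"/>
```

Whether the query-time chain should match the index-time chain depends on what tokens the custom tokenizer emits, so treat the query analyzer here as a starting point only.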
Note that a tokenizer operates on the field's text as a whole, while filters operate on each token the tokenizer produces. You have descriptions of graphs in some format. Can you elaborate on the type of format, so that we can think of better approaches? There are many existing tokenizers and filters available. Depending on the format, you can use existing ones or write your own.