Extract query terms from text for querying Solr server

Question

I am using SolrJ to build queries for a Solr server.

I have some fairly short free-form texts that can contain various special characters, like: Mr. John's New-Wall, "Hotels & Food".

A phrase query for text like this would not produce enough matches. So from this text I would like to extract terms for building a simple query, something like content:Mr OR content:John's OR content:Hotels OR content:Food. (It would probably be good to take term proximity into account somehow, but I have to start with something.)

The field I am searching uses the default text_general field type. I started by replacing some special characters with spaces and splitting the text to extract the terms, but that feels redundant.
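For reference, the manual approach described above might look roughly like this in plain Java (no SolrJ needed). It is only a sketch: the class and method names are made up, and treating "anything that is not a letter, digit or apostrophe" as a separator is an assumption. Apostrophes are kept because they are not special characters in Solr's query syntax. It does not lowercase or drop stop words the way the text_general analyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ManualTermQuery {

    // Assumed separator rule: any run of characters that is not a letter, digit or apostrophe.
    private static final String SEPARATORS = "[^\\p{L}\\p{N}']+";

    // Splits free-form text into candidate terms, skipping empty tokens.
    static List<String> extractTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split(SEPARATORS)) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Joins the terms into a field:term OR field:term ... query string.
    static String buildOrQuery(String field, List<String> terms) {
        return terms.stream()
                .map(term -> field + ":" + term)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        String text = "Mr. John's New-Wall, \"Hotels & Food\"";
        System.out.println(buildOrQuery("content", extractTerms(text)));
    }
}
```

Running it prints content:Mr OR content:John's OR content:New OR content:Wall OR content:Hotels OR content:Food. Note that New-Wall is split into two terms by the separator rule.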

Isn't there an easier way to extract terms from text using SolrJ and Solr? Basically I would like to extract terms from text similarly to how it is done by Solr when it creates its index.

nick_v1 · Accepted Answer · 2015-06-08T15:00:32

I am not sure exactly what your question is, but here is some information you may find helpful:

Basically I would like to extract terms from text similarly to how it is done by Solr when it creates its index.

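You can configure how each field is processed at index time and at query time (its analyzers) in your schema. I would suggest you take a look here. This gives you some flexibility to normalize your data. Because a text_general field also runs its query analyzer over the query text, your query is tokenized much the same way as the indexed content.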

So from this text I would like to extract terms for building a simple query, something like content:Mr OR content:John's OR content:Hotels OR content:Food.

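This is roughly what Solr already does under the hood: the query parser splits your text, runs each piece through the field's query analyzer and, with the default OR operator, combines the resulting terms. I would suggest you look up the edismax query parser and its qf and tie parameters.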
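A minimal sketch of what an edismax request could look like, assuming a core named mycore (hypothetical) on the default port 8983 and the content field from the question. Instead of splitting the text yourself, you pass it unchanged as q and let edismax run it through the query analyzer of each qf field. mm=1 (at least one term must match) gives OR-like behaviour, and pf with ps adds a boost when the terms appear close together, which covers the proximity idea from the question. The parameter values are assumptions to tune, not recommendations. If double quotes in the user text (as in "Hotels & Food") should not be treated as phrase syntax, remove or backslash-escape them before sending.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EdismaxRequest {

    // Builds edismax parameters for free-form user text (values are illustrative assumptions).
    static Map<String, String> edismaxParams(String userText) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("defType", "edismax"); // use the extended dismax query parser
        params.put("q", userText);        // raw text; the qf fields' query analyzers tokenize it
        params.put("qf", "content");      // field(s) to search; tie only matters with several fields
        params.put("mm", "1");            // at least one term must match, i.e. OR-like behaviour
        params.put("pf", "content");      // boost documents where the terms occur as a phrase...
        params.put("ps", "2");            // ...allowing a slop of 2 positions
        return params;
    }

    // URL-encodes the parameters into a query string for the /select handler.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // "mycore" is a hypothetical core name.
        String base = "http://localhost:8983/solr/mycore/select?";
        System.out.println(base + toQueryString(edismaxParams("Mr. John's New-Wall Hotels & Food")));
    }
}
```

With SolrJ, the same parameter names can be set on a SolrQuery via set(name, value) instead of building the URL by hand. The tie parameter only starts to matter once qf lists more than one field.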

Hope it helps
