0
votes

I am indexing documents that may contain any special/reserved characters in their fulltext body. For example "PDF/A is an ISO-standardized version of the Portable Document Format..."

I would like to be able to search for pdf/a without having to escape the forward slash.

How should I analyze my query string, and what type of query should I use?

2
Can you share what you have tried? What do your mappings and query look like as a starting point? - eemp

2 Answers

0
votes

The default standard analyzer will tokenize a string like that so that "PDF" and "A" become separate tokens. The "A" token might also get removed by the stop token filter if stop words are enabled (see Standard Analyzer). So without any custom analyzer, a search will typically match any document that contains just "PDF".

You can try creating your own analyzer, modeled on the standard analyzer, that includes a Mapping Char Filter. The idea would be that "PDF/A" gets transformed into something like "pdf_a" at both index and query time. A simple match query will then work just fine. But this is a very simplistic approach: you might want to consider how '/' characters are used in your content and use slightly more complex regex filters, which are not perfect solutions either.
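A minimal sketch of what such an analyzer could look like, written as Python dicts that serialize to the JSON request bodies. The names `slash_to_underscore` and `slash_aware` and the field name `body` are made up for illustration, and the typeless `mappings` layout assumes a recent Elasticsearch version. It also assumes the standard tokenizer keeps "pdf_a" together as one token, which is worth verifying with the `_analyze` API before relying on it.

```python
import json

# Body for index creation. All names below are hypothetical.
index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Mapping char filter: rewrite "/" to "_" before tokenizing.
                "slash_to_underscore": {
                    "type": "mapping",
                    "mappings": ["/ => _"],
                }
            },
            "analyzer": {
                # Custom analyzer modeled on the standard one:
                # char filter, then standard tokenizer, then lowercasing.
                "slash_aware": {
                    "type": "custom",
                    "char_filter": ["slash_to_underscore"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Without a separate search_analyzer, the same analyzer is
            # applied at index time and at query time.
            "body": {"type": "text", "analyzer": "slash_aware"}
        }
    },
}

# A plain match query; the query text goes through the same analyzer.
search_body = {"query": {"match": {"body": "pdf/a"}}}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The first body would be sent when creating the index and the second to the index's `_search` endpoint.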

Sorry, I completely missed your point about having to escape the character. Can you elaborate on your use case if this turns out to not be helpful at all?

0
votes

To support queries containing reserved characters, I now use the Simple Query String Query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html).

Because it does not use the full query parser, it is a bit limited (e.g. no field queries like id:5), but it serves the purpose.
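For reference, a rough sketch of the kind of request body this produces, built in Python. The helper name `build_simple_query` and the field name `body` are illustrative, not taken from my actual setup.

```python
import json


def build_simple_query(user_input, fields):
    """Wrap raw user input in a simple_query_string query body.

    The input is passed through unescaped, so a term like "pdf/a"
    needs no special handling by the caller.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": fields,
                # Require all terms to match; drop this to keep the default.
                "default_operator": "and",
            }
        }
    }


body = build_simple_query("pdf/a", ["body"])
print(json.dumps(body, indent=2))
```

Note that characters such as `+`, `|`, `-`, `"`, `*`, `(`, `)` and `~` still act as operators in this query type, so they are interpreted rather than searched for literally.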