
I have an index that has a document with the following text:

John's car is at the shop.

Searching for John does not return the document (I am using the default analyzer). The indexer doesn't seem to treat the single quote as a word separator. The query itself contains no single quote, so I am not escaping anything; only the indexed text contains it.

Note that when I search for John's (including the single quote), the document is returned as expected. The single quote is ASCII character 39, not a Unicode apostrophe variant such as U+2019.
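A rough way to reproduce this behavior: if the analyzer keeps an apostrophe that sits between two letters, the index holds the single term `john's` and never `john`, so a query for John finds nothing. The sketch below approximates that with a regular expression; the pattern and the helper names `analyze` and `matches` are hypothetical and only mimic this one case, not the engine's actual tokenizer.

```python
import re

# Hypothetical approximation of the default analyzer: lowercase the text and
# keep an apostrophe when it sits between word characters (as in "john's").
# The real tokenizer follows Unicode word-break rules; this regex only mimics
# the case discussed in the question.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*")


def analyze(text):
    """Return the list of terms the toy analyzer would produce for `text`."""
    return TOKEN_PATTERN.findall(text.lower())


def matches(document, query):
    """True if every query term appears among the document's terms."""
    doc_terms = set(analyze(document))
    return all(term in doc_terms for term in analyze(query))


doc = "John's car is at the shop."
print(analyze(doc))            # ["john's", 'car', 'is', 'at', 'the', 'shop']
print(matches(doc, "John"))    # False: "john" was never indexed
print(matches(doc, "John's"))  # True: the query term is "john's"
```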

Is this a known issue, and if so, is there a workaround?

Thanks.


1 Answer


The default analyzer makes only minimal language-specific assumptions (for example, that words are separated by spaces and punctuation). It does not treat an apostrophe between two letters as a separator, so John's is indexed as a single term. If you want the search engine to account for English-language features, including the elimination of possessives as in your example, use one of the English analyzers. Note that the English analyzers also do other things, such as stemming or lemmatization, depending on which one you choose. If you only want possessive elimination and nothing else, create a custom analyzer that uses the word delimiter token filter with the possessive elimination option enabled. See the documentation on using built-in analyzers such as the English one, and on building custom analyzers, which also lists the options for each token filter.
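A minimal sketch of such a custom analyzer, written as a Python dict that serializes to an index definition body. It assumes the service is Azure Cognitive Search (the question does not name it, though the terminology matches that service's REST API); the index name, field names, and the analyzer and filter names are hypothetical, and the property names should be checked against the API version you target. Only `stemEnglishPossessive` is set explicitly; every other word delimiter option keeps its default, and some of those defaults (such as splitting on case changes) also affect tokenization.

```python
import json

# Hypothetical names; adjust to your own schema.
ANALYZER_NAME = "possessive_analyzer"
FILTER_NAME = "possessive_word_delimiter"

index_definition = {
    "name": "docs-index",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "content",
            "type": "Edm.String",
            "searchable": True,
            # Same custom analyzer at indexing and query time.
            "analyzer": ANALYZER_NAME,
        },
    ],
    "analyzers": [
        {
            "name": ANALYZER_NAME,
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "tokenizer": "standard_v2",
            # Word delimiter runs before lowercase because it can use
            # case information when splitting tokens.
            "tokenFilters": [FILTER_NAME, "lowercase"],
        }
    ],
    "tokenFilters": [
        {
            "name": FILTER_NAME,
            "@odata.type": "#Microsoft.Azure.Search.WordDelimiterTokenFilter",
            # Strip the trailing "'s" so "John's" is indexed as "John".
            "stemEnglishPossessive": True,
        }
    ],
}

# JSON body for the create-index request.
print(json.dumps(index_definition, indent=2))
```

If you want full English-language processing instead, setting the field's `analyzer` to a built-in English analyzer such as `en.lucene` or `en.microsoft` avoids the custom definition entirely, at the cost of the extra stemming or lemmatization described above.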