I have a complex identifier field that contains letters, numbers, white space, and special characters. I have been using the Keyword analyzer on this field, but I am having problems filtering results. Here is an example of the data the field would contain:

O-2011-006953 /4

With the Keyword analyzer in place I'm able to do a contains filter on the index field using numbers, but not letters. The following filter works:

search.ismatch('/.*2011.*/', 'complex_identifier_field', 'full', 'all')

But if I try to do a contains search with a letter, I get 0 results:

search.ismatch('/.*O.*/', 'complex_identifier_field', 'full', 'all')

I believe I need a different or custom analyzer. I recently tried the NGram analyzers and also tried writing a custom analyzer using the keyword tokenizer, but I am still unable to do a contains search on the field. How can I create a field that is one token; accepts alphanumeric characters, white space, and special characters; and allows me to do a contains filter to find any part of the identifier field?

UPDATE

Here is the definition of the field:

new Field("accession_number", DataType.String) { IsSearchable = true, IsFilterable = true, Analyzer = AnalyzerName.Keyword },

And here is the exact search I'm using:

var result = indexClient.Documents.Search(query, searchParameters: parameters);

where query = "print" and parameters =

{
Facets = null,
Filter = search.ismatch('/.*O.*/', 'accession_number', 'full', 'all'),
HighlightFields = null,
HighlightPostTag = null,
HighlightPreTag = null,
IncludeTotalResultsCount = true,
MinimumCoverage = null,
OrderBy = null,
QueryType = Full,
ScoringParameters = null,
ScoringProfile = null,
SearchFields = null,
SearchMode = All,
Select = (9 fields),
Skip = 0,
Top = 50
}
Thanks for updating with extra details. I don't see anything wrong. To look further into this, it would help to see: 1) the JSON for the index definition (you can get this from the Azure Portal, go to the index and there's a tab for JSON), 2) the JSON for the document that should match but doesn't (you can use the query explorer in the Portal for this), and 3) the query that's failing, done directly in the Portal query explorer instead of through the API. Trying to remove layers while troubleshooting. – Pablo Castro

1 Answer

In your example, the value O-2011-006953 /4 doesn't match the regex /.O./, because that regex requires a character before the "O" and another after it ("." means "exactly one character in that position"). If you want to match a substring anywhere within a token, you can use /.*O.*/, where "O" is the substring, "." means "any character", and "*" means "zero or more of the preceding element", in this case the ".".
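The difference between the two patterns can be checked outside the search service. The following is a minimal sketch in Python, not Azure Search code: it assumes the regex has to match the entire token (which is how the keyword-analyzed value is treated here) and uses Python's `re.fullmatch` as a stand-in for that behavior. The helper name `matches_whole_token` is made up for illustration, and Python's regex dialect is not identical to the one the service uses, so treat the results as an illustration of the pattern semantics only.

```python
import re

# Sample value from the question, treated as one token
# (the Keyword analyzer keeps the whole field value as a single term).
token = "O-2011-006953 /4"


def matches_whole_token(pattern: str, value: str) -> bool:
    """Return True only if the pattern matches the entire value.

    Hypothetical helper: re.fullmatch stands in for whole-term
    regex matching; it is not the search service's own matcher.
    """
    return re.fullmatch(pattern, value) is not None


# ".O."      -> exactly one character, then "O", then exactly one character
# ".*O.*"    -> "O" anywhere in the token
# ".*2011.*" -> "2011" anywhere in the token
results = {
    pattern: matches_whole_token(pattern, token)
    for pattern in (".O.", ".*O.*", ".*2011.*")
}

for pattern, matched in results.items():
    print(f"/{pattern}/ matches {token!r}: {matched}")
```

Under these assumptions only `.O.` fails, since the token starts with "O" and is far longer than three characters. If `.*O.*` still returns no results in the service, the cause is likely somewhere other than the pattern itself.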

Note that this type of regex search can be slow and doesn't guarantee full recall: because we limit how many terms we expand from the regex, we may not return every document that matches it.