
I have an index and I am using Luke to test some queries. There is one case that is confusing me.

In the index I have the following Names:

go!

GO! Kruger

GO! Namibia

Compleat Golfer

When I use SimpleAnalyzer with the query Name:go, I get the expected results: "go!" is at the top of the list. However, when I add a wildcard, Name:go*, I get results that do not include "go!" (or any derivative thereof). "Compleat Golfer" is only returned when the wildcard query is run.

It is my understanding that the asterisk wildcard (*) matches zero or more characters. Or is my understanding incorrect?

It appears that the exclamation mark does not count as a character in the index. I know that it is a reserved query character.
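To illustrate why the exclamation mark seems to vanish, here is a minimal sketch in plain Java that approximates what SimpleAnalyzer does (split text at non-letter characters and lowercase each token). It does not use Lucene itself; the class and method names (SimpleAnalyzerSketch, tokenize, matchesTerm, matchesPrefix) are made up for illustration, and real scoring is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: approximates SimpleAnalyzer,
// which divides text at non-letters and lowercases the resulting tokens.
class SimpleAnalyzerSketch {

    // Split at every non-letter character and lowercase each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Rough stand-in for a term query such as Name:go (exact token match).
    static boolean matchesTerm(String name, String term) {
        return tokenize(name).contains(term);
    }

    // Rough stand-in for a prefix query such as Name:go* (token starts with prefix).
    static boolean matchesPrefix(String name, String prefix) {
        for (String token : tokenize(name)) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = {"go!", "GO! Kruger", "GO! Namibia", "Compleat Golfer"};
        for (String name : names) {
            System.out.println(name + " -> " + tokenize(name)
                    + " | Name:go " + matchesTerm(name, "go")
                    + " | Name:go* " + matchesPrefix(name, "go"));
        }
    }
}
```

Under this approximation "go!" is indexed as the single token go, so both Name:go and Name:go* should match it, while "Compleat Golfer" can only match the wildcard form through the token golfer.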

Have I missed something, or is this the expected behaviour? Does anyone have a possible workaround or solution?

My current idea is to "hack" it so that if the search term is "go" then it will not append the asterisk to the query. However, I don't want to do that as I am sure there must be a better solution.

UPDATE

It turns out that "go!" and the others are in the results of the wildcard query; it's just that they are almost at the end of 2000 results.

Does anyone know how to make all indexed names lowercase after being indexed? Or would I have to re-index everything? Would it be possible to change the score to ignore capitals?

If you are simply using SimpleAnalyzer both when querying and when indexing, then your wildcard query should work as expected, and each of the given strings should be matched. However, in that case name:go should not match "Compleat Golfer", since the token go would not be found in there. It looks like there is something more going on either with the analysis being used, or perhaps some added query logic, that isn't clear here. - femtoRgon
@femtoRgon you're right, I edited my question. Golfer is only returned with the wildcard query. So it seems my understanding of the basics here is correct. - Aidan Host

1 Answer


In the end I had to "hack" a solution together. I check whether the search query equals "go", and if it does I do not append the wildcard character.
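The workaround described above could look roughly like the following. This is only a sketch of the idea, not the actual code used: the class name WildcardQueryBuilder, the method buildQuery and the field name passed in are all hypothetical, and the method only builds a query string for a query parser to consume.

```java
// Hypothetical helper: builds a query string, appending the wildcard
// to every search term except the special case "go".
class WildcardQueryBuilder {

    static String buildQuery(String field, String term) {
        String trimmed = term.trim();
        // Special case from the workaround: search "go" as an exact term.
        if (trimmed.equalsIgnoreCase("go")) {
            return field + ":" + trimmed;
        }
        // Every other term is searched as a prefix.
        return field + ":" + trimmed + "*";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Name", "go"));
        System.out.println(buildQuery("Name", "golf"));
    }
}
```

For example, buildQuery("Name", "go") yields Name:go, while buildQuery("Name", "golf") yields Name:golf*.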

I suspect this has something to do with the length of the search term, much as with stop words (a, an, the, or, etc.). Stop words are excluded because they add no meaning to a search, and in general the longer a search term is, the better the results will be.

I am going to mark this as the answer. Hopefully, somebody will either find this useful or will find a proper answer.
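One possible alternative to special-casing "go", offered here only as an untested sketch, is to combine an exact term clause with the wildcard clause and boost the exact one, so that names containing the exact token are ranked ahead of names that only match the prefix. The ^ boost and OR operators are part of the classic Lucene query syntax; the class name BoostedPrefixQuery, the method build and the boost value of 10 are assumptions chosen for illustration.

```java
// Hypothetical helper: builds a query string of the form
//   field:term^boost OR field:term*
// so that exact token matches can outrank prefix-only matches.
class BoostedPrefixQuery {

    static String build(String field, String term, int boost) {
        String trimmed = term.trim();
        String exactClause = field + ":" + trimmed + "^" + boost;
        String prefixClause = field + ":" + trimmed + "*";
        return exactClause + " OR " + prefixClause;
    }

    public static void main(String[] args) {
        System.out.println(build("Name", "go", 10));
    }
}
```

Whether this actually moves "go!" to the top depends on how the index was analyzed and how the parser scores wildcard clauses, so it would need checking in Luke against the real index.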