
I tried this Lucene code example, which worked:
http://snippets.dzone.com/posts/show/8965

However, changing
Query query = parser.parse("st.");
to
Query query = parser.parse("t");

returned zero hits.

How do I write a Lucene query that returns all words containing the letter "t"?
(Maximum number of hits to return: 20.)

Edit: here's what worked:

RegexQuery regexquery = new RegexQuery(new Term("fieldname", ".t."));
isearcher.search(regexquery, collector);
System.out.println("collector.getTotalHits()=" + collector.getTotalHits());
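Whether `.t.` catches every term containing "t" depends on how your Lucene version's RegexQuery applies the pattern: to the whole term, or anywhere inside it. That is worth checking against the documentation for the version you use. The plain java.util.regex sketch below has no Lucene involved, and its class and method names are hypothetical. It shows both readings. Under whole-term matching, `.t.` accepts only three-character terms with "t" in the middle. Under either reading it misses terms that start or end with "t". By contrast, `.*t.*` accepts any term containing "t".

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexSemanticsSketch {
    // Whole-term semantics: the entire term must match the pattern.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // Substring semantics: some part of the term must match the pattern.
    static boolean findMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        List<String> terms = List.of("at", "ate", "station", "tea", "dog");
        for (String term : terms) {
            System.out.printf("%-8s .t. full=%-5b .t. find=%-5b .*t.* full=%b%n",
                    term,
                    fullMatch(".t.", term),
                    findMatch(".t.", term),
                    fullMatch(".*t.*", term));
        }
    }
}
```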


2 Answers


You need a different Analyzer. The example uses StandardAnalyzer, which removes punctuation and splits text into words based on whitespace and a few more elaborate rules. It does not, however, break words into individual characters. To search by single letters, you would probably need to build a custom analyzer that does this, and that is likely to be costly in both run time and memory consumption. Another (probably better) option is to use a RegexQuery.
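The following is a minimal, Lucene-free sketch of what an analyzer that "breaks words into characters" amounts to. Each term is indexed under every distinct character it contains, so a single-letter search becomes one index lookup. The class is hypothetical and is not a Lucene analyzer. It also hints at the cost: every term is stored once per distinct character, so the index grows with total term length rather than with the number of terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CharIndexSketch {
    // Maps each character to the sorted set of terms that contain it.
    private final Map<Character, Set<String>> index = new TreeMap<>();
    private int postings = 0; // total (character, term) entries stored

    void addTerm(String term) {
        for (char c : term.toCharArray()) {
            // Set.add returns false if the term is already listed under c,
            // so a character repeated within one term is stored only once.
            if (index.computeIfAbsent(c, k -> new TreeSet<>()).add(term)) {
                postings++;
            }
        }
    }

    // Returns up to maxHits terms containing the given character.
    List<String> termsContaining(char c, int maxHits) {
        List<String> hits = new ArrayList<>();
        for (String term : index.getOrDefault(c, Set.of())) {
            if (hits.size() == maxHits) break;
            hits.add(term);
        }
        return hits;
    }

    int postingCount() {
        return postings;
    }

    public static void main(String[] args) {
        CharIndexSketch idx = new CharIndexSketch();
        for (String term : List.of("station", "steal", "dog", "tea")) {
            idx.addTerm(term);
        }
        System.out.println(idx.termsContaining('t', 20));
        System.out.println("postings stored: " + idx.postingCount());
    }
}
```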


I have good news and bad news. The good news is that you can use a trailing wildcard to match every term that starts with a given prefix:

parser.parse("st*"); // Will match "st.", "station", "steal", etc.
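A trailing wildcard is cheap because Lucene keeps a field's terms in sorted order, so all terms starting with "st" sit next to each other. The standard-library sketch below imitates this with a TreeSet. The class name and term list are made up for illustration, and this is not Lucene's actual implementation. The lookup jumps to the first term at or after the prefix and stops at the first term that no longer matches. A pattern with no fixed prefix offers no such starting point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixRangeSketch {
    // Returns every term starting with prefix from a sorted term set.
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // tailSet jumps straight to the first term >= prefix. In sorted order,
        // terms sharing a prefix are contiguous, so stop at the first miss.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break;
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                List.of("dog", "st", "station", "steal", "stop", "tea"));
        System.out.println(prefixMatches(terms, "st"));
    }
}
```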

Unfortunately, the documentation indicates:

Note: You cannot use a * or ? symbol as the first character of a search.

This means you cannot use this syntax:

parser.parse("*t*");

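Therefore, with the default query parser settings, you cannot ask Lucene for terms that contain the letter 't' at an arbitrary position. You can only ask for terms that begin with a given prefix. (QueryParser does have a setAllowLeadingWildcard(true) switch that lifts this restriction, but such queries must examine every term in the field and can be very slow on large indexes.)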

Your only other option appears to be iterating through all the terms in the field and doing your own matching.
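Doing your own matching means walking the whole term list and testing each term, so the cost grows with the number of distinct terms in the field. The sketch below is a minimal version of this. A plain sorted collection stands in for the field's term dictionary. In Lucene you would get the terms from the index reader's term enumeration, whose API differs between versions, so that part is left out. The class and method names are hypothetical. The scan stops once it has the 20 hits the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TermScanSketch {
    // Visits every term in order and keeps those containing needle,
    // stopping early once maxHits matches have been collected.
    static List<String> termsContaining(Iterable<String> terms, String needle, int maxHits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (hits.size() >= maxHits) break;
            if (term.contains(needle)) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                List.of("at", "dog", "station", "steal", "tea", "zebra"));
        System.out.println(termsContaining(terms, "t", 20));
    }
}
```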