I'm trying to make a searchable phone/local business directory using Apache Lucene.
I have fields for street name, business name, phone number, etc. The problem is that when I search by street and the street name has multiple words (e.g. 'the crescent'), no results are returned. But if I search for just one word, e.g. 'crescent', I get all the results I want.
I'm indexing the data with the following:
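import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.*;
import org.apache.lucene.util.Version;
String LocationOfDirectory = "C:\\dir\\index";
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
// SimpleFSDirectory takes a File, not a String
Directory index = new SimpleFSDirectory(new File(LocationOfDirectory));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
IndexWriter w = new IndexWriter(index, config);
Document doc = new Document();
// ANALYZED: the value is run through the analyzer and the resulting tokens are indexed
doc.add(new Field("Street", "the crescent", Field.Store.YES, Field.Index.ANALYZED));
w.addDocument(doc);
w.close();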
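To make the indexing step concrete, here is a simplified, stdlib-only model of what StandardAnalyzer does to a field value: split on characters that are not letters or digits, lowercase, and drop English stop words. This is a sketch under stated assumptions, not Lucene's code: the class AnalyzerModel and its tokenize method are hypothetical, the real tokenizer follows Unicode word-break rules rather than this regex, and STOP_WORDS below is only a small subset of Lucene's default English stop-word set (which does include "the"). In this model, 'the crescent' is indexed as the single term crescent, and '+' and '@' simply act as separators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of StandardAnalyzer output; NOT Lucene's implementation.
public class AnalyzerModel {
    // Assumed subset of Lucene's default English stop words; "the" is one of them.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "on", "to"));

    // Split on any run of non-letter/non-digit characters, lowercase, drop stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator (or empty input) yields an empty piece
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("the crescent")); // [crescent]
        System.out.println(tokenize("The Crescent")); // [crescent]
        System.out.println(tokenize("the+crescent")); // [crescent]
        System.out.println(tokenize("the@crescent")); // [crescent]
    }
}
```

If the real analyzer behaves roughly like this model, the index never contains the word "the" for this field, so any query that insists on that word would find nothing.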
My searches work like this:
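import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;
int numberOfHits = 200;
String LocationOfDirectory = "C:\\dir\\index";
TopScoreDocCollector collector = TopScoreDocCollector.create(numberOfHits, true);
Directory directory = new SimpleFSDirectory(new File(LocationOfDirectory));
IndexSearcher searcher = new IndexSearcher(IndexReader.open(directory));
// the Term text is used exactly as given; it is not passed through an analyzer
WildcardQuery q = new WildcardQuery(new Term("Street", "the crescent"));
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;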
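The search step can be modeled in a few lines as well. A WildcardQuery's term is not analyzed; it is compared against each indexed term as a whole, with * matching any sequence of characters and ? matching one character. The sketch assumes the Street field holds only the term crescent, which is what stop-word removal and lowercasing would be expected to leave for 'the crescent'. WildcardModel, toRegex and matchesAny are hypothetical names, and translating to java.util.regex is only an illustration, not how Lucene matches wildcards internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of comparing a wildcard term to indexed terms; NOT Lucene's implementation.
public class WildcardModel {
    // Translate '*' and '?' into regex; every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the pattern matches at least one indexed term in its entirety.
    static boolean matchesAny(String wildcard, List<String> indexedTerms) {
        Pattern p = toRegex(wildcard);
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed indexed terms for Street after analysis: "the" dropped, "crescent" kept.
        List<String> street = Arrays.asList("crescent");
        System.out.println(matchesAny("the crescent", street)); // false
        System.out.println(matchesAny("crescent", street));     // true
        System.out.println(matchesAny("cres*", street));        // true
        System.out.println(matchesAny("Crescent", street));     // false
    }
}
```

Under these assumptions, a query term that contains a space, or one with capital letters, can never equal any single indexed term.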
I have tried replacing the wildcard query with a phrase query: first with the entire string as a single term, and then by splitting the string on whitespace, adding each word to a PhraseQuery and wrapping that in a BooleanQuery, like this:
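import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
String term = "the crescent";
BooleanQuery b = new BooleanQuery();
PhraseQuery p = new PhraseQuery();
String[] tokens = term.split(" ");
// each word becomes the next term of the phrase, in order (slop defaults to 0)
for (int i = 0; i < tokens.length; ++i)
{
p.add(new Term("Street", tokens[i]));
}
b.add(p, BooleanClause.Occur.MUST);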
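Phrase matching can be sketched as a position check: with the default slop of 0, the words of the phrase must occur at consecutive positions in the field. The sketch assumes that "the" was removed as a stop word at index time and that crescent was left at position 1; both are assumptions about how StandardAnalyzer treated 'the crescent'. PhraseModel and phraseMatches are hypothetical names, and this is not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an exact (slop 0) phrase match over term positions; NOT Lucene code.
public class PhraseModel {
    // True if, for some start position, phrase[i] occurs at position start + i for every i.
    static boolean phraseMatches(Map<String, List<Integer>> postings, String[] phrase) {
        if (phrase.length == 0) {
            return false;
        }
        List<Integer> firstPositions = postings.getOrDefault(phrase[0], Collections.<Integer>emptyList());
        for (int start : firstPositions) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                List<Integer> positions = postings.getOrDefault(phrase[i], Collections.<Integer>emptyList());
                ok = positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed postings for "the crescent": "the" not indexed, "crescent" at position 1.
        Map<String, List<Integer>> street = new HashMap<>();
        street.put("crescent", Arrays.asList(1));
        System.out.println(phraseMatches(street, new String[] {"the", "crescent"})); // false
        System.out.println(phraseMatches(street, new String[] {"crescent"}));        // true
    }
}
```

Under that assumption, any phrase that starts with "the" fails because that term has no positions at all, while a phrase of just crescent matches.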
However, this didn't work either. I also tried using a KeywordAnalyzer instead of the StandardAnalyzer, but then all my other types of search stopped working. I have tried replacing the spaces with other characters (+ and @) and converting queries to and from this form, but that doesn't work either. I suspect this is because + and @ are special characters that are not indexed, but I can't find a list anywhere of which characters are treated this way.
I'm beginning to go slightly mad. Does anyone know what I'm doing wrong?