Question:
How do I combine an exact match on one field AND a fuzzy search on another in Lucene 4.5?
Problem:
I have indexed the NGA Geonames gazetteer in a Lucene index. I need to run a fuzzy query against one field (the place name) but constrain the results to records that have a specific country code. I am not using Solr. I have done a lot of research and trial and error but have found no clear answer; it could be that I'm just slow.
Here is a sample query I am running:
FULL_NAME_ND_RO:india AND CC1:in
I want a fuzzy search on india, but I want ONLY RECORDS THAT EXACTLY MATCH "in" (the country code).
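To make the intended semantics concrete, here is a plain-Java sketch that does not use Lucene at all. It only illustrates the behaviour I am after: the name may differ by a few edits, while the country code must be identical. The class name, the Place type, the sample rows, and the edit-distance threshold of 2 are all made up for illustration, and plain Levenshtein distance is an assumption that may not match exactly how Lucene scores fuzzy terms.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyWithExactFilter {

    /** Hypothetical gazetteer row reduced to the two fields in the sample query. */
    static final class Place {
        final String fullName;    // stands in for FULL_NAME_ND_RO
        final String countryCode; // stands in for CC1

        Place(String fullName, String countryCode) {
            this.fullName = fullName;
            this.countryCode = countryCode;
        }
    }

    /** Classic Levenshtein distance: insertions, deletions, substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Keeps rows whose name is within maxEdits of the query AND whose country code is identical. */
    static List<Place> search(List<Place> places, String name, String countryCode, int maxEdits) {
        List<Place> hits = new ArrayList<Place>();
        for (Place p : places) {
            boolean exactCountry = p.countryCode.equals(countryCode);
            boolean fuzzyName = editDistance(p.fullName.toLowerCase(), name.toLowerCase()) <= maxEdits;
            if (exactCountry && fuzzyName) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Place> places = new ArrayList<Place>();
        places.add(new Place("India", "in"));
        places.add(new Place("Indio", "us")); // close name, wrong country: must be rejected
        places.add(new Place("Indi", "in"));
        places.add(new Place("Delhi", "in")); // right country, name too far away
        for (Place p : search(places, "india", "in", 2)) {
            System.out.println(p.fullName + " (" + p.countryCode + ")");
        }
    }
}
```

In Lucene terms, the country code should act as a hard filter rather than as something that merely influences ranking, which is why boosting alone does not give me what I want.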
Here is what I've tried:
1. Index every field as a TextField and boost the country code field using ^N. This still returns other country codes, and the boosted one does not always come first.
2. Index every field as a TextField EXCEPT the country code, which I indexed as a StringField. This way I get no results at all.
Here is the code that indexes the gazetteer:
public void index(File outputIndexDir, File gazateerInputData, GazType type) throws Exception {
    if (!outputIndexDir.isDirectory()) {
        throw new IllegalArgumentException("outputIndexDir must be a directory.");
    }
    String indexloc = outputIndexDir + type.toString();
    Directory index = new MMapDirectory(new File(indexloc));
    Analyzer a = new StandardAnalyzer(Version.LUCENE_45);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_45, a);
    IndexWriter w = new IndexWriter(index, config);
    readFile(gazateerInputData, w, type);
    w.commit();
    w.close();
}

public void readFile(File gazateerInputData, IndexWriter w, GazType type) throws Exception {
    BufferedReader reader = new BufferedReader(new FileReader(gazateerInputData));
    List<String> fields = new ArrayList<String>();
    int counter = 0;
    // int langCodeIndex = 0;
    System.out.println("reading gazateer data from file...........");
    String line;
    // readLine() returns null at end of file. Testing reader.read() != -1 instead
    // would consume the first character of every line before readLine() sees it.
    while ((line = reader.readLine()) != null) {
        String[] values = line.split(type.getSeparator());
        if (counter == 0) {
            // The first line holds the column names; strip a UTF-8 byte-order mark if present.
            for (String columnName : values) {
                fields.add(columnName.replace("\uFEFF", "").replace("ï»¿", "").trim());
            }
        } else {
            Document doc = new Document();
            for (int i = 0; i < fields.size() - 1; i++) {
                if (fields.get(i).equals("CC1")) {
                    // Country code: indexed as a single untokenized term.
                    doc.add(new StringField(fields.get(i), values[i], Field.Store.YES));
                } else {
                    // Everything else: analyzed (tokenized) text.
                    doc.add(new TextField(fields.get(i), values[i], Field.Store.YES));
                }
            }
            w.addDocument(doc);
        }
        counter++;
        if (counter % 10000 == 0) {
            w.commit();
            System.out.println(counter + " .........committed to index..............");
        }
    }
    reader.close();
    w.commit();
    System.out.println("Completed indexing gaz! index name is: " + type.toString());
}
Here is the code that runs the query:
QueryParser parser = new QueryParser(Version.LUCENE_45, luceneQueryString, geonamesAnalyzer);
Query q = parser.parse(luceneQueryString);
TopDocs search = geonamesSearcher.search(q, rowsReturned);
geonamesAnalyzer is a StandardAnalyzer, and luceneQueryString is a query like the one above.
Any advice would be great.