I'm using Lucene 6.3, but I can't figure out what is wrong with the following very basic search. It simply adds two documents, each with a single date range, and then searches on a wider range that should find both documents. What is wrong?
There are inline comments which should make the example pretty self-explanatory. Let me know if anything is unclear.
Please note that my main requirement is being able to perform date range queries alongside other field queries, such as
text:interesting date:[2014 TO NOW]
This is after watching the Lucene spatial deep dive video introduction which introduces the framework on which DateRangePrefixTree and strategies are based.
Rant: If I am making a mistake here, it feels like I should get some validation error, either when writing the documents or when running the query, given how simple my example is.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.spatial.prefix.NumberRangePrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.PrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.tree.DateRangePrefixTree;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.util.Calendar;
import java.util.Date;
public class TestLuceneDatePrefix {

    /*
    All these names should be lower case as field names are case sensitive in Lucene.
    */
    private static final String NAME = "name";
    public static final String TIME = "time";

    private Directory directory;
    private StandardAnalyzer analyzer;
    private ScoreDoc lastDocOnPage;
    private IndexWriterConfig indexWriterConfig;

    @Before
    public void setup() {
        analyzer = new StandardAnalyzer();
        directory = new RAMDirectory();
        indexWriterConfig = new IndexWriterConfig(analyzer);
    }

    @Test
    public void testAddDocumentAndSearchByDate() throws IOException {
        IndexWriter w = new IndexWriter(directory, new IndexWriterConfig(analyzer));

        // Responsible for creating the prefix string / geohash / token to identify the date.
        // aka create post codes.
        DateRangePrefixTree prefixTree = new DateRangePrefixTree(DateRangePrefixTree.JAVA_UTIL_TIME_COMPAT_CAL);

        // Strategy indexing the token.
        // aka transform post codes into tokens that make them efficient to search.
        PrefixTreeStrategy strategy = new NumberRangePrefixTreeStrategy(prefixTree, TIME);

        createDocument(w, "Bill", new Date(2017,1,1), prefixTree, strategy);
        createDocument(w, "Ted", new Date(2018,1,1), prefixTree, strategy);
        w.close();

        // The documents are written; now try to query them.
        DirectoryReader reader;
        try {
            QueryParser queryParser = new QueryParser(NAME, analyzer);
            System.out.println(queryParser.getLocale());

            // Surely searching only on year for the easiest case should work?
            Query q = queryParser.parse("time:[1972 TO 4018]");

            // The following query returns 1 result, so Lucene is set up.
            // Query q = queryParser.parse("name:Ted");

            reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);

            int hitsPerPage = 10;
            TopDocs docs = searcher.search(q, hitsPerPage);
            ScoreDoc[] hits = docs.scoreDocs;

            // Hit count is zero and no document is printed!!
            // Putting a dependency on mockito would make this code harder to paste and run.
            System.out.println("Hit count : " + hits.length);
            for (int i = 0; i < hits.length; ++i) {
                System.out.println(searcher.doc(hits[i].doc));
            }
            reader.close();
        }
        catch (ParseException e) {
            e.printStackTrace();
        }
    }

    private void createDocument(IndexWriter w, String name, Date fromDate, DateRangePrefixTree prefixTree, PrefixTreeStrategy strategy) throws IOException {
        Document doc = new Document();

        // Store a text/stored field for the name. This helps indicate that Lucene is working.
        doc.add(new TextField(NAME, name, Field.Store.YES));

        // Offset toDate by one day from fromDate.
        Calendar cal = Calendar.getInstance();
        cal.setTime( fromDate );
        cal.add( Calendar.DATE, 1 );
        Date toDate = cal.getTime();

        // This lets the prefix tree create whatever tokens it needs,
        // perhaps indexing year, date, second etc. separately, hence multiple potential tokens.
        for (IndexableField field : strategy.createIndexableFields(prefixTree.toRangeShape(
                prefixTree.toUnitShape(fromDate), prefixTree.toUnitShape(toDate)))) {
            // Debugging the tokens produced is difficult as I can't intuitively look at them and know if they are valid.
            doc.add(field);
        }
        w.addDocument(doc);
    }
}
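A side note for anyone reproducing this, about the test dates rather than the zero-hit problem itself: the deprecated `java.util.Date(int, int, int)` constructor takes the year as an offset from 1900 and a zero-based month, so `new Date(2017,1,1)` is 1 February 3917, not a date in 2017. The query bounds `[1972 TO 4018]` are wide enough to cover both 3917 and 3918, so this alone should not explain the missing hits. The small stand-alone check below uses only the JDK; the class name `DeprecatedDateCheck` and its helper methods are made up for illustration and are not part of the test above.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical helper class for illustration only; it does not touch Lucene.
public class DeprecatedDateCheck {

    // Calendar year of the given Date in the JVM's default time zone.
    static int yearOf(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        return cal.get(Calendar.YEAR);
    }

    // Zero-based calendar month of the given Date (0 = January).
    static int monthOf(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        return cal.get(Calendar.MONTH);
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        // Same constructor calls as in the test: year is treated as (year - 1900).
        Date bill = new Date(2017, 1, 1);
        Date ted = new Date(2018, 1, 1);
        System.out.println("Bill: year " + yearOf(bill) + ", month index " + monthOf(bill));
        System.out.println("Ted:  year " + yearOf(ted) + ", month index " + monthOf(ted));

        // Assumption: if 1 January 2017 was the intended date, this builds it explicitly.
        Date intended = new GregorianCalendar(2017, Calendar.JANUARY, 1).getTime();
        System.out.println("Intended: year " + yearOf(intended) + ", month index " + monthOf(intended));
    }
}
```

Building the dates with `GregorianCalendar` (or `java.time`) avoids the surprise, although the search above still returns zero hits for me regardless of which years end up in the index.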
Update:
I thought maybe the answer was to use SimpleAnalyzer instead of StandardAnalyzer, but this doesn't appear to work either.
My requirement of being able to parse user-supplied date ranges does seem to be catered for by Solr, so I would expect this to be based on Lucene functionality.