I am currently trying to get all documents from a Lucene 4 index stored in a RAMDirectory.

When creating the index, I use the following addDocument method:

public void addDocument(int id, String[] values, String[] fields) throws IOException{
    Document doc = new Document();

    doc.add(new IntField("getAll", 1, IntField.TYPE_STORED));    
    doc.add(new IntField("ID", id, IntField.TYPE_STORED));                                              
    for(int i = 0; i < fields.length; i++){
        doc.add(new TextField(fields[i], values[i], Field.Store.NO));
    }

    writer.addDocument(doc);
}

After calling this for all documents, I close the writer. As you can see from the first field added to the document, I added an extra field "getAll" to make it easy to retrieve all documents. If I understood correctly, the query "getAll:1" should return every document in the index. But that's not the case. I am using the following method for that:

public List<Integer> getDocIds(int noOfDocs) throws IOException, ParseException{
    List<Integer>   result    = new ArrayList<Integer>(noOfDocs);
    Query           query     = parser.parse("getAll:1");
    ScoreDoc[]      docs      = searcher.search(query, noOfDocs).scoreDocs;

    for(ScoreDoc doc : docs){
        result.add(doc.doc);
    }

    return result;
}

noOfDocs is the number of documents that were indexed. Of course, I used the same RAMDirectory when creating the IndexSearcher. Replacing the parsed query with a manually created TermQuery didn't help either: the query returns no results.

I hope someone can help me find my error. Thanks!

2 Answers

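I believe you are having trouble searching because you are using an IntField rather than, for instance, a StringField or TextField. IntField and the other numeric fields are designed for numeric range queries and are not indexed in their raw form: each value is written as several prefix-coded terms (how many depends on the field's precisionStep), and none of them is the plain text "1". That is why neither the parsed query getAll:1 nor a TermQuery on "1" matches anything. To search a numeric field, use a NumericRangeQuery.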
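A minimal sketch of the difference, assuming Lucene 4.x (Version.LUCENE_40) with the core and analyzers-common jars on the classpath; the class NumericGetAll and its helper methods are hypothetical. It indexes documents the way the question does, then counts hits for a TermQuery on the text "1" and for a NumericRangeQuery whose inclusive bounds are both 1.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NumericGetAll {

    // Builds an in-memory index with the same numeric fields as the question.
    public static Directory buildIndex(int n) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_40,
                new StandardAnalyzer(Version.LUCENE_40));
        IndexWriter writer = new IndexWriter(dir, cfg);
        for (int id = 0; id < n; id++) {
            Document doc = new Document();
            doc.add(new IntField("getAll", 1, IntField.TYPE_STORED));
            doc.add(new IntField("ID", id, IntField.TYPE_STORED));
            writer.addDocument(doc);
        }
        writer.close();
        return dir;
    }

    // Returns the total number of documents matching the query.
    public static int count(Directory dir, Query query) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            // totalHits counts every match, not just the top n requested.
            return new IndexSearcher(reader).search(query, 1).totalHits;
        } finally {
            reader.close();
        }
    }

    // What the question effectively runs: an exact lookup of the text "1".
    public static Query termQuery() {
        return new TermQuery(new Term("getAll", "1"));
    }

    // Numeric lookup: a range whose inclusive lower and upper bounds are both 1.
    public static Query numericQuery() {
        return NumericRangeQuery.newIntRange("getAll", 1, 1, true, true);
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildIndex(5);
        System.out.println("TermQuery hits: " + count(dir, termQuery()));
        System.out.println("NumericRangeQuery hits: " + count(dir, numericQuery()));
    }
}
```

The TermQuery should find nothing while the range query finds all five documents, because the terms IntField writes are prefix-coded numbers rather than the string "1".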

Really, though, IntField should in my view only be used for values you treat as numbers, not for a string of digits used as an identifier, which is what you appear to have. IDs should generally be keyword fields (StringField in Lucene 4) or text fields.
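A sketch of the keyword-field approach, assuming Lucene 4.x; KeywordIds, its helpers, and the "body" field name are hypothetical. It is a variant of the question's addDocument in which "getAll" and "ID" are StringFields, which index their whole value as one untokenized term, so an exact TermQuery on the text matches.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class KeywordIds {

    // Variant of the question's addDocument: "getAll" and "ID" become single,
    // untokenized terms instead of prefix-coded numeric terms.
    public static void addDocument(IndexWriter writer, int id, String[] values,
                                   String[] fields) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("getAll", "1", Field.Store.NO));
        doc.add(new StringField("ID", Integer.toString(id), Field.Store.YES));
        for (int i = 0; i < fields.length; i++) {
            doc.add(new TextField(fields[i], values[i], Field.Store.NO));
        }
        writer.addDocument(doc);
    }

    // Builds an in-memory index of n documents with one hypothetical text field.
    public static Directory buildIndex(int n) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
        for (int id = 0; id < n; id++) {
            addDocument(writer, id, new String[] {"some text " + id}, new String[] {"body"});
        }
        writer.close();
        return dir;
    }

    // Counts documents containing the exact term field:text.
    public static int countTerm(Directory dir, String field, String text) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            TermQuery query = new TermQuery(new Term(field, text));
            return new IndexSearcher(reader).search(query, 1).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildIndex(5);
        System.out.println("getAll:1 hits: " + countTerm(dir, "getAll", "1"));
        System.out.println("ID:3 hits: " + countTerm(dir, "ID", "3"));
    }
}
```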

As for pulling all records, you don't need an extra field at all. Simply use a MatchAllDocsQuery.
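A sketch of the question's getDocIds rewritten around MatchAllDocsQuery, assuming Lucene 4.x; the class MatchAllIds and its helpers are hypothetical, and the searcher is passed in rather than kept as a field. It also loads the stored "ID" field, because ScoreDoc.doc is Lucene's internal document number rather than the ID the application assigned. The demo index deliberately uses IDs starting at 100 so the two are easy to tell apart.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MatchAllIds {

    // Returns the stored "ID" of every document; no "getAll" field is needed.
    public static List<Integer> getDocIds(IndexSearcher searcher, int noOfDocs) throws IOException {
        List<Integer> result = new ArrayList<Integer>(noOfDocs);
        ScoreDoc[] docs = searcher.search(new MatchAllDocsQuery(), noOfDocs).scoreDocs;
        for (ScoreDoc sd : docs) {
            // sd.doc is an internal document number, not a stable identifier
            // (it can change as segments are merged), so read the stored field.
            Document d = searcher.doc(sd.doc);
            result.add(d.getField("ID").numericValue().intValue());
        }
        return result;
    }

    // Builds an index whose stored IDs are 100, 101, ... (hypothetical data).
    public static Directory buildIndex(int n) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
        for (int i = 0; i < n; i++) {
            Document doc = new Document();
            doc.add(new IntField("ID", 100 + i, IntField.TYPE_STORED));
            writer.addDocument(doc);
        }
        writer.close();
        return dir;
    }

    // Opens a reader on a fresh index and collects all stored IDs.
    public static List<Integer> idsFor(int n) throws IOException {
        DirectoryReader reader = DirectoryReader.open(buildIndex(n));
        try {
            return getDocIds(new IndexSearcher(reader), reader.maxDoc());
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(idsFor(3));
    }
}
```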

First, I think you should run Luke, the Lucene index inspection tool, to verify what is actually in the index.
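If Luke is not at hand, a rough programmatic check is to list the terms actually indexed for a field. This sketch is written against the Lucene 4.x API (MultiFields.getTerms and Terms.iterator(TermsEnum)); the class DumpTerms and its methods are hypothetical. It prints each term of an IntField "getAll" as hex bytes via BytesRef.toString(); the plain string "1" would appear as [31], and you should not see it among them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class DumpTerms {

    // Lists every indexed term of a field, rendered as hex bytes like "[31]".
    public static List<String> terms(IndexReader reader, String field) throws IOException {
        List<String> out = new ArrayList<String>();
        Terms terms = MultiFields.getTerms(reader, field);
        if (terms == null) {
            return out; // the field has no indexed terms
        }
        TermsEnum te = terms.iterator(null); // Lucene 4.x signature
        BytesRef term;
        while ((term = te.next()) != null) {
            out.add(term.toString());
        }
        return out;
    }

    // Indexes one document the way the question does and dumps "getAll".
    public static List<String> getAllTerms() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
        Document doc = new Document();
        doc.add(new IntField("getAll", 1, IntField.TYPE_STORED));
        writer.addDocument(doc);
        writer.close();
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return terms(reader, "getAll");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String t : getAllTerms()) {
            System.out.println(t);
        }
    }
}
```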

Also, if you allow * as the first character of a term with queryParser.setAllowLeadingWildcard(true);, a query like ID:* will retrieve every document that has an ID field, without needing the getAll field.
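A sketch of the leading-wildcard approach, assuming Lucene 4.x with the queryparser module (classic QueryParser) on the classpath; the class LeadingWildcard and its helpers are hypothetical. It shows that ID:* is rejected with a ParseException unless the flag is set, and that with the flag it matches every document carrying an ID, even when ID is an IntField.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LeadingWildcard {

    // Parses a query, optionally allowing * or ? as the first character of a term.
    public static Query parse(String text, boolean allowLeading) throws ParseException {
        QueryParser parser = new QueryParser(Version.LUCENE_40, "ID",
                new StandardAnalyzer(Version.LUCENE_40));
        parser.setAllowLeadingWildcard(allowLeading);
        return parser.parse(text);
    }

    // Indexes n documents with an IntField "ID" and counts hits for ID:*.
    public static int countIdStar(int n) throws IOException, ParseException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
        for (int id = 0; id < n; id++) {
            Document doc = new Document();
            doc.add(new IntField("ID", id, IntField.TYPE_STORED));
            writer.addDocument(doc);
        }
        writer.close();
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return new IndexSearcher(reader).search(parse("ID:*", true), 1).totalHits;
        } finally {
            reader.close();
        }
    }

    // True if the default parser (no leading wildcards) refuses ID:*.
    public static boolean rejectedWithoutFlag() {
        try {
            parse("ID:*", false);
            return false;
        } catch (ParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException, ParseException {
        System.out.println("rejected without flag: " + rejectedWithoutFlag());
        System.out.println("ID:* hits: " + countIdStar(4));
    }
}
```

A wildcard query like this enumerates every term in the field, so on a large index MatchAllDocsQuery is usually the cheaper way to fetch everything.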