
I'm using Lucene to index documents that consist of fragments. The document as a whole has fields describing it (i.e., author, title, publish date). Each fragment contains text and tags (keywords). I would like to be able to:

  1. search for all fragments by a given author that have the tag Foo.
  2. search for all documents by title.
  3. search for all documents that contain certain words (in any fragment).

I read about BlockJoinQuery in Lucene, but I am not sure whether it suits my problem. For instance, given the following document:

document: title="Hello World" author="Sam Brown"
fragment 1: tags="sunny" text="...."
fragment 2: tags="cloudy" text="moody and sleepy"

would I be able to find this document with the query tags:sunny AND text:sleepy? Such a query will not match any single child document (fragment), but perhaps it would match the parent. The Lucene documentation does not say.


1 Answer


Case 1 should work well with BlockJoinQuery.

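Case 2 works well without BlockJoinQuery, since the title is a field on the parent document itself and an ordinary query on that field is enough.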

Case 3 can be made to work, though it's a little tricky because you have to AND at the parent-document level. That is, build a BooleanQuery with two MUST clauses: the first clause is BlockJoinQuery(TermQuery(Term("tags", "sunny"))) and the second clause is BlockJoinQuery(TermQuery(Term("text", "sleepy"))). That ought to work, I think. You just cannot do the ANDing at the sub-document (fragment) level, since no single fragment contains both terms.
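
To make the difference between the two ways of ANDing concrete, here is a small plain-Java model of the matching logic. It deliberately does not use the Lucene API: the class `BlockJoinModel` and its helpers `term`, `join`, and `sampleFragments` are hypothetical names invented for this illustration, the fragments are modelled as simple maps of field name to tokens, and the text of fragment 1 (elided in the question) is modelled as empty. It only assumes that a block join matches a parent when at least one of its children matches the child query.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of block-join matching; NOT the Lucene API.
class BlockJoinModel {

    // Child-level term query: does this fragment contain `value` in `field`?
    static Predicate<Map<String, List<String>>> term(String field, String value) {
        return fragment -> fragment.getOrDefault(field, List.of()).contains(value);
    }

    // Join (assumed semantics): the parent matches when at least one
    // of its fragments matches the child query.
    static Predicate<List<Map<String, List<String>>>> join(
            Predicate<Map<String, List<String>>> childQuery) {
        return fragments -> fragments.stream().anyMatch(childQuery);
    }

    // The two fragments of the "Hello World" document from the question.
    // Fragment 1's text is elided in the question, so it is left empty here.
    static List<Map<String, List<String>>> sampleFragments() {
        Map<String, List<String>> fragment1 =
                Map.of("tags", List.of("sunny"), "text", List.of());
        Map<String, List<String>> fragment2 =
                Map.of("tags", List.of("cloudy"), "text", List.of("moody", "and", "sleepy"));
        return List.of(fragment1, fragment2);
    }

    // AND inside the child query: one fragment must contain both terms.
    static boolean andAtFragmentLevel() {
        return join(term("tags", "sunny").and(term("text", "sleepy")))
                .test(sampleFragments());
    }

    // AND across two joins: each term may come from a different fragment.
    static boolean andAtParentLevel() {
        return join(term("tags", "sunny"))
                .and(join(term("text", "sleepy")))
                .test(sampleFragments());
    }

    public static void main(String[] args) {
        System.out.println("AND at fragment level: " + andAtFragmentLevel()); // false
        System.out.println("AND at parent level: " + andAtParentLevel());     // true
    }
}
```

The second form corresponds to the BooleanQuery with two MUST clauses described above, each clause wrapping its own join. Note that in later Lucene releases the class is called ToParentBlockJoinQuery rather than BlockJoinQuery, so check the join module of the version you are using.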