0
votes

I'm building a website for learning purposes and I'm looking at Lucene.NET as a full-text indexer for my content, but I have some questions.

Let's say I have a hierarchy (n levels) of categories, and articles that are each assigned to one category (1 category -> n articles). With a simple relational database it would be very easy to search for an article under a category or any of its subcategories, but I'm struggling to imagine how I'd build this kind of query using Lucene. Options I think might work:

Supposing that I'm indexing "title, text, category" for every article, one option would be to first get a list of the IDs of every subcategory from the DB and then search in Lucene with that list.
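To make this first option concrete, here is a minimal sketch of expanding a category into its subcategory IDs and turning them into one query clause. The `PARENT` table, the `category` field name, and both helper functions are hypothetical stand-ins for whatever the database and index actually use; the clause follows the `field:(a OR b)` form of Lucene's classic query syntax.

```python
from collections import defaultdict

# Hypothetical category table: child id -> parent id (None marks a root).
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}

def descendants(root_id, parent_of):
    """Return root_id plus the ids of all of its subcategories, sorted."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    found, stack = [], [root_id]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children[current])
    return sorted(found)

def category_clause(ids, field="category"):
    """Build a query clause such as category:(2 OR 4 OR 5)."""
    return f"{field}:({' OR '.join(str(i) for i in ids)})"

print(category_clause(descendants(2, PARENT)))
```

The cost of this approach is a database round trip before every search, and the clause grows with the size of the subtree, so a very large category tree may run into Lucene's limit on boolean clauses.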

Another option would be to index the entire category "path" of the article in a Lucene field. Something like "title", "text", "catparent1, catparent2, catparent3, category"?
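As a rough illustration of this second option, the path value could be built at indexing time by walking up the category tree. The `CATEGORIES` table and the `category_path` helper below are hypothetical, and the slash-separated format is just one possible choice.

```python
# Hypothetical category table: id -> (name, parent id); None marks a root.
CATEGORIES = {
    1: ("programming", None),
    2: ("beginner", 1),
    3: ("samples", 2),
}

def category_path(cat_id, categories):
    """Walk from a category up to the root and join the names into a path."""
    parts = []
    while cat_id is not None:
        name, cat_id = categories[cat_id]
        parts.append(name)
    return "/" + "/".join(reversed(parts))

print(category_path(3, CATEGORIES))
```

One trade-off to keep in mind: if a category is renamed or moved, every article below it has to be re-indexed so that its stored path stays correct.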

What's the best approach for this kind of query with complex relational filters (not just text search)?

1 Answer

4
votes

Add the category path as an indexed field, and use a phrase search to search it:

ID        Title              Categories
"MyDoc1", "Hello world!",    "/programming/beginner/samples"
"MyDoc2", "Prove that P=NP", "/programming/advanced/samples"

Now you can query the categories either hierarchically using a phrase search:

"/programming/beginner"

or non-hierarchically using a word search:

"samples"

I use this method for indexing files with their pathnames - you can query for "dirname" or "parent/child" or "/root/parent/child" and it all works nicely.

You can control whether your search starts at the root by including or excluding the leading slash.
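The matching behaviour described here can be modelled without Lucene at all. The sketch below is only a toy model: it assumes the path is split into one token per segment and that a leading slash is represented by a hypothetical `<root>` marker token. Whether a real index behaves this way depends on the analyzer configured for the field.

```python
ROOT = "<root>"  # hypothetical marker token standing in for the leading slash

def tokenize(path):
    """Split a path into segment tokens, with a root marker if it starts with '/'."""
    tokens = [part for part in path.split("/") if part]
    return ([ROOT] if path.startswith("/") else []) + tokens

def phrase_match(doc_path, query):
    """True if the query tokens occur as a contiguous run in the document tokens."""
    doc, wanted = tokenize(doc_path), tokenize(query)
    return any(
        doc[i:i + len(wanted)] == wanted
        for i in range(len(doc) - len(wanted) + 1)
    )

path = "/programming/beginner/samples"
print(phrase_match(path, "/programming/beginner"))  # anchored at the root
print(phrase_match(path, "beginner/samples"))       # matches anywhere in the path
print(phrase_match(path, "/beginner"))              # anchored, but not at the root
```

In this model the first two queries match and the third does not, because `beginner` is not the first segment after the root marker.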

In terms of "complex relational filters", you can then combine these category searches with other searches and filters using boolean queries.
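As a sketch of what such a combination means, the example below requires every predicate to hold for a document, which is roughly what a boolean query made only of required (MUST) clauses does. The in-memory `DOCS` list and the helper functions are hypothetical and stand in for the real index and query objects.

```python
# Hypothetical in-memory stand-in for the index, using the two sample documents.
DOCS = [
    {"id": "MyDoc1", "title": "Hello world!",
     "categories": "/programming/beginner/samples"},
    {"id": "MyDoc2", "title": "Prove that P=NP",
     "categories": "/programming/advanced/samples"},
]

def in_category(doc, prefix):
    """True if the document's category path starts with the given path."""
    segments = [s for s in doc["categories"].split("/") if s]
    wanted = [s for s in prefix.split("/") if s]
    return segments[:len(wanted)] == wanted

def title_has(doc, word):
    """True if the word appears as a whitespace-separated token in the title."""
    return word.lower() in doc["title"].lower().split()

def search(docs, *must):
    """Return the ids of documents that satisfy every required predicate."""
    return [d["id"] for d in docs if all(pred(d) for pred in must)]

print(search(
    DOCS,
    lambda d: in_category(d, "/programming"),
    lambda d: title_has(d, "prove"),
))
```

Both documents sit under `/programming`, but only `MyDoc2` has "prove" in its title, so it is the only id returned.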