Lucene Jackrabbit

Question

We recently added Lucene (2.4.1) support to our application, which runs on Jackrabbit (1.6.2). We followed the Jackrabbit tutorial, and almost everything works. However, I noticed some strange behavior that I can't find documented anywhere, so I decided to ask about it here.

For example, I have the following text in the jcr:data property of a node (jcr:content):

The quick brown fox jumps over the lazy dog 
!@#$%^& 
travmik! 
tra!vmik

My XPath query is the following:

String query = "root/element(*,my:documentBody)"
        + "[jcr:contains(*/*/element(*),'*" + param + "*')]";

Then I try to search:

"q", "qu", "qui", "quic", "quick", "k", "ck", "ick", "uick", "quick brown fox", "quick fox", "tra", "travmik", "mik" - all found ok

"tra!vmik", "travmik!", "!@#$" - nothing

And yes, I escaped all the special characters in these search terms.

What did I do wrong?

P.S. I have one more question: the Lucene docs say that "You cannot use a * or ? symbol as the first character of a search", but I do use one and it works. Why?

travmik travmik · Accepted Answer · 2010-12-12T17:45:01

I found the problem. It was a misunderstanding on my part about the text extractors that Jackrabbit uses when indexing content. I won't go into details, but this piece of code from one of the extractors is the cause of all my problems:

if (!Character.isLetterOrDigit(c)) {
    if (!space) {
        space = true;
        buffer.append(' ');
        continue;
    }
    continue;
}

If anyone is interested, I can explain in greater detail.
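
To illustrate the effect, here is a minimal sketch that applies the same rule to a whole string. The class name ExtractorSketch and the method normalize are hypothetical, and the lines that reset the flag and append letters and digits are my assumption about the surrounding loop, since the snippet above shows only the branch for non-alphanumeric characters. Under that assumption, every run of punctuation collapses into a single space before the text reaches the index, which would explain why "tra!vmik", "travmik!" and "!@#$" find nothing.

```java
public class ExtractorSketch {

    // Hypothetical helper: runs the quoted extractor rule over an entire string.
    static String normalize(String text) {
        StringBuilder buffer = new StringBuilder();
        boolean space = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Same logic as the quoted snippet: a run of non-alphanumeric
                // characters is replaced by one space.
                if (!space) {
                    space = true;
                    buffer.append(' ');
                }
                continue;
            }
            // Assumed continuation (not shown in the quoted snippet):
            // keep letters and digits unchanged and end the current run.
            space = false;
            buffer.append(c);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"travmik!", "tra!vmik", "!@#$%^&"};
        for (String s : samples) {
            // Brackets make leading and trailing spaces visible.
            System.out.println("[" + s + "] -> [" + normalize(s) + "]");
        }
    }
}
```

With this rule the text that gets indexed contains "tra vmik" rather than "tra!vmik", so a term that still includes the "!" has nothing to match, no matter how carefully it is escaped in the query.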
