
When I search for special characters such as "#" no results come up.

Note that I have escaped the query string.

However, when combined with a letter like "c#" Lucene finds the term.

Is there any way to search for single special characters?

Here's my snippet:

Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(1);

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new \Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive());            

$index = Zend_Search_Lucene::create('/tmp/index');       
$doc = new Zend_Search_Lucene_Document;
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'Some Title Here', 'UTF-8'))
    ->addField(Zend_Search_Lucene_Field::Text('content-01', '+ @ #', 'UTF-8'))
    ->addField(Zend_Search_Lucene_Field::Text('content-02', 'C+ C#', 'UTF-8'));        

$index->addDocument($doc);
$index->commit();

/* returns 0 results */
$r = $index->find("/#");
echo count($r) . "\n";

/* returns 1 result */
$r = $index->find('C#');
echo count($r) . "\n";

/* returns 1 result */
$r = $index->find('C+');
echo count($r) . "\n";

$index->find('C'); returns results too. - Nandakumar V
Quite right. I hope somebody can provide a solution or at least an explanation. - EngineerCoders
@NandakumarV and Engineer: after an hour of work I've got a solution of sorts, check my answer. - Karol

1 Answer


According to this page, the list of special characters is as follows:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

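So you shouldn't have to escape #. (Note also that the escape character in the Lucene query syntax is a backslash, not the forward slash used in your snippet.) But even without the escaping slash you still get 0 results. Changing the field type from Text to Keyword doesn't fix this behaviour either.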
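To make the escaping rule concrete, here is a minimal sketch of a helper that puts a backslash (the escape character in the Lucene query syntax) in front of each character from the list above. The function name `escapeLuceneSpecialChars` is hypothetical and not part of Zend_Search_Lucene, and `&` and `|` are escaped one character at a time, which is an assumption on my side:

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene: prefix every
// character from the special-character list with a backslash.
function escapeLuceneSpecialChars(string $text): string
{
    // Character class built from the list: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
    return preg_replace('/[+\-&|!(){}\[\]^"~*?:\\\\]/', '\\\\$0', $text);
}

var_dump(escapeLuceneSpecialChars('#'));   // "#" is not in the list, so it is left alone
var_dump(escapeLuceneSpecialChars('C+'));  // the "+" gets a backslash in front of it
```

Since # passes through unchanged, escaping cannot be the reason the search for it fails.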

So I started investigating and ran this piece of code:

echo('<pre>');
var_dump(Zend_Search_Lucene_Search_QueryParser::parse("#"));
echo('</pre>');
die();

It returned a Zend_Search_Lucene_Search_Query_Boolean object with one subquery of type Zend_Search_Lucene_Search_Query_Preprocessing_Term. And what is funny, according to the documentation:

It's an internal abstract class intended to finalize ase a query processing after query parsing.

This type of query is not actually involved into query execution.

So the only thought I had was: DO NOT USE DEFAULT PARSER ANYMORE!

So I thought that the solution to your problem would be simple: create the query manually using the query construction API:

$term  = new Zend_Search_Lucene_Index_Term("#");
$query = new Zend_Search_Lucene_Search_Query_Term($term);

/* still returns 0 results!! */
$r = $index->find($query);
echo('<pre>');
var_dump(count($r));
echo('</pre>');

But it still does NOT work!

The only way I got it working (with the query parser as well) was by adding this line:

->addField(Zend_Search_Lucene_Field::Keyword('content-03', '#'))

So my conclusion: special characters can only be searched in Keyword fields, because these fields are not tokenized. But a Keyword value is treated as one whole term (even when it contains several words), and this is a huge limitation.
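A likely explanation for all of the results above is tokenization. As far as I can tell, the Utf8Num analyzer keeps only runs of letters and digits, so a lone # never reaches the index as a term. The sketch below is a rough stand-in for that behaviour, not the library code: the function name `sketchTokenize` and the exact pattern are my assumptions, and `strtolower` only approximates the case-insensitive filter for ASCII input.

```php
<?php
// Rough stand-in for the analyzer's tokenizer (an assumption, not library code):
// keep only runs of Unicode letters and digits, then lower-case them.
function sketchTokenize(string $text): array
{
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

var_dump(sketchTokenize('+ @ #'));  // no tokens at all
var_dump(sketchTokenize('C+ C#'));  // two tokens, both "c"
var_dump(sketchTokenize('#'));      // no tokens, so nothing to search for
```

If that assumption holds, 'C#' and 'C+' match only because both are reduced to the term c (which is also why find('C') returns a result), and a hand-built Term query for # finds nothing because no such term was ever written to a tokenized field. A Keyword field skips this step, so its # is stored as is.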