I am comparing Lucene/Solr, Whoosh, Sphinx and Xapian for searching documents in DOC, DOCX, HTML and PDF formats. As far as I can tell from the documentation, only Solr has a document parser (Tika) that lets it index such documents directly, so it seems a clear winner.
But to level the playing field, I'd like to consider the alternatives. Do the others support direct document indexing (which I may have missed)? If not, can it be implemented easily? Or is Solr the overwhelming choice?