I am trying to use Mahout to do some analysis on the term vectors stored in my Solr/Lucene index. Unfortunately, it seems that the latest Mahout release is behind the latest Solr/Lucene release.
My Solr/Lucene installation is 4.10.3. As far as I can tell, the latest Mahout release (1.0) expects Lucene indexes at version 4.6.1.
When I run `mahout lucene.vector`, I get the error:
Exception in thread "main" org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource: MMapIndexInput(path="/path/to/data/index/segments.gen")): -3 (needs to be between -2 and -2)
I have tried two things so far to tackle this problem:
First, I edited my solrconfig.xml file to say:
<luceneMatchVersion>4.6.1</luceneMatchVersion>
I then deleted my indexed data and built a clean index from the original documents. This did nothing to fix the error.
Second, I tried changing `lucene.version` in the Mahout pom.xml file to 4.10.3 and recompiling the binary, to see whether support for the newer index format had been added yet. I knew this was unlikely to work, but tried anyway.
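For clarity, the pom.xml edit amounts to something like the fragment below. This is only a sketch: it assumes `lucene.version` is defined as a Maven property inside a `<properties>` block, and all other properties are omitted.

```xml
<properties>
  <!-- Assumed layout: lucene.version defined as a Maven property. -->
  <!-- Changed from the 4.6.1 that Mahout appears to expect to match my Solr install. -->
  <lucene.version>4.10.3</lucene.version>
</properties>
```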
My question is: how do I change the Lucene version that Solr uses when writing index files, if the luceneMatchVersion setting in solrconfig.xml shown above is not the way to do it?