I have a Lucene index and the document text is 'indexed' but not 'stored'.

I am using Luke v7.6.0 and it's great for 'visualising' the index.

Because my document text is indexed but not stored, I obviously cannot copy or query the 'stored' value (there isn't one). Can I somehow extract the indexed text values to the clipboard or a text file, so that I can analyse exactly what was indexed from my file?

1 Answer

One option available to you is to inspect the Lucene index files manually.

I suspect that the most useful ones are the Term Dictionary files (*.tim).

I indexed a document with no stored values, containing the term [email protected] in the field email (a TextField with the StandardAnalyzer) and John in the field name (a StringField).

After that, I opened the .tim file with a hex editor and was able to see something like this:

[Image: hex editor view of the .tim file]

You can clearly see the values test, test and com, which were produced by the StandardAnalyzer's tokenization. You can also see that John is unchanged, since I used a StringField, which is not analyzed. In my other examples, I was able to see the effect of lowercasing as well.
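If you would rather get these values into a text file than read them off a hex view, the same inspection can be roughly automated. Below is a minimal sketch that pulls runs of printable ASCII bytes out of a file, much like the Unix `strings` tool. The class name `TimStrings` and the method `extract` are hypothetical helpers of mine and are not part of Lucene or Luke. It does not parse the .tim format: header text such as codec names will show up in the output, terms may appear only as fragments because the term dictionary stores them in a compressed form, and non-ASCII terms are skipped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene or Luke): a crude "strings"-style scan.
final class TimStrings {

    // Returns every run of at least minLen printable ASCII bytes found in data.
    static List<String> extract(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        int start = -1; // index where the current printable run began, or -1 if none
        for (int i = 0; i <= data.length; i++) {
            // Bytes are signed in Java, so values >= 0x80 are negative and fail this check.
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] < 0x7f;
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                // A run just ended (or the input ended); keep it if it is long enough.
                if (i - start >= minLen) {
                    runs.add(new String(data, start, i - start, StandardCharsets.US_ASCII));
                }
                start = -1;
            }
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TimStrings.java <path-to-.tim-file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extract(data, 3)) {
            System.out.println(run);
        }
    }
}
```

Run it against the .tim file in your index directory and redirect the output to a text file. Treat the result as a debugging aid for eyeballing what the analyzer produced, not as an exact list of indexed terms.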

Just a reminder, if you would like to repeat this: by default, Lucene puts everything into a compound file for small indices, which I prefer to avoid for this kind of debugging. You can disable this by calling setUseCompoundFile(false) on the IndexWriterConfig.