I have a Lucene index with a lot of documents.

For now, I display a list of all the document paths with this code:

public List<string> GetAllPath(string indexFolder)
{
    FSDirectory directory = FSDirectory.Open(indexFolder);
    List<string> pathlist = new List<string>();

    IndexReader reader = IndexReader.Open(directory, true); // true = read-only

    // Document ids run from 0 to MaxDoc - 1; NumDocs() excludes deleted
    // documents, so looping up to it can miss live documents.
    for (int i = 0; i < reader.MaxDoc; i++)
    {
        if (reader.IsDeleted(i))
            continue;

        Document doc = reader.Document(i);

        // Get returns the stored string value of the field.
        pathlist.Add(doc.Get("path"));
    }

    reader.Dispose();
    return pathlist;
}

But now I need to list the terms of a document from that list. These terms are in the "text" field. I tried to adapt the code above to build this list, but it does not seem possible that way.

My fields are defined like this:

        doc.Add(new Field("date", DateTime.Now.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("path", path, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("title", System.Web.HttpUtility.HtmlDecode(title), Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("text", ParseHtml(text, false), Field.Store.YES, Field.Index.ANALYZED));

How can I list all terms of one document?

1 Answer

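I added Field.TermVector.YES to my field definition, like this (term vectors are written at index time, so documents indexed before this change have to be re-indexed):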

doc.Add(new Field("text", ParseHtml(text, true), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));

With this new option I can use this code:

doc.LuceneTerms = new List<LuceneTerm>();
var termFreq = reader.GetTermFreqVector(docId, "text");

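List<string> terms = new List<string>();

// GetTermFreqVector returns null if no term vector was stored for the field.
if (termFreq != null)
{
    // GetTerms() returns the distinct terms of the field for this document.
    terms.AddRange(termFreq.GetTerms());
}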

and I get the list of terms of my document.