4
votes

I have many documents with an analyzed text field, title. They are already indexed in Elasticsearch, and now I only need the term frequency (TF) and inverse document frequency (IDF) for each term in the title field, without running any query. In other words, I just want to index the documents and then retrieve the inverted index of all terms in the title field.

Is that possible in Elasticsearch?

4 Answers

3
votes

I wrote a tutorial on how to get a term-document matrix from ES. This does cover getting TFs but not IDFs. This was for ES 1.6.0 using Python.

For more, have a look at the Term Vectors API.

1
votes
GET /YOUR_INDEX/YOUR_DOC_TYPE/YOUR_ID/_termvectors
{
  "fields" : ["YOUR_FIELD"],
  "term_statistics" : true,
  "field_statistics" : true
}

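This returns the TF of every term in the given field of your document. With term_statistics enabled, each term also comes back with its document frequency (doc_freq), from which an IDF can be derived.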
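
As a rough sketch of how the response to a request like the one above could be turned into TF and IDF values, the Python below reads term_freq and doc_freq for each term and doc_count from the field statistics. The function name and the sample_response data are made up for illustration, and the formula log(N / df) is an assumption: it is one common IDF definition, not necessarily the one Elasticsearch uses for scoring. Also keep in mind that these statistics are gathered only from the shard holding the document, so the result is an approximation.

```python
import math


def tf_idf_from_termvectors(response, field):
    """Return {term: (tf, idf)} from a parsed _termvectors response body.

    Assumes the request set term_statistics and field_statistics to true,
    so that doc_freq and doc_count are present in the response.
    """
    field_data = response["term_vectors"][field]
    # Number of documents (on this shard) that contain the field.
    doc_count = field_data["field_statistics"]["doc_count"]

    result = {}
    for term, stats in field_data["terms"].items():
        tf = stats["term_freq"]                 # occurrences in this document
        df = stats["doc_freq"]                  # documents containing the term
        idf = math.log(doc_count / df)          # assumed IDF definition
        result[term] = (tf, idf)
    return result


# Hypothetical response body, shaped like the output of the Term Vectors API.
sample_response = {
    "term_vectors": {
        "title": {
            "field_statistics": {"sum_doc_freq": 450, "doc_count": 100, "sum_ttf": 520},
            "terms": {
                "elasticsearch": {"doc_freq": 10, "ttf": 14, "term_freq": 2},
                "index": {"doc_freq": 50, "ttf": 61, "term_freq": 1},
            },
        }
    }
}

scores = tf_idf_from_termvectors(sample_response, "title")
for term, (tf, idf) in sorted(scores.items()):
    print(term, tf, round(idf, 4))
```

To cover every term in the field rather than the terms of a single document, the same function would have to be applied to the response for each document ID and the results merged.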

1
votes

In case someone still has a similar problem to OP's, I've created a Python module called inelastic that prints out an approximation of an Elasticsearch inverted index for a given index and field.

0
votes

No. You could maybe find a way to hack it together, and on a per-query basis you can use the Explain API, e.g. https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-explain.html. But there is no API that returns this information directly.