I have a working ELK (Elasticsearch, Logstash, Kibana) stack.

My existing visualizations are built from CSV files that are loaded through Logstash and shown in Kibana.

I have also indexed PDF and DOC files in Elasticsearch; I can see the data in Kibana and search it. What I need now is to visualize the text of those PDF and DOC files in Kibana, for example the most common words with their counts.

Has anyone done this before, or does anyone have an idea?

Thanks in advance!

Have you tried the visualizations in Kibana? If you can search the documents in ES, I think you should be able to visualize them. - MrSimple
Hi, I need to visualize something like the count of a word I search for in a PDF document. Is that possible? - monty
I understood your question the first time. Please answer mine. Have you tried the basic Kibana visualizations? If not, what have you tried? What is your progress? Did you get stuck somewhere? - MrSimple
I have done basic visualizations and visualized the CSV file data in Kibana. The problem is the PDF files, which are indexed in Elasticsearch: I don't know exactly how to visualize them. For example, if I select a word cloud and ask it to show the most common words in the PDF file, I am unable to do so. - monty
In Kibana, go to the Visualization menu, pick the index that contains the PDF data, and use the Unique Count aggregation on the relevant field; you should get the word count. If you have trouble with this, post the Elasticsearch index structure and I can give a more exact answer. - MrSimple
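For anyone following the comment thread, here is a rough sketch of what the request bodies behind a "most common words" visualization could look like. It is not taken from the posters' setup. It assumes a terms aggregation on the analyzed text field (the kind of bucket aggregation Kibana's tag cloud uses) rather than Unique Count, which returns a single cardinality number. The field name `content` and the helper names are hypothetical, since the thread does not say which field holds the extracted text. A `text` field normally needs `fielddata` enabled before it can be aggregated on, and that can use a lot of heap. Also note that the buckets report how many documents contain each term, not the total number of occurrences.

```python
import json
from collections import Counter

# Hypothetical field name: the thread does not show which field holds the
# text extracted from the PDF/DOC files, so "content" is an assumption.
TEXT_FIELD = "content"


def fielddata_mapping(field):
    """Mapping-update body that enables fielddata on an analyzed text field."""
    return {"properties": {field: {"type": "text", "fielddata": True}}}


def top_words_query(field, size=20):
    """Search body for a terms aggregation returning the most frequent terms."""
    return {
        "size": 0,  # skip the hits; only the aggregation buckets are needed
        "aggs": {"top_words": {"terms": {"field": field, "size": size}}},
    }


def local_doc_counts(docs, size=20):
    """Rough local imitation: number of documents containing each word."""
    counts = Counter()
    for text in docs:
        # set() so a word is counted once per document, like doc_count
        counts.update(set(text.lower().split()))
    return counts.most_common(size)


if __name__ == "__main__":
    print(json.dumps(fielddata_mapping(TEXT_FIELD), indent=2))
    print(json.dumps(top_words_query(TEXT_FIELD), indent=2))
    docs = ["Kibana reads Elasticsearch", "Elasticsearch indexes PDF text"]
    print(local_doc_counts(docs, size=1))
```

The two printed JSON bodies would be sent to the index's mapping and search endpoints respectively; the last function only illustrates locally what the buckets mean. Aggregating on a `keyword` sub-field instead would bucket whole field values rather than individual words.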

1 Answer

@MrSimple, we are not able to see the exact format over there.

{
  "test": {
    "aliases": {},
    "mappings": {
      "attachment": {
        "properties": {
          "analyzer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "field_statistics": {
            "type": "boolean"
          },
          "fields": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "file": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          ...
}