0
votes

I am trying to stream Twitter data to Elasticsearch. I have no problems if I do not create an index before streaming, but that way I can't filter by date or create timelines. I tried to use this mapping:

https://gist.github.com/christinabo/ca99793a5d160fe12fd9a31827e74444

which allegedly lets ES pick up "date" correctly, but I receive this error when creating the index:

"type": "illegal_argument_exception", "reason": "unknown setting [index.twitter.mappings._doc.properties.coordinates.properties.coordinates.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"

What's wrong?

Thanks

2
How are you sending the request? cURL, Postman, Kibana, other? - toomaas
What does your request look like? It seems to be malformed, since it is confusing your mapping with index settings. - leandrojmp

2 Answers

0
votes

I am running this Python script. Before that, I tried to set the mapping from Dev Tools with PUT twitter_stream. Sorry for the terrible indentation!

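    from textwrap import TextWrapper

    import tweepy
    from elasticsearch import Elasticsearch

    # Connect to the local cluster (certificate verification disabled).
    es = Elasticsearch("https://admin:admin@localhost:9200",
                       verify_certs=False)
    # Create the index; ignore the 400 returned if it already exists.
    es.indices.create(index='twitter_stream', ignore=400)

    class StreamApi(tweepy.StreamListener):
        status_wrapper = TextWrapper(width=60, initial_indent='    ',
                                     subsequent_indent='    ')

        def on_status(self, status):
            # Index the raw tweet JSON as one document.
            json_data = status._json
            es.index(index="twitter_stream",
                     doc_type="twitter",
                     body=json_data,
                     ignore=400)

    # `auth` is assumed to be a tweepy auth handler created elsewhere (not shown).
    streamer = tweepy.Stream(auth=auth, listener=StreamApi(), timeout=30)

    terms = ['#assange', 'assange']

    # Positional arguments are (follow, track): follow nobody, track the terms.
    streamer.filter(None, terms)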
0
votes

After reading the comment about the date field, it appears that the date format of the tweets is not one of the default formats Elasticsearch supports.

For Elasticsearch to treat that field as a date, you should specify a custom mapping for the twitter_stream index that declares the date format you expect in the tweets' date field. The syntax for custom date formats is described in the Elasticsearch documentation on date formats.

So, if you are using Elasticsearch 7.X, you can specify the custom format this way:

    PUT twitter_stream
    {
      "mappings": {
        "properties": {
          "YOUR_TWEETS_DATE_FIELD": {
            "type":   "date",
            "format": "EEE LLL dd HH:mm:ss Z yyyy"
          }
        }
      }
    }

You can copy the above configuration into the Kibana Dev Tools console and execute it. Then try running the Python script again. To explain the letters used in the format:

    Symbol  Meaning                     Presentation      Examples
    E       day-of-week                 text              Tue; Tuesday; T
    M/L     month-of-year               number/text       7; 07; Jul; July; J
    d       day-of-month                number            10
    H       hour-of-day (0-23)          number            0
    m       minute-of-hour              number            30
    s       second-of-minute            number            55
    Z       zone-offset                 offset-Z          +0000; -0800; -08:00;
    y       year-of-era                 year              2004; 04
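
To see how those letters line up with an actual timestamp, here is a minimal Python sketch that parses a date string with the `strptime` counterpart of the pattern. The sample value and the names `SAMPLE`, `PY_FORMAT` and `parse_tweet_date` are illustrative assumptions, not taken from the question; the sample follows the layout Twitter commonly uses for its `created_at` field.

```python
from datetime import datetime, timezone

# Hypothetical sample in the layout the pattern describes (assumption).
SAMPLE = "Wed Oct 10 20:19:24 +0000 2018"

# Python strptime counterpart of "EEE LLL dd HH:mm:ss Z yyyy":
#   EEE -> %a, LLL -> %b, dd -> %d, HH:mm:ss -> %H:%M:%S, Z -> %z, yyyy -> %Y
PY_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweet_date(value: str) -> datetime:
    """Parse a tweet-style date string into a timezone-aware datetime."""
    return datetime.strptime(value, PY_FORMAT)


if __name__ == "__main__":
    parsed = parse_tweet_date(SAMPLE)
    # Normalise to UTC before printing.
    print(parsed.astimezone(timezone.utc).isoformat())
    # prints 2018-10-10T20:19:24+00:00
```

If a string from your stream fails to parse with this format, the Elasticsearch pattern would likely need adjusting in the same place.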

Also, there is no need to define anything else in the mapping. Elasticsearch will infer the remaining fields and their types through dynamic mapping.
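
If you would rather create the index from the Python script than from Dev Tools, something along these lines may work. It is only a sketch: it assumes the 7.x `elasticsearch` Python client (where `indices.create` accepts `index` and `body`), and it assumes the tweets' date field is `created_at`; `build_mapping` and `create_index` are hypothetical helper names.

```python
from typing import Any, Dict

# Assumption: the date field in the tweet JSON is called "created_at".
DATE_FIELD = "created_at"
INDEX_NAME = "twitter_stream"


def build_mapping(date_field: str = DATE_FIELD) -> Dict[str, Any]:
    """Return the same request body as the PUT twitter_stream call above."""
    return {
        "mappings": {
            "properties": {
                date_field: {
                    "type": "date",
                    "format": "EEE LLL dd HH:mm:ss Z yyyy",
                }
            }
        }
    }


def create_index(es: Any, index: str = INDEX_NAME) -> Any:
    """Create the index with the custom date mapping.

    `es` is expected to be an elasticsearch.Elasticsearch client (7.x).
    """
    return es.indices.create(index=index, body=build_mapping())
```

You would call `create_index(es)` once before starting the stream, in place of the bare `es.indices.create(index='twitter_stream', ignore=400)`. Note that the type of an already mapped field cannot be changed, so if the index already exists with a dynamically mapped date field, it probably has to be deleted and recreated first.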