Code
import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers
def geojson_to_es(gj):
for feature in gj['features']:
date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
yield feature
with open("GeoObs.json") as f:
gj = geojson.load(f)
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])
k = ({
"_index": "YOUR_INDEX",
"_source": feature,
} for feature in geojson_to_es(gj))
helpers.bulk(es, k)
Explanation
with open("GeoObs.json") as f:
gj = geojson.load(f)
es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])
This portion of the code loads an external geojson file, then connects to Elasticsearch.
k = ({
"_index": "conflict-data",
"_source": feature,
} for feature in geojson_to_es(gj))
helpers.bulk(es, k)
The ()
s here creates a generator which we will feed to helpers.bulk(es, k)
. Remember _source
is the original data as is in Elasticsearch speak - IE: our raw JSON. _index
is just the index in which we want to put our data. You'll see other examples with _doc
here. This is part of the mapping types and no longer exists in Elasticsearch 7.X+.
def geojson_to_es(gj):
for feature in gj['features']:
date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
yield feature
The function geojson
uses a generator to produce events. A generator function will, instead of returning and finishingresume at the keyword
yield` after each call. In this case, we are generating our GeoJSON features. In my code you also see:
date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
This is just an example of manipulating the data in the JSON before sending it out to Elasticsearch.
The key is in your mapping file you must have something tagged as geo_point
or geo_shape
. These data types are how Elasticsearch recognizes geo data. Example from my mapping file:
...
{
"properties": {
"geometry": {
"properties": {
"coordinates": {
"type": "geo_point"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
...
That is to say, before uploading your GeoJSON data with Python, you need to create your index, and then apply a mapping file which includes either geo_shape
or geo_point
using something like:
curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT localhost:9200/YOUR_INDEX/_mapping?pretty -H "Content-Type: application/json" -d @mapping.json