4
votes

I am trying to store spatial data (GeoJSON files, CSV files, and shapefiles) in Elasticsearch using Python. I am new to Elasticsearch, and even after following the documentation I have not been able to index the data successfully. Any help would be appreciated.

Sample GeoJSON file:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "ID_0": 105,
        "ISO": "IND",
        "NAME_0": "India",
        "ID_1": 1288,
        "NAME_1": "Telangana",
        "ID_2": 15715,
        "NAME_2": "Telangana",
        "VARNAME_2": null,
        "NL_NAME_2": null,
        "HASC_2": "IN.TS.AD",
        "CC_2": null,
        "TYPE_2": "State",
        "ENGTYPE_2": "State",
        "VALIDFR_2": "Unknown",
        "VALIDTO_2": "Present",
        "REMARKS_2": null,
        "Shape_Leng": 8.103535,
        "Shape_Area": 127258717496
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              79.14429367552918,
              19.500257885106404
            ],
            [
              79.14582245808431,
              19.498859172536427
            ],
            [
              79.14600496956801,
              19.498823981691853
            ],
            [
              79.14966523737327,
              19.495821705263914
            ]
          ]
        ]
      }
    }
  ]
}
2
Can you show your GeoJSON file (or part of it)? - Val
{"type":"FeatureCollection", "features": [{"type":"Feature", "properties":{"ID_0":105,"ISO":"IND","NAME_0":"India","ID_1":1288,"NAME_1":"Telangana","ID_2":15715,"NAME_2":"Telangana","VARNAME_2":null,"NL_NAME_2":null,"HASC_2":"IN.TS.AD","CC_2":null,"TYPE_2":"State","ENGTYPE_2":"State","VALIDFR_2":"Unknown","VALIDTO_2":"Present","REMARKS_2":null,"Shape_Leng":8.103535,"Shape_Area":127258717496},"geometry":{"type":"Polygon","coordinates":[[[79.14429367552918,19.500257885106404],[79.14582245808431,19.498859172536427],[79.14600496956801,19.498823981691853], - intern
Please update your question with it. - Val
It's funny, I just read the documentation on elasticsearch.co, and indeed the chapter "How to index geojson file..." only shows a GeoJSON document, not how to index it. - Daniel W.
@intern did you ever figure this out? - KJP

2 Answers

2
votes

Code

import geojson
from datetime import datetime
from elasticsearch import Elasticsearch, helpers


def geojson_to_es(gj):

    for feature in gj['features']:

        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature


with open("GeoObs.json") as f:
    gj = geojson.load(f)

    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

    k = ({
        "_index": "YOUR_INDEX",
        "_source": feature,
    } for feature in geojson_to_es(gj))

    helpers.bulk(es, k)

Explanation

with open("GeoObs.json") as f:
    gj = geojson.load(f)

    es = Elasticsearch(hosts=[{'host': 'localhost', 'port': 9200}])

This portion of the code loads an external geojson file, then connects to Elasticsearch.

    k = ({
        "_index": "YOUR_INDEX",
        "_source": feature,
    } for feature in geojson_to_es(gj))

    helpers.bulk(es, k)

The parentheses here create a generator expression, which we feed to helpers.bulk(es, k). In Elasticsearch terms, _source is the original document as is, i.e. our raw JSON, and _index is simply the index in which we want to put our data. You'll see other examples that also set _type, usually to _doc. That comes from mapping types, which were deprecated in Elasticsearch 7.x and removed in 8.0.
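To make the shape of those bulk actions concrete, here is a minimal sketch that runs without a cluster. The helper name bulk_actions, the sample feature, and the index name your_index are made up for illustration; each yielded dict has the same two keys that the generator above passes to helpers.bulk.

```python
def bulk_actions(features, index_name):
    """Lazily wrap each feature in an action dict of the kind helpers.bulk consumes."""
    for feature in features:
        yield {"_index": index_name, "_source": feature}


# Hypothetical sample data, only for illustration.
sample_features = [
    {
        "type": "Feature",
        "properties": {"NAME_1": "Telangana"},
        "geometry": {"type": "Point", "coordinates": [79.144, 19.500]},
    },
]

# Materialize the generator here only so the actions can be inspected;
# helpers.bulk can take the generator directly.
actions = list(bulk_actions(sample_features, "your_index"))
print(actions[0]["_index"])           # your_index
print(actions[0]["_source"]["type"])  # Feature
```

Nothing is sent anywhere in this sketch; it only shows what each action looks like before it reaches the bulk helper.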

def geojson_to_es(gj):

    for feature in gj['features']:

        date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
        feature["properties"]["timestamp"] = int(date.timestamp())
        feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')
        yield feature

The function geojson_to_es is a generator that produces the features one at a time. Instead of returning and finishing, a generator function pauses at the yield keyword and resumes from there on the next call. In this case, we are generating our GeoJSON features. In my code you also see:

date = datetime.strptime("-".join(feature["properties"]["event_date"].split('-')[0:2]) + "-" + feature["properties"]["year"], "%d-%b-%Y")
feature["properties"]["timestamp"] = int(date.timestamp())
feature["properties"]["event_date"] = date.strftime('%Y-%m-%d')

This is just an example of manipulating the data in the JSON before sending it out to Elasticsearch.
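As a worked example of that date handling, the sketch below applies the same three steps to a single properties dict. It assumes, purely for illustration, that event_date looks like "12-Jan-19" and that year is a string such as "2019" (the concatenation above requires a string); the function name normalize_event_date is hypothetical.

```python
from datetime import datetime


def normalize_event_date(properties):
    """Return a copy of properties with an ISO event_date and an epoch timestamp."""
    # Keep day and month from event_date, then append the four-digit year.
    day_month = "-".join(properties["event_date"].split("-")[0:2])
    date = datetime.strptime(day_month + "-" + properties["year"], "%d-%b-%Y")

    updated = dict(properties)
    # The timestamp is computed in the local timezone, as in the code above.
    updated["timestamp"] = int(date.timestamp())
    updated["event_date"] = date.strftime("%Y-%m-%d")
    return updated


# Assumed input format, not taken from a real dataset.
example = normalize_event_date({"event_date": "12-Jan-19", "year": "2019"})
print(example["event_date"])  # 2019-01-12
```

Note that %b matches abbreviated month names in the current locale, so this sketch assumes English month abbreviations.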

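The key is that your mapping must declare a field as geo_point or geo_shape. These data types are how Elasticsearch recognizes geo data. Note that a geo_point on geometry.coordinates, as below, only fits Point geometries; for polygons like the one in the question, map the geometry field itself as geo_shape. Example from my mapping file: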

...
{
  "properties": {
    "geometry": {
      "properties": {
        "coordinates": {
          "type": "geo_point"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
...

That is to say, before uploading your GeoJSON data with Python, you need to create your index, and then apply a mapping file which includes either geo_shape or geo_point using something like:

curl -X PUT "localhost:9200/YOUR_INDEX?pretty"
curl -X PUT "localhost:9200/YOUR_INDEX/_mapping?pretty" -H "Content-Type: application/json" -d @mapping.json
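Since the question asks for Python, the same step can probably be done from the client instead of curl. The following is a sketch under assumptions, not a definitive recipe: it assumes a recent elasticsearch-py client whose indices.create accepts a mappings keyword (older clients take body={"mappings": ...} instead), and the names create_geo_index and GEO_SHAPE_MAPPING are invented here. The mapping uses geo_shape on geometry, which suits the polygon in the question.

```python
# Assumed minimal mapping: the whole GeoJSON geometry object is a geo_shape.
GEO_SHAPE_MAPPING = {
    "properties": {
        "geometry": {"type": "geo_shape"},
    }
}


def create_geo_index(es, index_name, mappings=GEO_SHAPE_MAPPING):
    """Create index_name with a geo mapping; return False if it already exists."""
    if es.indices.exists(index=index_name):
        return False
    # Assumption: the client accepts mappings= (recent elasticsearch-py);
    # on older clients use body={"mappings": mappings} instead.
    es.indices.create(index=index_name, mappings=mappings)
    return True


# Usage against a local cluster (not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# create_geo_index(es, "your_index")
```

Creating the index with its mapping in one call avoids the window in which documents could be indexed before the geo field type is set. Index names must be lowercase, hence your_index.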

0
votes

You must separate each GeoJSON feature into (1) its geometry and (2) its properties/attributes. You cannot index GeoJSON features and feature collections directly (see documentation); only the geometry part is supported as a field type.

So your final indexable document would look somewhat flattened:

{
    "ID_0": 105,
    "ISO": "IND",
    "NAME_0": "India",
    "ID_1": 1288,
    "NAME_1": "Telangana",
    "ID_2": 15715,
    "NAME_2": "Telangana",
    "VARNAME_2": null,
    "NL_NAME_2": null,
    "HASC_2": "IN.TS.AD",
    "CC_2": null,
    "TYPE_2": "State",
    "ENGTYPE_2": "State",
    "VALIDFR_2": "Unknown",
    "VALIDTO_2": "Present",
    "REMARKS_2": null,
    "Shape_Leng": 8.103535,
    "Shape_Area": 127258717496,
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    79.14429367552918,
                    19.500257885106404
                ],
                [
                    79.14582245808431,
                    19.498859172536427
                ],
                [
                    79.14600496956801,
                    19.498823981691853
                ],
                [
                    79.14966523737327,
                    19.495821705263914
                ]
            ]
        ]
    }
}
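A minimal Python sketch of that flattening step, assuming each feature has the standard GeoJSON "properties" and "geometry" keys. The function names flatten_feature and flatten_collection are made up for illustration, and the geometry field would still need to be mapped as geo_shape in the index.

```python
def flatten_feature(feature):
    """Merge a feature's properties and its geometry into one flat document."""
    # "properties" may be null in GeoJSON, so fall back to an empty dict.
    doc = dict(feature.get("properties") or {})
    # A property that is itself named "geometry" would be overwritten here.
    doc["geometry"] = feature["geometry"]
    return doc


def flatten_collection(collection):
    """Yield one flat document per feature of a FeatureCollection."""
    for feature in collection["features"]:
        yield flatten_feature(feature)


# Shortened, hypothetical sample in the shape of the question's data.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"ISO": "IND", "NAME_1": "Telangana"},
            "geometry": {"type": "Point", "coordinates": [79.144, 19.500]},
        }
    ],
}

docs = list(flatten_collection(sample))
print(sorted(docs[0].keys()))  # ['ISO', 'NAME_1', 'geometry']
```

Each resulting dict can then be used as the _source of a bulk action.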