
I get an error when writing data to Elasticsearch from Spark. Most documents are written fine, but then I get this kind of exception:

org.elasticsearch.hadoop.rest.EsHadoopRemoteException: date_time_exception: date_time_exception: Invalid value for Year (valid values -999999999 - 999999999): -6220800000

  • The field mapping in Elasticsearch is "date".

  • The field type in PySpark is DateType, not TimestampType, which in my opinion should make it clear that this is a date without a time component. The value shown by Spark is "1969-10-21", a perfectly reasonable date.

(The field was originally a TimestampType, read from another Elasticsearch date field, but I converted it to a DateType hoping to fix this error. I get the exact same error message, with the exact same timestamp value, whether I send Elasticsearch a TimestampType or a DateType.)

My guess is that the timestamp sent to Elasticsearch carries three extra zeros (milliseconds rather than seconds), but I can't find any way to normalize it. Is there an option for the org.elasticsearch.hadoop connector?
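For reference, a quick check in plain Python (assuming, as the guess above suggests, that the value in the error is an epoch offset in milliseconds) shows that -6220800000 ms is exactly the date Spark displays, whereas the same number read as seconds would land in 1772:

import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# -6220800000 ms = -6220800 s = exactly 72 days before the epoch
print(EPOCH + datetime.timedelta(milliseconds=-6220800000))  # 1969-10-21 00:00:00

# the same number read as seconds is nowhere near the intended date
print(EPOCH + datetime.timedelta(seconds=-6220800000))       # 1772-11-14 00:00:00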

(ELK version is 7.5.2, Spark is 2.4.4.)


1 Answer


Obvious workaround: use any type other than TimestampType or DateType.

For example, using this UDF to send a LongType instead (which demonstrates that it is indeed a timestamp-length issue):

import time

from pyspark.sql import functions as F
from pyspark.sql.types import LongType

# convert a Python datetime to an epoch timestamp in seconds (time.mktime uses local time)
def conv_ts(d):
    return time.mktime(d.timetuple())

ts_udf = F.udf(lambda z: int(conv_ts(z)), LongType())

(Note that in this snippet the Spark input is a TimestampType, not a DateType, so the UDF receives a Python datetime rather than a date, because I also tried messing around with time conversions.)
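For completeness, this is roughly how the UDF is applied before writing; df, the column my_date, the index name and the node address are only placeholders, and the es.* settings are the usual elasticsearch-hadoop write options:

# replace the problematic timestamp column by its Long (seconds) equivalent
df_out = df.withColumn("my_date", ts_udf(F.col("my_date")))

# then write through the connector as usual
(df_out.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "localhost")
    .option("es.port", "9200")
    .option("es.resource", "my_index")
    .mode("append")
    .save())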

Or (a much more efficient way) avoid the UDF entirely by sending a StringType field containing a formatted date instead of the long timestamp, thanks to the pyspark.sql.functions.date_format function.
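For instance (again a sketch, with df and my_date as placeholders):

from pyspark.sql import functions as F

# send the date as a plain "yyyy-MM-dd" string; the default Elasticsearch "date"
# mapping (strict_date_optional_time||epoch_millis) parses this form unambiguously
df_out = df.withColumn("my_date", F.date_format(F.col("my_date"), "yyyy-MM-dd"))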

This is a solution, but not a really satisfying one; I would rather understand why the connector doesn't handle TimestampType and DateType properly by adjusting the timestamp length accordingly.