0
votes

I have a json file where all the dates in the json are in /Date(1602949450000)/

the json is also nested. is there a generic way to parse all of /Date()/ into the timestamp?

I tried regexp_replace but I cannot convert the capture group into a timestamp.

regexp_replace("value", "\\/Date\\((\\-?\\d*?)([\\+\\-]\\d*)?\\)\\/","$1")
1
I can help you, but i'll need a bit more info. is it in utc?user13372194
Why do you have so many backslashes?OneCricketeer

1 Answers

0
votes

Your regex is correct, but the issue seems to be "$1". When I ran the code with a 1 instead worked for me:

dates_df = dates_df.withColumn(
    'date_extracted',
    F.regexp_extract("date", "\\/Date\\((\\-?\\d*?)([\\+\\-]\\d*)?\\)\\/", 1)
)

A more simple regex which does the same thing would be:

dates_df = dates_df.withColumn(
    'date_extracted',
    F.regexp_extract("date", "^.+(\d{13}).+$", 1)
)