0
votes

I have an JSON input coming into AWS Glue Job (python shell) from an Lambda function which I need to convert into pandas dataframe which will be consumed by the rest of the process.

Input JSON:

[{"ABC": "123", "CDE": "AB", "locations": [{"state": "FL", "city": "Orlando", "zip": "10001"}]}, 
 {"ABC": "456", "CDE": "CD", "locations": [{"state": "AL", "city": "AnotherCity"}]}]

Output Dataframe:

ABC CDE locations_state locations_city locations_Zip
123 AB  FL              Orlando        10001
456 CD  AL              AnotherCity    Null

I tried using the below json_normalize but in Glue job json normalize is not recognized. Any other inputs would be helpful.

df = pd.json_normalize(jsoninputstring, "locations", ['ABC','CDE'])

Thanks.

1
what is the expected output in your case?Prabhakar Reddy
Hi - Like i mentioned in the initial post, I would like to convert it into a structured dataframe.pythoner
Any input or direction would be helpful.pythoner

1 Answers

0
votes

I hit this same issue. When I checked the pandas version in the AWS Glue Shell job I was using it was 0.24.2, when posting this the most current version is 1.1.4.

How I checked in AWS Glue Job (Python Shell)

print(pd.__version__)

To get this same functionality in this older version of pandas: https://pandas.pydata.org/pandas-docs/version/0.22.0/generated/pandas.io.json.json_normalize.html

import pandas as pd
df = pd.io.json.json_normalize(r.json())