Using Apache Beam (Python 2.7 SDK), I am trying to write JSON files as entities into Google Cloud Datastore.
Sample JSON:
{
  "CustId": "005056B81111",
  "Name": "John Smith",
  "Phone": "827188111",
  "Email": "[email protected]",
  "addresses": [
    {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
    {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"}
  ]
}
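One thing to check first: beam.io.ReadFromText emits one element per line of the input file, so json.loads in the DoFn only succeeds if each customer record sits on a single line (newline-delimited JSON). If the input files are pretty-printed like the sample, each line would arrive as a fragment such as "{". A minimal stdlib sketch, assuming the records can be rewritten in that form; to_ndjson and parse_lines are hypothetical helper names:

```python
import json


def to_ndjson(records):
    """Serialize each record as compact JSON on its own line (hypothetical helper)."""
    # json.dumps without indent= emits no newlines, so one record == one line
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


def parse_lines(text):
    """Mimic what the DoFn sees: one json.loads call per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


customer = {
    "CustId": "005056B81111",
    "Name": "John Smith",
    "Phone": "827188111",
    "addresses": [
        {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
        {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"},
    ],
}

ndjson = to_ndjson([customer])
print(parse_lines(ndjson)[0]["Name"])
```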
I have written an Apache Beam pipeline with three main steps:
1. beam.io.ReadFromText(input_file_path)
2. beam.ParDo(CreateEntities())
3. WriteToDatastore(PROJECT)
In step 2, I convert each JSON object (a dict) into an entity:
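import json

import apache_beam as beam
from google.cloud.proto.datastore.v1 import entity_pb2
from googledatastore import helper as datastore_helper

class CreateEntities(beam.DoFn):
    def process(self, element):
        # Each element is one line of the input file, parsed as a JSON object
        element = element.encode('ascii', 'ignore')
        element = json.loads(element)
        # Use the customer ID as the key name for kind 'CustomerDF'
        Id = element.pop('CustId')
        entity = entity_pb2.Entity()
        datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
        # Works for flat values; fails on the nested dicts inside 'addresses'
        datastore_helper.add_properties(entity, element)
        return [entity]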
This works fine for basic properties. However, since the addresses value is itself a list of dict objects, it fails. I have read a similar post.
However, it did not show the exact code to convert a dict into an entity.
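For context, a Datastore property value can be a scalar, a key, an array of values, or an embedded entity, but there is no plain map type. So each address dict has to become an embedded entity, and the addresses list an array of those. The stdlib-only sketch below shows that target shape using the JSON layout of the Datastore v1 REST API (stringValue, integerValue, arrayValue, entityValue); entity_pb2.Value has the same nesting through its entity_value and array_value fields. to_value and to_entity are hypothetical names, and the code is written for Python 3 (on 2.7, test basestring instead of str).

```python
import json


def to_value(v):
    """Map one parsed JSON value to a Datastore v1 REST-style Value dict."""
    if v is None:
        return {"nullValue": "NULL_VALUE"}
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return {"booleanValue": v}
    if isinstance(v, int):
        return {"integerValue": str(v)}  # REST encodes int64 as a string
    if isinstance(v, float):
        return {"doubleValue": v}
    if isinstance(v, str):  # Python 3; on 2.7 use basestring
        return {"stringValue": v}
    if isinstance(v, list):
        # Datastore rejects arrays nested directly in arrays; not checked here
        return {"arrayValue": {"values": [to_value(x) for x in v]}}
    if isinstance(v, dict):
        # A nested dict becomes an embedded entity, which needs no key
        return {"entityValue": {"properties": {k: to_value(x) for k, x in v.items()}}}
    raise TypeError("unsupported type: %r" % type(v))


def to_entity(kind, name, record):
    """Build a REST-style Entity whose key path is a single (kind, name) pair."""
    return {
        "key": {"path": [{"kind": kind, "name": name}]},
        "properties": {k: to_value(v) for k, v in record.items()},
    }


record = {"Name": "John Smith", "addresses": [{"type": "Billing", "city": "Malmo"}]}
entity = to_entity("CustomerDF", "005056B81111", record)
print(json.dumps(entity["properties"]["addresses"], sort_keys=True))
```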
I tried the following to set the addresses element as an entity, but it does not work:
element['addresses'] = entity_pb2.Entity()
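Assigning a new, empty entity_pb2.Entity() to element['addresses'] replaces the whole list with one empty entity, so the address data is lost. Each address dict needs its own entity, and the results have to stay in a list. One way to get there is to walk the parsed JSON recursively and replace every dict with an entity built by a callback. The sketch keeps the walk library-agnostic so it can be tested on its own; nest_entities, convert_properties and make_entity are hypothetical names. The commented pipeline usage assumes (not verified here) that datastore_helper.add_properties stores Python lists as array values and entity_pb2.Entity objects as embedded entities.

```python
def nest_entities(value, make_entity):
    """Return a copy of value in which every dict is replaced by make_entity(props).

    Inner dicts are converted first, so make_entity only ever receives
    properties that are scalars, lists, or already-built entities.
    """
    if isinstance(value, dict):
        props = {k: nest_entities(v, make_entity) for k, v in value.items()}
        return make_entity(props)
    if isinstance(value, (list, tuple)):
        return [nest_entities(v, make_entity) for v in value]
    return value  # scalars pass through unchanged


def convert_properties(record, make_entity):
    """Convert the values of a top-level record, keeping the record itself a dict."""
    return {k: nest_entities(v, make_entity) for k, v in record.items()}


# Assumed usage inside CreateEntities.process (not run here). Assumption:
# add_properties turns lists into array values and Entity objects into
# embedded entities.
#
#   def make_entity(props):
#       sub = entity_pb2.Entity()
#       datastore_helper.add_properties(sub, props)
#       return sub
#
#   datastore_helper.add_properties(entity, convert_properties(element, make_entity))
```

Converting inner dicts before outer ones means the callback never sees a raw dict, which is exactly the case add_properties could not handle in the original DoFn.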
Other References: