Here is what my input file looks like:
{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}
In my Dataflow pipeline, how can I dynamically determine which fields are present in each row, so that the output adheres to the BigQuery table schema?
For example, in row #2, Street is missing. I want the entry for the Address.Street column in BigQuery to be "N/A" or null, and I don't want the pipeline to fail because of a schema change or missing data.
How can I handle this logic in my Dataflow job, in Python, before writing to BigQuery?
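
For concreteness, here is a rough sketch of the kind of normalization step I have in mind, assuming the Apache Beam Python SDK with ReadFromText and WriteToBigQuery; the bucket path and table name are placeholders, and the schema uses NULLABLE modes so absent fields can be stored as NULL:

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # NULLABLE modes let BigQuery accept rows where a field is absent.
    TABLE_SCHEMA = {
        "fields": [
            {"name": "Id", "type": "INTEGER", "mode": "NULLABLE"},
            {"name": "PhoneNumber", "type": "INTEGER", "mode": "NULLABLE"},
            {"name": "Address", "type": "RECORD", "mode": "NULLABLE", "fields": [
                {"name": "Street", "type": "STRING", "mode": "NULLABLE"},
                {"name": "City", "type": "STRING", "mode": "NULLABLE"},
            ]},
        ]
    }

    def normalize(line):
        """Parse one JSON line and fill missing fields so every row matches the schema."""
        rec = json.loads(line)
        addr = rec.get("Address") or {}          # handles a missing Address record (row #4)
        return {
            "Id": rec.get("Id"),
            "PhoneNumber": rec.get("PhoneNumber"),   # absent -> None -> NULL in BigQuery
            "Address": {
                "Street": addr.get("Street", "N/A"),  # use None here instead if NULL is preferred
                "City": addr.get("City", "N/A"),
            },
        }

    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.json")  # placeholder path
         | "Normalize" >> beam.Map(normalize)
         | "Write" >> beam.io.WriteToBigQuery(
             "my-project:my_dataset.my_table",                          # placeholder table
             schema=TABLE_SCHEMA,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))

Is a per-row normalization function like this the right approach, or is there a more idiomatic way to handle missing fields in Beam before the BigQuery write?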