
I am using Python 3 and the Google Geocoding API to gather address information, but I am having a hard time parsing the address components consistently. The 'address_components' section of the JSON does not seem to follow standard JSON formatting, and I can't find any mention of how to pull values out of a list by name. Maybe I need to force the entries of the 'address_components' list to behave like dictionaries? Specifically, my problem occurs in situations like the one below:

import urllib.request
import urllib.parse
import json

url = "http://maps.googleapis.com/maps/api/geocode/json?" + urllib.parse.urlencode({"sensor": "false", "output": "more", "address": "The White House"})
uo = urllib.request.urlopen(url)
data = uo.read().decode()
js = json.loads(data)  # data is already a str after .decode()
print('Full js:', js)
# which nets good json text as inserted in-line below:

Full js: {'results': [{'formatted_address': 'The White House, 1600 Pennsylvania Ave NW, Washington, DC 20500, USA', 'geometry': {'location': {'lng': -77.0365298, 'lat': 38.8976763}, 'viewport': {'southwest': {'lng': -77.0378787802915, 'lat': 38.8963273197085}, 'northeast': {'lng': -77.0351808197085, 'lat': 38.8990252802915}}, 'location_type': 'APPROXIMATE'}, 'address_components': [{'long_name': 'The White House', 'types': ['point_of_interest', 'establishment'], 'short_name': 'The White House'}, {'long_name': '1600', 'types': ['street_number'], 'short_name': '1600'}, {'long_name': 'Pennsylvania Avenue Northwest', 'types': ['route'], 'short_name': 'Pennsylvania Ave NW'}, {'long_name': 'Northwest Washington', 'types': ['neighborhood', 'political'], 'short_name': 'Northwest Washington'}, {'long_name': 'Washington', 'types': ['locality', 'political'], 'short_name': 'Washington'}, {'long_name': 'District of Columbia', 'types': ['administrative_area_level_1', 'political'], 'short_name': 'DC'}, {'long_name': 'United States', 'types': ['country', 'political'], 'short_name': 'US'}, {'long_name': '20500', 'types': ['postal_code'], 'short_name': '20500'}], 'partial_match': True, 'place_id': 'ChIJ37HL3ry3t4kRv3YLbdhpWXE', 'types': ['point_of_interest', 'establishment']}], 'status': 'OK'}

# Thanks to the json library (I believe), I can pull data out of the above as if it were a mixed list & dictionary by using named references and indexed locations:
rte = js['results'][0]['address_components'][2]
print("Route:", rte)

Route: {'long_name': 'Pennsylvania Avenue Northwest', 'types': ['route'], 'short_name': 'Pennsylvania Ave NW'}

# unfortunately though, the lists change structure (after the initial 'results' category) and so the route is not always the third element, as seen below:
url2 = "http://maps.googleapis.com/maps/api/geocode/json?" + urllib.parse.urlencode({"sensor": "false", "output": "more", "address": "The Breakers"})
uo2 = urllib.request.urlopen(url2)
data2 = uo2.read().decode()
js2 = json.loads(data2)  # data2 is already a str after .decode()
address_components2 = js2['results'][0]['address_components'][2]
print("address_components2:", address_components2)

address_components2: {'long_name': 'Suffolk County', 'types': ['administrative_area_level_2', 'political'], 'short_name': 'Suffolk County'}

Is there any way around this issue? How can I always get the component whose type is 'route', regardless of its position in the list?


1 Answer


The Python example in Processing JSON with Javascript parses the response with json.load() and then displays the formatted_address value of each entry in the results array. To get only the data you need, you can use the built-in filter() function whenever the JSON response contains multiple values.
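As a minimal sketch of the filter() idea applied to the question: instead of indexing 'address_components' by position, select the entry whose 'types' list contains the type you want. The sample data is trimmed from the response shown in the question, and the helper name find_component is hypothetical, introduced only for illustration.

```python
# Sample trimmed from the 'address_components' list shown in the question.
address_components = [
    {'long_name': '1600', 'types': ['street_number'], 'short_name': '1600'},
    {'long_name': 'Pennsylvania Avenue Northwest', 'types': ['route'],
     'short_name': 'Pennsylvania Ave NW'},
    {'long_name': 'Washington', 'types': ['locality', 'political'],
     'short_name': 'Washington'},
]


def find_component(components, wanted_type):
    """Return the first component whose 'types' contains wanted_type, else None.

    find_component is a hypothetical helper, not part of any Google library.
    """
    matches = filter(lambda c: wanted_type in c.get('types', []), components)
    return next(matches, None)


route = find_component(address_components, 'route')
print("Route:", route['long_name'] if route else None)
# prints: Route: Pennsylvania Avenue Northwest
```

With a live response you would pass js['results'][0]['address_components'] instead of the sample list. Returning None when nothing matches is a design choice for this sketch, since not every result is guaranteed to contain a 'route' entry.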

Helpful explanation regarding processing JSON responses can also be found in this related SO post - How to reverse geocode serverside with python, json and google maps?.