0
votes

I'm trying to use Google Refine to extract postcodes from Google Maps API JSON.

I added a new column by fetching URLs:

"http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address=" + escape(value, "url")

Then the resulting JSON is as follows:

{ "results" : [ { "address_components" : [ { "long_name" : "44", "short_name" : "44", "types" : [ "street_number" ] }, { "long_name" : "Homer Street", "short_name" : "Homer St", "types" : [ "route" ] }, { "long_name" : "London", "short_name" : "London", "types" : [ "locality", "political" ] }, { "long_name" : "Greater London", "short_name" : "Gt Lon", "types" : [ "administrative_area_level_2", "political" ] }, { "long_name" : "United Kingdom", "short_name" : "GB", "types" : [ "country", "political" ] }, { "long_name" : "W1H 4NW", "short_name" : "W1H 4NW", "types" : [ "postal_code" ] }, { "long_name" : "London", "short_name" : "London", "types" : [ "postal_town" ] } ], "formatted_address" : "44 Homer Street, London, Greater London W1H 4NW, UK", "geometry" : { "location" : { "lat" : 51.51981750, "lng" : -0.16534040 }, "location_type" : "ROOFTOP", "viewport" : { "northeast" : { "lat" : 51.52116648029151, "lng" : -0.1639914197084980 }, "southwest" : { "lat" : 51.51846851970851, "lng" : -0.1666893802915020 } } }, "types" : [ "street_address" ] } ], "status" : "OK" }

After browsing through a few blogs to find the relevant code, I then tried transforming the column using this...

value.parseJson().results[0]["formatted_address"]

...which works great for the full address.

The problem occurs when I try to extract the postcode. I tried fiddling around and got nowhere, then I downloaded JSONPad and pasted the JSON into a tree map to get the path:

value.parseJson().results[0]["address_components"][5]["long_name"]

The problem is that this extracts the postcode perfectly for some entries, and not so perfectly for others, where it extracts something else - town or country, for example.

Changing the [5] to [6] seems to extract the postcodes for the other addresses, but is there a way to extract ONLY the postcode, regardless of where it falls in the structure?

Any help much appreciated!

2

2 Answers

0
votes

What you'd probably have to do is loop over the structs in the address_components array, checking the "types" of each one. When types contains "postal_code", then tada, that's your postcode.

Something like the following code (which worked for me):

<script type="text/javascript">
var stuData = 
{
    "results": [{
            "address_components": [{
                    "long_name": "44",
                    "short_name": "44",
                    "types": ["street_number"]
                }, {
                    "long_name": "Homer Street",
                    "short_name": "Homer St",
                    "types": ["route"]
                }, {
                    "long_name": "London",
                    "short_name": "London",
                    "types": ["locality", "political"]
                }, {
                    "long_name": "Greater London",
                    "short_name": "Gt Lon",
                    "types": ["administrative_area_level_2", "political"]
                }, {
                    "long_name": "United Kingdom",
                    "short_name": "GB",
                    "types": ["country", "political"]
                }, {
                    "long_name": "W1H 4NW",
                    "short_name": "W1H 4NW",
                    "types": ["postal_code"]
                }, {
                    "long_name": "London",
                    "short_name": "London",
                    "types": ["postal_town"]
                }
            ],
            "formatted_address": "44 Homer Street, London, Greater London W1H 4NW, UK",
            "geometry": {
                "location": {
                    "lat": 51.51981750,
                    "lng": -0.16534040
                },
                "location_type": "ROOFTOP",
                "viewport": {
                    "northeast": {
                        "lat": 51.52116648029151,
                        "lng": -0.1639914197084980
                    },
                    "southwest": {
                        "lat": 51.51846851970851,
                        "lng": -0.1666893802915020
                    }
                }
            },
            "types": ["street_address"]
        }
    ],
    "status": "OK"
};

var myPostcode;

for (var i = 0; i < stuData.results[0].address_components.length; i++) {
    for (var j = 0; j < stuData.results[0].address_components[i].types.length; j++) {
        if (stuData.results[0].address_components[i].types[j] == "postal_code") {
            myPostcode = stuData.results[0].address_components[i].long_name;
            break;
        }
    }
}

console.log(myPostcode);
</script>
0
votes

You could translate duncan's answer from Javascript to Python and use it with the Jython interpreter or you could use the following GREL expression (which I posted to Google Groups in response to your query):

filter(forEach(value.parseJson().results[0].address_components,c,if(c.types[0]=='postal_code',c.long_name,'')),v,v!='')[0]

Of course, this all presumes that you're using this data for display on a Google Map, per the API's Terms of Service.