In order to find the country for a specific IP address, I'm using the Maxmind IP address database. I've downloaded the database and imported it into Google BigQuery, so I can query it. In a separate table, I'm capturing IP-addresses from certain events in our systems. I would now like to join those two data sources.
The columns in the Maxmind database are as follows:
- start_ip_range STRING NULLABLE
- end_ip_range STRING NULLABLE
- start_ip_num STRING NULLABLE
- end_ip_num STRING NULLABLE
- country_code STRING NULLABLE
- country_name
The columns in my event table are:
- request_id STRING NULLABLE
- ip_address STRING NULLABLE
As documented here (https://dev.maxmind.com/geoip/legacy/csv/) there is a way to get an integer representation of the ip address, so I can use it to query the ip address and retrieve the country_code or country_name.
I have now constructed the following query:
SELECT
p.*,
g.country_code AS country_code
FROM
`dev.event_v1` p
INNER JOIN
`dev.geo_ip_countries` g
ON
SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(0)] AS NUMERIC)*16777216 +
SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(1)] AS NUMERIC)*65536 +
SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(2)] AS NUMERIC)*256 +
SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(3)] AS NUMERIC)
BETWEEN
SAFE_CAST(g.start_ip_num AS INT64)
AND
SAFE_CAST(g.end_ip_num AS INT64)
LIMIT 100
And however this works when using a limit, it doesn't for constructing a view.
Two questions: 1. Is there a way to simplify the query 2. Google BigQuery throws an error when I try to return a large result set:
Error: Query exceeded resource limits. 28099.974050246612 CPU seconds were used, and this query must use less than 5600.0 CPU seconds.
Any help is appreciated!
Solution Rewriting the query to the following worked and solved the resource limit issue too:
SELECT
p.*,
g.country_code
FROM
`dev.event_v1` p
INNER JOIN
`dev.geo_ip_countries` g
ON
NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(p.ip_address),16) = NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(g.start_ip_range),16)
WHERE
NET.SAFE_IP_FROM_STRING(p.ip_address)
BETWEEN
NET.SAFE_IP_FROM_STRING(g.start_ip_range)
AND
NET.SAFE_IP_FROM_STRING(g.end_ip_range)