I have a Hive table with an ip_address column. How can I find the country, city, and ZIP code for each value in that column?

I found a UDF that looks relevant:

https://github.com/edwardcapriolo/hive-geoip

How do I use a UDF in Hive? Can I choose the function name myself?

The UDF says it needs a separate database:

http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz

How do I make that database available to Hive?

Any feedback will be appreciated.

Thanks,

Rio

1 Answer

You use UDFs in Hive by adding the jars and creating a temporary function, as described in your first link:

add file GeoIP.dat;
add jar geo-ip-java.jar;
add jar hive-udf-geo-ip-jtg.jar;
create temporary function geoip as 'com.jointhegrid.hive.udf.GenericUDFGeoIP';

You can name the function whatever you like: replace "geoip", the word after "temporary function", with the name you prefer.

To add the database you linked to, download it to your Unix server and decompress it with gzip. Once you have GeoIP.dat, move it and the jars you've downloaded into your /users/(your username)/ directory and then run the statements above. The files must either sit in that top-level directory or be referenced by full path in your add file and add jar statements. By that I mean that instead of add file GeoIP.dat; you would write add file /users/wertz/downloads/GeoIP.dat; for example.
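If the files are not in the directory you launch Hive from, the setup might look roughly like the sketch below. The /users/wertz/downloads/ location is only the example path from above, and I am assuming the jars were saved next to the database file; substitute wherever you actually put them.

```sql
-- Assumed layout: all three files live in /users/wertz/downloads/ (example path only)
add file /users/wertz/downloads/GeoIP.dat;
add jar /users/wertz/downloads/geo-ip-java.jar;
add jar /users/wertz/downloads/hive-udf-geo-ip-jtg.jar;

-- The function name is your choice; the class name comes from the UDF project
create temporary function geoip as 'com.jointhegrid.hive.udf.GenericUDFGeoIP';
```

Because the function is temporary, these statements need to be run again in each new Hive session.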

Finally, judging from the code, the UDF takes three arguments. The first is the IP address; the second is the field you are looking for (the choices appear to be COUNTRY_NAME, COUNTRY_CODE, AREA_CODE, CITY, DMA_CODE, LATITUDE, LONGITUDE, METRO_CODE, POSTAL_CODE, REGION, ORG, or ID); and the third is the filename of the GeoIP database, which hopefully you have not changed from GeoIP.dat.
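Putting the three arguments together, a query could look something like the sketch below. The table name web_logs is made up for illustration, and the './GeoIP.dat' form assumes the database was shipped with add file under its original name, so adjust both to your setup.

```sql
-- Hypothetical table name: web_logs. Replace it with your own table.
-- Assumes the geoip function was created and GeoIP.dat was added as shown earlier.
select
  ip_address,
  geoip(ip_address, 'COUNTRY_NAME', './GeoIP.dat') as country_name,
  geoip(ip_address, 'COUNTRY_CODE', './GeoIP.dat') as country_code
from web_logs
limit 10;
```

I have only used the country fields here because the file you linked is the GeoLiteCountry database. As far as I can tell, CITY and POSTAL_CODE would need MaxMind's city-level database instead, passed as the third argument in the same way.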