I have a lot of avro files with IPs (weblogs) stored in Azure Blobs. I would like to map the IP to location. How to do that with Azure Data Lake Analytics (ADLA)?
Right now I have a spark job that uses the Maxmind IP database with a java library that reads a 113MB big .mmdb file with all the ip-location data to do this lookup. I am now investigating if its possible to move this job over to ADLA
Maxmind provies a c# library as well, so that part is no problem. However, its not obvious to me how to deal with this big mmdb file that needs to be read and then used for the lookups. Obviously it would not be fast to read the file for each IP lookup. How can this (and similar cases) be handled with ADLA, or is ADLA unsuitable for this kinds of operations?
If I had a normal program running I would do the lookup like this:
using (var reader = new Reader("GeoIP2-City.mmdb"))
{
foreach(var ip in ips)
{
var data = reader.Find<Dictionary<string, object>>(ip);
...
}
}
The maxmind db is available here: https://dev.maxmind.com/geoip/geoip2/downloadable/ (note that I have bought the database Im currently using) and the c# library to read it here: https://github.com/maxmind/MaxMind-DB-Reader-dotnet