4
votes

My Dataflow pipeline needs to read a resource file GeoLite2-City.mmdb. I added it to my project and ran the pipeline. I confirmed that the project package zip file exists in the staging bucket on GCS.

However, when I try to read the resource file GeoLite2-City.mmdb, I get a FileNotFoundException. How can I fix this? This is my code:

String path = myClass.class.getResource("/GeoLite2-City.mmdb").getPath();
File database = new File(path);

try
{
  DatabaseReader reader = new DatabaseReader.Builder(database).build(); // <- this line throws a FileNotFoundException
}
catch (IOException e)
{
  LOG.info(e.toString());
}

My project package zip file is "classes-WOdCPQCHjW-hRNtrfrnZMw.zip" (it contains the class files and GeoLite2-City.mmdb).

The path value is "file:/dataflow/packages/staging/classes-WOdCPQCHjW-hRNtrfrnZMw.zip!/GeoLite2-City.mmdb", but the file cannot be opened at that path.

These are the pipeline options:

--runner=BlockingDataflowPipelineRunner 
--project=peak-myproject 
--stagingLocation=gs://mybucket/staging 
--input=gs://mybucket_log/log.68599ca3.gz

The goal is to transform the log file on GCS and insert the transformed data into BigQuery. When I ran the pipeline locally, the import into BigQuery succeeded. I think the resource path is resolved differently on my local PC than on GCE.

1
Does this run locally using the DirectPipelineRunner? - Graham Polley
Also, can you confirm whether your DatabaseReader class supports files located inside zip archives at all? That's independent of Dataflow: you can just try to create the DatabaseReader in your main program, point it at a local copy of the classes-WOdCPQCHjW-hRNtrfrnZMw.zip file, and check whether it works. - jkff
No, the runner is BlockingDataflowPipelineRunner. When I ran it locally using the DirectPipelineRunner, it worked well. Locally, the path value is "/C:/Users/Jennie/workspace/DataflowJavaSDK-master/eclipse/starter/target/classes/GeoLite2-City.mmdb", and these are my options: [ --runner=BlockingDataflowPipelineRunner --project=peak-myproject --stagingLocation=gs://mybucket/staging --input=gs://mybucket_log/log.68599ca3.gz ] - olivia

1 Answer

2
votes

I think the issue might be that DatabaseReader does not support paths to resources located inside a .zip or .jar file.

If that's the case, then your program worked with DirectPipelineRunner not because it's direct, but because the resource was simply located on the local filesystem rather than within the .zip file (as your comment says, the path was C:/Users/Jennie/workspace/DataflowJavaSDK-master/eclipse/starter/target/classes/GeoLite2-City.mmdb, while in the other case it was file:/dataflow/packages/staging/classes-WOdCPQCHjW-hRNtrfrnZMw.zip!/GeoLite2-City.mmdb).
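To see the difference between the two paths without Dataflow or the GeoIP2 library, here is a minimal, self-contained sketch using only the JDK. It builds a throwaway zip, looks up an entry through a class loader, and compares opening the URL's path as a File with reading it as a stream. The class name ZipResourceDemo and the entry name demo.mmdb are made up for illustration; the temporary zip only stands in for the staged classes-*.zip.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipResourceDemo {

    /** What we observed for one resource stored inside a zip. */
    static final class Result {
        final String urlPath;      // URL.getPath() of the resource
        final boolean fileExists;  // whether new File(urlPath) exists
        final int bytesViaStream;  // bytes read through getResourceAsStream

        Result(String urlPath, boolean fileExists, int bytesViaStream) {
            this.urlPath = urlPath;
            this.fileExists = fileExists;
            this.bytesViaStream = bytesViaStream;
        }
    }

    /** Packs content into a temporary zip and probes it both ways. */
    static Result probe(String entryName, byte[] content) {
        try {
            // Hypothetical stand-in for the staged package zip.
            Path zip = Files.createTempFile("classes-demo", ".zip");
            zip.toFile().deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write(content);
                out.closeEntry();
            }

            try (URLClassLoader loader =
                     new URLClassLoader(new URL[] {zip.toUri().toURL()}, null)) {
                URL url = loader.getResource(entryName);
                // Looks like file:/...zip!/demo.mmdb, which is not a filesystem path.
                String path = url.getPath();
                boolean exists = new File(path).exists();

                // The class loader knows how to read entries inside the archive.
                int total = 0;
                try (InputStream in = loader.getResourceAsStream(entryName)) {
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                }
                return new Result(path, exists, total);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = probe("demo.mmdb", new byte[] {1, 2, 3});
        System.out.println("URL path:        " + r.urlPath);
        System.out.println("exists as File:  " + r.fileExists);
        System.out.println("bytes by stream: " + r.bytesViaStream);
    }
}
```

The path has the same file:/...zip!/... shape as the one in the question, and it does not name a real file on disk, whereas the stream lookup reads the entry directly from the archive.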

I searched the web for which DatabaseReader class you might be talking about, and it seems to be https://github.com/maxmind/GeoIP2-java/blob/master/src/main/java/com/maxmind/geoip2/DatabaseReader.java .

In that case, there's a good chance that your code will work with the following minor change:

try
{
  InputStream stream = myClass.class.getResourceAsStream("/GeoLite2-City.mmdb");
  DatabaseReader reader = new DatabaseReader.Builder(stream).build();
}
catch (IOException e)
{
  ...
}
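If some other API only accepts a File, one possible workaround (a sketch, not something I have tested on Dataflow workers) is to copy the resource stream to a temporary file first and pass that file along. The class name ResourceToTempFile and the helper copyToTempFile are hypothetical names for illustration; in the pipeline, the stream would come from myClass.class.getResourceAsStream("/GeoLite2-City.mmdb") rather than from the in-memory bytes used in main.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ResourceToTempFile {

    /**
     * Copies a classpath resource stream to a temporary file and returns it.
     * The stream is closed once the copy finishes.
     */
    static File copyToTempFile(InputStream resource, String suffix) {
        if (resource == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new IllegalArgumentException("resource not found on classpath");
        }
        try (InputStream in = resource) {
            Path tmp = Files.createTempFile("resource-", suffix);
            tmp.toFile().deleteOnExit();
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in data; replace with the real resource stream in the pipeline.
        InputStream demo = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        File database = copyToTempFile(demo, ".mmdb");
        System.out.println(database + " (" + database.length() + " bytes)");
    }
}
```

The resulting File lives on the worker's local disk, so it can be opened by path like any other file.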