Which Java Google Cloud library for the BigQuery and Dataproc combo?

Question

I'm a little confused about which Google Cloud Java libraries I should use in my Java Spark application submitted to Google Dataproc.

My application has to use several Google Cloud services. In the BigQuery documentation, for example, I found that I have to use

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquery</artifactId>
  <version>0.32.0-beta</version>
</dependency>

while for Google Cloud Storage I have to use

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-storage</artifactId>
  <version>1.14.0</version>
</dependency>

and so on for other Google Cloud services.

But if I use these libraries on Dataproc I run into problems such as a conflict with the Guava library (see here: NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor conflits on Elastic Search jar).

Finally I found the "umbrella package":

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud</artifactId>
    <version>0.8.0</version>
</dependency>

With this package I don't need a separate library for each Google Cloud service: one library covers all services, and I have no more conflict problems.

OK, but the web page of the umbrella package (https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud) says:

"This package does not have guaranteed stability and may experience backwards-incompatible changes."

So, is the umbrella package up to date with the features of the other Google Cloud libraries?

Is the umbrella package the most convenient way to use different Google Cloud services on Dataproc?

More generally: what is the best approach when I want to use different Google Cloud services in a single application and avoid conflicts between dependencies on different versions of the same libraries (Guava, GAX and so on)?

Guillem Xercavins · Accepted Answer · 2018-04-02T08:58:06

The umbrella package seems to be updated frequently but, to me, this is a trade-off that depends on your needs. Managing each dependency yourself gives finer control, but you'll need to resolve conflicts by manually excluding some libraries (as in the link you posted). Using the BOM/umbrella package is more convenient, but the libraries will be held at the versions it controls, which might be temporarily out of date. I would just use this simpler approach unless you need a very specific version of a library or a combination not found in the umbrella package (e.g. you want to pin the BigQuery library but keep updating the rest). The stability warning does not affect the individual dependencies.
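As a rough illustration of the BOM approach, a pom.xml fragment might look like the sketch below. It assumes the google-cloud-bom artifact; the ${google-cloud-bom.version} property is a placeholder you would have to define yourself, not a recommended version.

```xml
<!-- Sketch only: import the BOM so it decides the versions of the
     individual google-cloud-* libraries. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-bom</artifactId>
      <!-- Placeholder property (assumption): define it under <properties> -->
      <version>${google-cloud-bom.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- No <version> elements here: they are inherited from the imported BOM -->
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
  </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards shows which versions of Guava, GAX and the other transitive libraries were actually resolved. This only aligns the Google Cloud libraries with each other; a clash with libraries already on the cluster's classpath may still need separate handling.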
