11
votes

I have written a Spark job on my local machine that reads a file from Google Cloud Storage through the Google Hadoop connector, using a gs://storage.googleapis.com/ path as described in https://cloud.google.com/dataproc/docs/connectors/cloud-storage

I have set up a service account with Compute Engine and Storage permissions. My Spark configuration and code are:

SparkConf conf = new SparkConf();
conf.setAppName("SparkAPp").setMaster("local");
conf.set("google.cloud.auth.service.account.enable", "true");
conf.set("google.cloud.auth.service.account.email", "[email protected]");
conf.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
conf.set("fs.gs.project.id", "xxx-990711");
conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"); 

SparkContext sparkContext = new SparkContext(conf);
JavaRDD<String> data = sparkContext.textFile("gs://storage.googleapis.com/xxx/xxx.txt", 0).toJavaRDD();
data.foreach(line -> System.out.println(line));

I have also set the environment variable GOOGLE_APPLICATION_CREDENTIALS, which points to the key file. I have tried both key file formats, JSON and P12, but I am still unable to access the file. The error I get is:

java.net.UnknownHostException: metadata
java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
        at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:208)
        at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:70)

I am running the job from Eclipse with Java 8, Spark 2.2.0 dependencies, and gcs-connector 1.6.1.hadoop2. I need to connect using only the service account, not the OAuth mechanism.

Thanks in advance

1
Have you tried setting your parameters on sparkContext.hadoopConfiguration instead of the SparkConf? - Alexandre Dupriez

1 Answer

3
votes

Are you running it locally? If so, either set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of your key.json, or set the properties on the Hadoop configuration instead of on the SparkConf, like this:

    // Set the connector's auth properties on the Hadoop configuration; values must be Strings.
    Configuration hadoopConfiguration = sparkContext.hadoopConfiguration();
    hadoopConfiguration.set("google.cloud.auth.service.account.enable", "true");
    hadoopConfiguration.set("google.cloud.auth.service.account.email", "[email protected]");
    hadoopConfiguration.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
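
A likely reason the original settings had no effect is that Spark only copies SparkConf entries into the Hadoop configuration when their keys start with `spark.hadoop.`. If you would rather keep everything on the SparkConf, a minimal sketch is to prefix the keys before setting them. The class and method names below (`GcsConfKeys`, `withSparkHadoopPrefix`) are made up for illustration, and the property values are the placeholders from the question; I have not run this against gcs-connector 1.6.1 specifically.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GcsConfKeys {
    // Prefix that Spark strips when copying SparkConf entries into the Hadoop Configuration.
    static final String SPARK_HADOOP_PREFIX = "spark.hadoop.";

    // Returns a copy of the given Hadoop properties with each key prefixed by "spark.hadoop.".
    // Keys that already carry the prefix are left unchanged.
    static Map<String, String> withSparkHadoopPrefix(Map<String, String> hadoopProps) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hadoopProps.entrySet()) {
            String key = e.getKey();
            String prefixed = key.startsWith(SPARK_HADOOP_PREFIX) ? key : SPARK_HADOOP_PREFIX + key;
            out.put(prefixed, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder values taken from the question.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("google.cloud.auth.service.account.enable", "true");
        props.put("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
        props.put("fs.gs.project.id", "xxx-990711");

        // With Spark on the classpath, each entry would be applied as conf.set(key, value)
        // on the SparkConf before the SparkContext is created.
        withSparkHadoopPrefix(props).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Either approach should end up with the same keys in the Hadoop configuration; setting them directly on `sparkContext.hadoopConfiguration()` as shown above is the more explicit of the two.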