I have developed a Scala Spark application that streams data directly into Google BigQuery, using Spotify's spark-bigquery connector.
It works correctly when run locally; I have configured the application as described at https://github.com/spotify/spark-bigquery:
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.spotify.spark.bigquery._

val ssc = new StreamingContext(sc, Seconds(120))
val sqlContext = new SQLContext(sc)
sqlContext.setGcpJsonKeyFile("/opt/keyfile.json")
sqlContext.setBigQueryProjectId("projectid")
sqlContext.setBigQueryGcsBucket("gcsbucketname")
sqlContext.setBigQueryDatasetLocation("US")
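For completeness, the write into BigQuery happens inside each micro-batch via the connector's saveAsBigQueryTable; below is only a minimal sketch of that step (the input DStream, the DataFrame construction, and the table name are placeholders, not my exact code):

// Placeholder input stream; my real source differs.
val dstream = ssc.socketTextStream("localhost", 9999)
dstream.foreachRDD { rdd =>
  val df = sqlContext.read.json(rdd)                  // build a DataFrame from the micro-batch
  df.saveAsBigQueryTable("projectid:dataset.table")   // connector write, as in the README
}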
However, when I submit the application to my Spark-on-YARN cluster, the job fails because it cannot find the GOOGLE_APPLICATION_CREDENTIALS environment variable:
The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials.
I set GOOGLE_APPLICATION_CREDENTIALS as an OS environment variable for the root user, pointing it to the .json file that contains the required credentials, but the job still fails.
I have also tried adding the following line
System.setProperty("GOOGLE_APPLICATION_CREDENTIALS", "/opt/keyfile.json")
without success.
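Is the right approach instead to forward the variable to the YARN containers through the Spark configuration? Something like the sketch below is what I have in mind (I have not verified that setExecutorEnv or spark.yarn.appMasterEnv actually solves this, and it assumes the key file exists at the same path on every node; the app name is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: expose the credentials path to both the executors and the
// application master; the key file must be present on every node.
val conf = new SparkConf()
  .setAppName("bigquery-streaming")
  .setExecutorEnv("GOOGLE_APPLICATION_CREDENTIALS", "/opt/keyfile.json")
  .set("spark.yarn.appMasterEnv.GOOGLE_APPLICATION_CREDENTIALS", "/opt/keyfile.json")
val sc = new SparkContext(conf)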
Any idea what I'm missing?
Thank you,
Leonardo