
Last night none of our Google Storage Transfer jobs completed. They either got stuck at "Calculating..." or at 0% progress.

We are using Google Storage Transfer jobs to move data from S3 to Google Cloud Storage (GCS) as a step in our data pipeline. We have set up a daily transfer job for a number of buckets and files. However, last night nothing completed.

To troubleshoot, we cancelled all the existing jobs and then created a new job that transferred a single file from one GCS bucket to another. That one also got stuck on "Calculating...".

[Screenshot of the test job stuck at "Calculating..."]

Has anyone experienced anything similar, and what's the solution to get it working again?

There seems to be an outage presently reported for Google Storage Transfer jobs, which you may have been affected by. I recommend reporting it by opening a support case with the GCP engineers. – oakinlaja

1 Answer


Since we didn't know how long the outage mentioned by oakinlaja would last, we decided to find another solution instead.

We already had a couple of Spark tasks that cleaned our data, so we rewrote those tasks to also transfer the data from S3 to GCS, as explained in this article.

We basically set up the AWS config (note that we go through spark.sparkContext, since you can't use a separate SparkContext alongside a SparkSession):

// Read the AWS credentials from environment variables
val accessKeyId = System.getenv("AWS_ACCESS_KEY_ID")
val secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY")

// Pass them to the Hadoop configuration used by the s3n filesystem
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)

and then read the data directly from S3:

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Yesterday's partition; note the pattern must be "yyyy" (calendar year) –
// the original "YYYY" is the week-based year and misbehaves around New Year
val datePath = LocalDateTime.now.minusHours(24).format(DateTimeFormatter.ofPattern("yyyy/MM/dd"))
val dataFrame = spark.read.json("s3n://bucket/file-prefix/" + datePath + "/*/*.gz")
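As an aside, s3n is the legacy S3 filesystem. On clusters that ship the hadoop-aws module, the same read should also work over s3a; a minimal sketch, assuming the s3a connector is on the classpath (fs.s3a.access.key and fs.s3a.secret.key are the standard Hadoop config keys):

// Same credentials as above, but via the newer s3a connector
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", accessKeyId)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secretAccessKey)

// Paths then use the s3a:// scheme instead of s3n://
val dataFrameS3a = spark.read.json("s3a://bucket/file-prefix/" + datePath + "/*/*.gz")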

After this we flatten the data and run a few other steps before writing it to GCS; a separate step then uploads the result to BigQuery. Roughly, that last part looks like the sketch below.
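A minimal sketch of that final part, assuming the GCS Hadoop connector (the gs:// filesystem) is configured and authenticated on the cluster. The "events" column, bucket name, and dataset/table names are hypothetical stand-ins for our real schema:

import org.apache.spark.sql.functions.{col, explode}

// Flatten: one row per element of a nested array column
// ("events" is a made-up column name for illustration)
val flattened = dataFrame
  .withColumn("event", explode(col("events")))
  .drop("events")

// Write the cleaned data to GCS; the gs:// scheme is served by the
// GCS Hadoop connector, which must be available on the cluster
flattened.write.mode("overwrite").parquet("gs://our-bucket/cleaned/" + datePath + "/")

// A later, separate step loads those files into BigQuery, e.g.:
//   bq load --source_format=PARQUET dataset.table "gs://our-bucket/cleaned/.../*"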

With that said, it's really strange that it's so hard to get any information about this kind of outage, and that there's no word on whether they are working on it or not.