3
votes

Part of our application initiates transfers from Amazon S3 to Google Cloud Storage via the Storage Transfer Service API. This ran successfully for several months until yesterday, when our transfers stopped working. We can see in the console that a transfer was initiated, but it hangs indefinitely with a single history item stating: "This transfer is starting..." We have a background process polling the transfer status, and it keeps returning a status of "transfer_calculating".

While debugging this issue, we set up a transfer via the storage console. We used the same AWS access key ID and secret access key that our application uses, and the transfer completed successfully. This leads us to believe the issue is isolated to the transfer service API or to our code that makes the API call.

Transfer Job Code:

TransferJob tjob = new TransferJob()
    .setDescription(description)
    .setStatus('ENABLED')
    .setProjectId(transferGoogleProject)
    .setTransferSpec(
        new TransferSpec()
            .setGcsDataSink(new GcsData().setBucketName(googleStorageBucket))
            .setAwsS3DataSource(
                new AwsS3Data()
                    .setBucketName(s3Bucket)
                    .setAwsAccessKey(new AwsAccessKey().setAccessKeyId(transferAwsKey).setSecretAccessKey(transferAwsSecret)))
            .setObjectConditions(new ObjectConditions().setIncludePrefixes(s3Keys))
            .setTransferOptions(
                new TransferOptions()
                    .setDeleteObjectsFromSourceAfterTransfer(false)
                    .setOverwriteObjectsAlreadyExistingInSink(true)
                    .setDeleteObjectsUniqueInSink(false)))
    .setSchedule(
        new Schedule()
            .setScheduleStartDate(date)
            .setScheduleEndDate(date)
            .setStartTimeOfDay(time))

tjob = storagetransfer.transferJobs().create(tjob).execute()

Library configuration:

<dependency>
    <groupId>com.google.api-client</groupId>
    <artifactId>google-api-client</artifactId>
    <version>1.19.1</version>
</dependency>
<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-bigquery</artifactId>
    <version>v2-rev191-1.19.1</version>
</dependency>
<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-storage</artifactId>
    <version>v1-rev26-1.19.1</version>
</dependency>
<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-storagetransfer</artifactId>
    <version>v1-rev3-1.19.1</version>
</dependency>
<dependency>
    <groupId>com.google.oauth-client</groupId>
    <artifactId>google-oauth-client</artifactId>
    <version>1.19.0</version>
</dependency>
<dependency>
    <groupId>com.google.http-client</groupId>
    <artifactId>google-http-client</artifactId>
    <version>1.19.0</version>
</dependency>
<dependency>
    <groupId>com.google.http-client</groupId>
    <artifactId>google-http-client-jackson2</artifactId>
    <version>1.19.0</version>
</dependency>

We've bumped the versions up to 1.21.0 in our development environment, but the transfers still get stuck at "This transfer is starting..."

At this point we're stuck. Is anyone else running into this issue?

4
We apologize for the difficulty and are investigating this issue. Could you please send your project IDs and exact queries to: [email protected] so we can look into your exact case? Thanks! - Mayur Deshpande

4 Answers

3
votes

From feedback provided by @mayur-deshpande at Google (thanks!), our issue stems from the time value passed to setStartTimeOfDay() needing to be in UTC. Up to this point, we've used US/Pacific, which is what the following snippet from the creating transfers development guide prescribes:

/**
 * Specify times below using US Pacific Time Zone.
 */
private static final String START_DATE = "YYYY-MM-DD";
private static final String START_TIME = "HH:MM:SS";

Because of the time difference, the time we sent in our request had already passed in UTC, so the transfer sat in its starting state until that time came around the next day. We confirmed this because the requests did eventually complete.
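
To make the mismatch concrete, here is a small sketch using only java.time (not the transfer API). The class name and the sample date are made up for illustration. It converts a US/Pacific wall-clock time to the UTC value the API expects:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class PacificToUtc {
    // Converts a wall-clock date and time in US/Pacific to the same instant in UTC.
    public static LocalDateTime pacificToUtc(LocalDate date, LocalTime time) {
        ZonedDateTime pacific = ZonedDateTime.of(date, time, ZoneId.of("America/Los_Angeles"));
        return pacific.withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 10:00 Pacific on 2016-03-01, when Pacific time is UTC-8.
        LocalDateTime utc = pacificToUtc(LocalDate.of(2016, 3, 1), LocalTime.of(10, 0));
        System.out.println(utc); // prints 2016-03-01T18:00
    }
}
```

In this sample, sending 10:00 unconverted would be read as 10:00 UTC, which is 02:00 Pacific and therefore hours in the past by the time the request is made.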

The Javadoc for setStartTimeOfDay() does point out the need to use UTC:

/**
 * The time in UTC at which the transfer will be scheduled to start in a day. Transfers may start
 * later than this time. If not specified, transfers are scheduled to start at midnight UTC.
 * @param startTimeOfDay startTimeOfDay or {@code null} for none
 */
public Schedule setStartTimeOfDay(TimeOfDay startTimeOfDay)

The example code referenced above should reflect this requirement, so I'll file an issue in the GitHub repo.

Also, since we are only doing one-time transfers, we made sure to set our start time one minute in the future to account for small clock differences between servers:

// Joda-Time: take "now" in UTC explicitly, since the API reads these fields as UTC
DateTime now = new DateTime(DateTimeZone.UTC).plusMinutes(1)
Date date = new Date().setDay(now.dayOfMonth).setMonth(now.monthOfYear).setYear(now.year)
TimeOfDay time = new TimeOfDay().setHours(now.hourOfDay).setMinutes(now.minuteOfHour).setSeconds(0)
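
For anyone on plain Java without Joda-Time, roughly the same calculation can be sketched with java.time. The class and method names here are hypothetical. The returned values are what you would pass to the Date and TimeOfDay setters shown above:

```java
import java.time.Clock;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class UtcStartTime {
    // Returns {year, month, day, hours, minutes} in UTC,
    // one minute after the clock's current instant.
    public static int[] startFieldsUtc(Clock clock) {
        ZonedDateTime start = ZonedDateTime.now(clock)
                .withZoneSameInstant(ZoneOffset.UTC)
                .plusMinutes(1);
        return new int[] {
            start.getYear(), start.getMonthValue(), start.getDayOfMonth(),
            start.getHour(), start.getMinute()
        };
    }

    public static void main(String[] args) {
        int[] f = startFieldsUtc(Clock.systemUTC());
        // f[0..2] would feed setYear/setMonth/setDay, f[3..4] setHours/setMinutes.
        System.out.printf("%04d-%02d-%02d %02d:%02d UTC%n", f[0], f[1], f[2], f[3], f[4]);
    }
}
```

Taking the date and the time from the same UTC instant matters near midnight, when the UTC date can already be a day ahead of the local date.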

2
votes

Note that in the API, StartTimeOfDay is interpreted as UTC (please see the API reference: https://cloud.google.com/storage/transfer/reference/rest/v1/transferJobs#Schedule). The request you emailed to us specified an hour that had already passed in UTC. Please specify StartTimeOfDay in UTC. If you just want a one-off job to run immediately, leaving the StartTimeOfDay field empty will work.

Please also note that the Google Cloud UI lets customers specify StartTimeOfDay in their local time zone, which is different from the API.

0
votes

I am also having this problem. Transfer submissions via the Google Storage web page work, but going through the API has stopped working entirely.

I even tried submitting through the Google Storage OAuth web page, and it fails in the same way.

0
votes

I found the Storage Transfer Service very challenging to get working. There were many nuances. It would be very helpful if the docs were cleaned up and all the info put in one easy-to-find place, in a clear way.

An overview for anyone after me:

1) Create a service account

2) Via IAM, give the service account the role Project->Editor

3) Start with the sample code on github.com/GoogleCloudPlatform/java-docs-samples/blob/master/storage/storage-transfer/src/main/java/com/google/cloud/storage/storagetransfer/samples/

    3a) You only need to concern yourself with three of the classes: AwsRequester, TransferJobUtils & RetryHttpInitializerWrapper

    3b) Create a Storagetransfer client that connects using your secret JSON file; this needs the google-api-services-storagetransfer jar

    3c) Create a TransferJob object to submit the job. This is where you set the projectId and the Schedule. You do need to create a Schedule and set at least the date; leave the time null for immediate submission

    3d) Create a TransferSpec object, which holds the bulk of your configuration options. The mappings to the options are not clear, so you will need to use the JSON API doc, https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec, and match the names in the Java API. At a minimum, you will want to populate the AWS bucket & credentials, the GCS sink, etc. The sample gives you this. But you will also want to set ObjectConditions.setIncludePrefixes with a list of strings, the same way you would via the UI.

4) Don't even bother trying to make sense of the main page, cloud.google.com/storage/transfer/create-client; read it only once. It's just not that helpful for really making it work.

HTH?