It is taking 40+ minutes to upload a 20 GB file to a Google Cloud Storage bucket using the Java storage API, but only 4 minutes with gsutil cp. Any idea where I might be going wrong with the Java storage API?

First attempt with the Java API:

    BlobInfo blobInfo = null;
    try (BufferedInputStream inputStream = new BufferedInputStream(new FileInputStream(fileToUpload))) {
        blobInfo =
            BlobInfo.newBuilder(bucketName, bucketFilePath)
                .setContentType("application/octet-stream")
                .setContentDisposition(String.format("attachment; filename=\"%s\"", bucketFilePath))
                .setMd5(fileToUploadMd5)
                .build();
        try (WriteChannel writer = storage.writer(blobInfo, Storage.BlobWriteOption.md5Match())) {
            ByteStreams.copy(inputStream, Channels.newOutputStream(writer));
        }
    } catch (StorageException ex) {
        if (!(400 == ex.getCode() && "invalid".equals(ex.getReason()))) {
            throw ex;
        }
    }

Second attempt with the Java API:

    BlobInfo blobInfo =
        BlobInfo.newBuilder(bucketName, bucketFilePath)
            .setContentType("application/octet-stream")
            .setContentDisposition(String.format("attachment; filename=\"%s\"", bucketFilePath))
            .setMd5(fileToUploadMd5)
            .build();

    // Write the file to the bucket
    writeFileToBucket(storage, fileToUpload.toPath(), blobInfo);

    private void writeFileToBucket(Storage storage, Path fileToUpload, BlobInfo blobInfo) throws Exception {
        // Code from: https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-examples/src/main/java/com/google/cloud/examples/storage/StorageExample.java
        if (Files.size(fileToUpload) > 1_000_000) {
            // When content is not available or large (1MB or more) it is recommended
            // to write it in chunks via the blob's channel writer.
            try (WriteChannel writer = storage.writer(blobInfo)) {
                byte[] buffer = new byte[1024];
                try (InputStream input = Files.newInputStream(fileToUpload)) {
                    int limit;
                    while ((limit = input.read(buffer)) >= 0) {
                        try {
                            writer.write(ByteBuffer.wrap(buffer, 0, limit));
                        } catch (Exception ex) {
                            ex.printStackTrace();
                        }
                    }
                }
            }
        } else {
            byte[] bytes = Files.readAllBytes(fileToUpload);
            // Create the blob in one request.
            storage.create(blobInfo, bytes);
        }
    }

Both Java API attempts took 40+ minutes.

The gsutil commands:

    gcloud auth activate-service-account --key-file serviceAccountJsonKeyFile

    gsutil cp fileToUpload gs://google-bucket-name

1 Answer

gsutil has a built-in optimization for large file uploads: parallel composite uploads, which split the file into parts and send those parts in parallel to make better use of the available bandwidth.
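
For example, the threshold at which gsutil switches to parallel composite uploads can be lowered explicitly (the 150M value below is only an illustration):

    gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp fileToUpload gs://google-bucket-name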

More detail is in the Cloud Storage documentation on parallel composite uploads.

Replicating this with the Java client library is hard: you would have to split the file, upload the parts in parallel, and compose them back into a single object yourself.
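
A rough sketch of what that would involve is below, assuming the standard com.google.cloud.storage client; the part size, thread count, and .partN naming scheme are arbitrary illustrative choices, not an official feature:

    import com.google.cloud.WriteChannel;
    import com.google.cloud.storage.Blob;
    import com.google.cloud.storage.BlobInfo;
    import com.google.cloud.storage.Storage;

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SeekableByteChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelCompositeUpload {

        private static final long PART_SIZE = 512L * 1024 * 1024; // 512 MB parts (illustrative)
        private static final int THREADS = 8;                     // illustrative parallelism

        // Uploads the file as temporary part objects in parallel, then composes them.
        public static Blob upload(Storage storage, Path file, String bucket, String objectName)
                throws Exception {
            long size = Files.size(file);
            int partCount = (int) ((size + PART_SIZE - 1) / PART_SIZE);
            if (partCount > 32) { // compose accepts at most 32 sources per call
                throw new IllegalArgumentException("Too many parts; compose in multiple passes");
            }

            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < partCount; i++) {
                final long offset = i * PART_SIZE;
                final long length = Math.min(PART_SIZE, size - offset);
                final String partName = objectName + ".part" + i; // illustrative naming
                futures.add(pool.submit(() -> uploadPart(storage, file, bucket, partName, offset, length)));
            }

            Storage.ComposeRequest.Builder compose = Storage.ComposeRequest.newBuilder()
                    .setTarget(BlobInfo.newBuilder(bucket, objectName).build());
            List<String> partNames = new ArrayList<>();
            for (Future<String> f : futures) {
                String name = f.get(); // propagates any part-upload failure
                partNames.add(name);
                compose.addSource(name);
            }
            pool.shutdown();

            Blob result = storage.compose(compose.build());
            for (String name : partNames) { // delete the temporary parts
                storage.delete(bucket, name);
            }
            return result;
        }

        // Streams one slice of the file into its own temporary object.
        private static String uploadPart(Storage storage, Path file, String bucket,
                                         String partName, long offset, long length) throws IOException {
            BlobInfo partInfo = BlobInfo.newBuilder(bucket, partName).build();
            try (WriteChannel writer = storage.writer(partInfo);
                 SeekableByteChannel reader = Files.newByteChannel(file)) {
                reader.position(offset);
                ByteBuffer buffer = ByteBuffer.allocate(8 * 1024 * 1024); // 8 MB copy buffer
                long remaining = length;
                while (remaining > 0) {
                    buffer.clear();
                    if (remaining < buffer.capacity()) {
                        buffer.limit((int) remaining);
                    }
                    int n = reader.read(buffer);
                    if (n < 0) {
                        throw new IOException("Unexpected end of file");
                    }
                    buffer.flip();
                    while (buffer.hasRemaining()) {
                        writer.write(buffer);
                    }
                    remaining -= n;
                }
            }
            return partName;
        }
    }

Note that a composite object carries a CRC32C checksum rather than an MD5 hash, so the md5Match() precondition from the question cannot be applied to the composed result. Before going this far, it may also be worth raising the 1024-byte copy buffer from the second attempt and calling setChunkSize on the WriteChannel, since very small writes add overhead.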