In a Java environment, I'm trying to write log files to Google Cloud Storage in chunks. I have a process that parses raw log files and produces lines of JSON; I collect the JSON lines in a buffer, and every time the buffer reaches roughly 5 MB I want to write it to the same file in GCS, until the original raw source has been fully parsed. I have a similar setup that writes to AWS S3. I write in chunks to avoid holding the whole output in memory.
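For comparison, the S3 version uses the multipart upload API, roughly like this (heavily simplified; s3Client, bucket, key, partNumber and the buffer handling are placeholders standing in for my real code):

    // initiated once, before parsing starts
    List<PartETag> partETags = new ArrayList<>();
    InitiateMultipartUploadResult init = s3Client.initiateMultipartUpload(
            new InitiateMultipartUploadRequest(bucket, key));

    // every time the buffer reaches ~5 MB (S3 requires every part except the last to be at least 5 MB)
    UploadPartResult part = s3Client.uploadPart(new UploadPartRequest()
            .withBucketName(bucket)
            .withKey(key)
            .withUploadId(init.getUploadId())
            .withPartNumber(partNumber)
            .withInputStream(new ByteArrayInputStream(buffer))
            .withPartSize(buffer.length));
    partETags.add(part.getPartETag());

    // once the raw source has been fully parsed
    s3Client.completeMultipartUpload(
            new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), partETags));

I'm looking for an equivalent pattern for GCS.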
I managed to write a file to GCS as follows (gcsService is a Storage object configured with authentication and so on):
    private void uploadStream(String path, String name, String contentType,
            InputStream stream, String bucketName)
            throws IOException, GeneralSecurityException {
        InputStreamContent contentStream = new InputStreamContent(contentType, stream);
        StorageObject objectMetadata = new StorageObject()
                .setName(path + "/" + name)
                // make the uploaded object publicly readable
                .setAcl(Arrays.asList(new ObjectAccessControl().setEntity("allUsers").setRole("READER")));
        Storage.Objects.Insert insertRequest = gcsService.objects()
                .insert(bucketName, objectMetadata, contentStream);
        insertRequest.execute();
    }
Unfortunately, I have been unable to figure out how to write to GCS in chunks. Google's documentation seems to suggest two approaches. One involves "Resumable" Insert requests: https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload
And the other approach involves "Compose" requests: https://cloud.google.com/storage/docs/json_api/v1/objects/compose
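If I understand the compose approach correctly, I would upload each ~5 MB buffer as a separate temporary object (e.g. with uploadStream above) and then stitch the pieces together into the final object, something like the sketch below (the object names and content type are made up, and I'm not sure this is the intended usage of ComposeRequest):

    // com.google.api.services.storage.model.ComposeRequest
    ComposeRequest composeRequest = new ComposeRequest()
            .setSourceObjects(Arrays.asList(
                    new ComposeRequest.SourceObjects().setName("logs/part-0000.json"),
                    new ComposeRequest.SourceObjects().setName("logs/part-0001.json")))
            .setDestination(new StorageObject().setContentType("application/json"));

    gcsService.objects()
            .compose(bucketName, "logs/full.json", composeRequest)
            .execute();

My worry with this route is the bookkeeping and cleanup of the temporary part objects, plus the limit of 32 source objects per compose call.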
I've been trying to get a "Resumable" upload set up, but I can't get it to work.
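For reference, this is the kind of thing I've been attempting, via the MediaHttpUploader that backs the Insert request (the chunk size and the direct-upload flag are my guesses at the relevant knobs):

    // com.google.api.client.googleapis.media.MediaHttpUploader
    Storage.Objects.Insert insertRequest = gcsService.objects()
            .insert(bucketName, objectMetadata, contentStream);

    MediaHttpUploader uploader = insertRequest.getMediaHttpUploader();
    uploader.setDirectUploadEnabled(false); // use the resumable protocol instead of a single request
    uploader.setChunkSize(5 * 1024 * 1024); // must be a multiple of MediaHttpUploader.MINIMUM_CHUNK_SIZE (256 KB)

    insertRequest.execute();

Even if that is the right hook, I haven't figured out how to feed it buffers that are produced incrementally while parsing is still running, rather than a single InputStream that is fully available up front.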
Any ideas? My specific questions are:
- What is an elegant and/or appropriate way to upload in chunks to GCS?
- Does anyone know how to set up Resumable uploads to GCS via Insert requests in Java? Can that be done at all?