0
votes

I'm trying to move an Ubuntu 17.10.1 directory to a Google Cloud Storage bucket via a Node.js app. I've chosen to execute a shell script using the child_process module. The script looks like this.

#!/bin/bash

echo START $1
declare -r MODEL_NAME=$1
declare -r PROJECT_ID=XXX-XXXX
declare -r JOB_NAME="${MODEL_NAME}_$(date +%Y%m%d_%H%M%S)"
declare -r BUCKET_NAME="gs://XXX-XXXX-mlengine"
declare -r GCS_PATH="$BUCKET_NAME/$JOB_NAME"
gsutil -m cp -r ./training/$MODEL_NAME $GCS_PATH
echo ALL DONE!

This works as expected when called from a terminal.

When I call it from my Node app it says it can't find gsutil and returns an error. I'm using child_process's execFile to do the external shelling. If I comment out the gsutil line all is well. I've tried using "wait" to no avail. Prior efforts with the child-process-promise module produced an error message saying it could not find gsutil.

const { execFile } = require('child_process');

const child = execFile('tensor_flow/file_process.sh', [trainingName], (error, stdout, stderr) => {
  if (error) console.log(error);
  if (stderr) console.log(stderr);
  if (stdout) console.log(stdout);
});

This produces these error messages:

tensor_flow/file_process.sh: line 12: gsutil: command not found
training-prep.js:26
tensor_flow/file_process.sh: line 14: wait: `PID': not a pid or valid job spec
START T2
training-prep.js:27
ALL DONE!

Any help or insight would be appreciated, or if you could point me to a way to move the directory to a bucket via Node directly, I'd like to know about that.

Thanks, JJ

PS. I've sudo'd all of the above in the course of this effort and I've done this with and without the wait/PID stuff.

Your problem is in gsutil: command not found. Maybe because (1) gsutil doesn't exist on Google Cloud's server or (2) the path is not found. If (2) is the case, you need to supply the absolute path of gsutil. – ariefbayu
Thanks for the effort. In the end it turns out that I need to launch code from the terminal to get this to work in my project directory. Otherwise it has an environment from the Ubuntu UI. – jack johnson

1 Answer

3
votes

If you are using Node.js, you can use the Google Cloud Storage Node.js library.

You should use the createWriteStream method to upload a file to your GCS bucket:

var fs = require('fs');
var storage = require('@google-cloud/storage')();
var myBucket = storage.bucket('my-bucket');

var file = myBucket.file('my-file');

// Uploading a file: you also have the option of using Bucket#upload, but
// that is just a convenience method which does the following.
fs.createReadStream('/Users/stephen/Photos/birthday-at-the-zoo/panda.jpg')
  .pipe(file.createWriteStream())
  .on('error', function(err) {
    // Handle upload errors here.
  })
  .on('finish', function() {
    // The file upload is complete.
  });
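
Since the question is about moving a whole directory rather than a single file, and the Node library uploads one object at a time, here is a minimal sketch of walking a local directory and uploading each file. The directory path and destination prefix are illustrative only, and the bucket name reuses the 'my-bucket' placeholder from above:

var fs = require('fs');
var path = require('path');
var storage = require('@google-cloud/storage')();
var bucket = storage.bucket('my-bucket');

// Recursively upload every file under localDir to the bucket under destPrefix.
function uploadDir(localDir, destPrefix) {
  fs.readdirSync(localDir).forEach(function(name) {
    var fullPath = path.join(localDir, name);
    if (fs.statSync(fullPath).isDirectory()) {
      // Recurse into subdirectories, mirroring the structure in the bucket.
      uploadDir(fullPath, destPrefix + '/' + name);
    } else {
      // bucket.upload() is the convenience wrapper around createWriteStream.
      bucket.upload(fullPath, { destination: destPrefix + '/' + name }, function(err) {
        if (err) console.error('Failed to upload ' + fullPath, err);
      });
    }
  });
}

// Hypothetical call mirroring the question's ./training/<MODEL_NAME> layout.
uploadDir('./training/T2', 'T2_20180101_000000');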

If you want to make a parallel upload by having Node.js execute the gsutil command, you can follow this thread. There I combined the third solution by @hexacyanide with the official gsutil docs and it worked for me:

const { exec } = require('child_process');

exec('gsutil -m cp -r /full_path_to_your_directory gs://your-bucket', (err, stdout, stderr) => {
  if (err) {
    // node couldn't execute the command
    return;
  }
  // the *entire* stdout and stderr (buffered)
  console.log(`stdout: ${stdout}`);
  console.log(`stderr: ${stderr}`);
});

The child_process package is a native Node.js module, so no extra install is needed.
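
As for the original gsutil: command not found error, a child process launched from a GUI-started app often does not inherit the PATH your interactive terminal has, so it cannot see the Cloud SDK's bin directory. A hedged sketch of passing an explicit environment to the child process follows; the '/home/jj/google-cloud-sdk/bin' segment and the 'T2' argument are placeholders, not values from your setup:

const { execFile } = require('child_process');

// Extend the child's PATH so it can find gsutil. The SDK location below is a
// placeholder; run `which gsutil` in a terminal to find the real one.
const env = Object.assign({}, process.env, {
  PATH: process.env.PATH + ':/home/jj/google-cloud-sdk/bin'
});

execFile('tensor_flow/file_process.sh', ['T2'], { env: env }, (error, stdout, stderr) => {
  if (error) console.error(error);
  if (stderr) console.log(stderr);
  if (stdout) console.log(stdout);
});

Alternatively, call gsutil by its absolute path inside file_process.sh, as suggested in the comments.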