4
votes

I tried uploading a large file to S3 in several ways in Node.js using aws-sdk, but I always ended up with only 1 MB of a file that is actually 1.2 GB.

So I tried the streaming-s3 package in Node.js; the code is shown below.

I referred to https://www.npmjs.com/package/streaming-s3 for the streaming-s3 package.

var streamingS3 = require('streaming-s3');
var request = require('request');
var url = 'XXXXXXXXX';
var rStream = request.get(url);

var uploader = new streamingS3(
  rStream,
  {
    accessKeyId: 'XXXXXXXXXXXXX',
    secretAccessKey: 'XXXXXXXXX'
  },
  {
    Bucket: 'XXXXXXXXX',
    Key: 'XXXXXXX',
    ContentType: 'text/html'
  },
  {
    concurrentParts: 2,
    waitTime: 10000,
    retries: 1,
    maxPartSize: 10 * 1024 * 1024
  }
);

uploader.begin();

When I run this code, the file's chunks are not actually being streamed to S3. Only about 1 MB gets uploaded, not the entire file. Is there any other way to upload a file to S3 from a URL?


3 Answers

2
votes

Try using s3-upload-stream. It uses S3 multipart uploads, which AWS recommends for objects larger than 100 MB.
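A minimal sketch of that approach, assuming you keep the request-based download from the question (the URL, bucket and key are placeholders) and use s3-upload-stream's documented upload/maxPartSize/concurrentParts API:

    var AWS = require('aws-sdk');
    var request = require('request');
    // s3-upload-stream wraps an AWS.S3 client and exposes a writable stream
    var s3Stream = require('s3-upload-stream')(new AWS.S3());

    var upload = s3Stream.upload({
      Bucket: 'XXXXXXXXX',      // placeholder, as in the question
      Key: 'XXXXXXX',
      ContentType: 'text/html'
    });

    // Tune the multipart behaviour: part size and number of parts uploaded in parallel
    upload.maxPartSize(20 * 1024 * 1024); // 20 MB parts
    upload.concurrentParts(5);

    upload.on('part', function (details) {
      console.log('part uploaded:', details);
    });

    upload.on('uploaded', function (details) {
      console.log('upload finished:', details.Location);
    });

    upload.on('error', function (err) {
      console.error('upload error:', err);
    });

    // Pipe the download stream straight into the multipart upload
    request.get('XXXXXXXXX').pipe(upload);

Because the file is piped through the multipart upload, only a few parts at a time need to be held in memory instead of the whole 1.2 GB.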

0
votes

AWS has "managed upload" to take care of large file upload to S3. Tried a few mb of data and it gets uploaded albeit it takes a few seconds. S3-upload-stream does a great of optimzation and with larger parts (chunks) the optimzation is great. I hope that you arent using http to do that uploads, since 1GB files take a large amount of time to upload and http eventually times out, hence you couldn't see a response. The better way to do that, is to leverage the upload functionality to a worker and once done, send a notification through SSE (since you use nodejs - implementing SSE should be easier since its a javascript paradigm anyway). I can give a sample implementation with a few mb file here, but I urge to follow a different design incase of uploading 1GB files.

Here is an example that I use. This is with ManagedUpload: a 10 MB file takes around 20 s, a 50 MB file around 32 s, and 80 MB (6 MB chunks, 16 parts).

@VideoFileValidator
async managedUploadFileToFolder(bucket: BucketModel, fileObj: any) {

    return new Promise(async (resolve, reject) => {
        // first make sure that the bucket exists
        try {
            const bucketList = await this.listBuckets();
            if (bucketList.filter(item => item.Name.toUpperCase() === bucket.name.toUpperCase()).length >= 1) {
                // means there is a bucket and obj can be created
                // create obj params
                const metadata: AWS.S3.Metadata = JSON.parse(JSON.stringify(bucket.fileData.metadata));
                const keyFolder = bucket && bucket.folder ? APP_CONSTANTS.COMMON.APP_NAME + '/' + bucket.folder + '/'+bucket.fileData.name : 'tf-default-video';

                const putObjParam: AWS.S3.PutObjectRequest = {
                    Body: Readable.from(this.utilService.bufferToStream(fileObj['buffer'])).pipe(zlib.createGzip()),
                    Bucket: bucket.name,
                    Key: keyFolder,
                    ContentType: 'multipart/form-data',
                    Metadata: metadata,
                    StorageClass: 'STANDARD'
                }
                this.s3MangedUpload = new AWS.S3.ManagedUpload({
                    leavePartsOnError: false,
                    partSize:1024*1024*6,
                    queueSize: 16,
                    params: putObjParam,
                    tags: this.utilService.createTags(false,true)
                });
                this.s3MangedUpload.send((err, uploadData) => {
                    console.log('aws err obtained:', err);
                    if (err) {
                        reject(AppConfigService.getCustomError('FID-S3-CUSTOM', `Error uploading the data: ${err.message} - code: ${err.code} - status: ${err.statusCode}`));
                        return; // don't fall through to resolve() when the upload failed
                    }
                    resolve(plainToClass(CommonOutputResponse, {timestamp: AppUtilService.defaultISOTime(),
                        status: 'Success', message: `${fileObj['originalname']} Uploaded successfully with alias ${bucket.fileData.name}`,
                        data: {
                            uploadedToHttps: uploadData.Location,
                            uploadedToFolder: `s3://${bucket.name}//${bucket.folder}`
                        }
                    }));
                });
                this.s3MangedUpload.on('httpUploadProgress', (progress) => {
                    console.log('progress:', progress);
                });

            } else {
                reject(AppConfigService.getCustomError('FID-S3-CUSTOM', `Object cannot be created since ${bucket.name} doesnt exist`));
            }
        } catch (err) {
            console.log('err:', err);
            reject(AppConfigService.getCustomError('FID-S3-CUSTOM', 'Object cannot be uploaded -' + err.message));
        }

    });
}

Code explained: The decorator

@VideoFileValidator

  • validates that the input coming over HTTP is a video file of type mp4, mkv or avi (the validation is based on the file extension, since there is no way to read such a large buffer)

The method itself uses AWS.S3.ManagedUpload: the queue size is defined as part of the options, and the Body param supplies the stream. Note: the stream here is the one coming from the HTTP input; it arrives as a buffer and is therefore converted to a stream.
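The utilService.bufferToStream helper is not shown in the answer; a minimal sketch of such a conversion, assuming Node 12+ where stream.Readable.from is available:

    const { Readable } = require('stream');

    // Wrap an in-memory buffer (e.g. the multipart fileObj.buffer) in a readable stream
    // so it can be used as the Body of an S3 upload or piped through zlib.createGzip()
    function bufferToStream(buffer) {
      return Readable.from(buffer);
    }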

.send() takes care of the actual streaming of the data. I am unsure whether piping happens under the hood, but I guess that might be the case. You can track upload progress using the 'httpUploadProgress' event.

s3-upload-stream, on the other hand, performs better with larger chunk sizes. I tried the same 10 MB and 50 MB files with the configuration below and was able to save a few seconds, so hopefully it is of interest. Again, uploading a 1 GB file might call for a better architectural design.

@VideoFileValidator
async streamFileUpload(bucket: BucketModel, fileObj: any) {
    const metadata: AWS.S3.Metadata = JSON.parse(JSON.stringify(bucket.fileData.metadata));
    const keyFolder = bucket && bucket.folder ? APP_CONSTANTS.COMMON.APP_NAME + '/' + bucket.folder + '/' + bucket.fileData.name : 'tf-default-video';

    const putObjParam: AWS.S3.PutObjectRequest = {
        Bucket: bucket.name,
        Key: keyFolder,
        ContentType: 'multipart/form-data',
        Metadata: metadata,
        StorageClass: 'STANDARD'
    };

    return new Promise((resolve, reject) => {
        // Convert the incoming buffer to a stream, gzip it and pipe it into the multipart upload
        const read = this.utilService.bufferToStream(fileObj['buffer']);
        const compress = zlib.createGzip();
        const upload = this.s3Upload.upload(putObjParam);
        upload.maxPartSize(1024 * 1024 * 6); // 6 MB parts
        upload.concurrentParts(15);

        upload.on('part', function (details) {
            console.log('still uploading ....', details);
        });

        upload.on('uploaded', function (details) {
            console.log('upload completed', details);
            upload.end();
            resolve(plainToClass(CommonOutputResponse, {
                timestamp: AppUtilService.defaultISOTime(),
                status: '200-OK', message: 'Uploaded successfully',
                data: {
                    location: details['Location'],
                    uploadedTo: details['Bucket'],
                    withName: details['Key'] + '' + bucket.fileData.name
                }
            }));
        });

        upload.on('error', (error) => {
            console.log('error occurred uploading the data:', error);
            upload.end();
            reject(AppConfigService.getCustomError('FID-S3-CUSTOM', 'Error occurred: ' + error['message']));
        });

        read.pipe(compress).pipe(upload);
    });
}

"Part" - event to check on progress, "uploaded" - event that is emitted when file is uploaded completely. I more or less used the same configuration

0
votes

I have found one repo with two solutions for streaming files to S3:

https://github.com/transybao1393/file-streaming-solutions

  1. The client splits the file into chunks and streams them to the server.
  2. The client makes an HTTP/HTTPS call to the server, and the server receives the file and streams it continuously into S3 (using "s3-upload-stream" and "busboy"); a sketch of this approach is shown below.
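A rough sketch of the second approach (not taken from the repo), assuming Express, busboy v1 and s3-upload-stream, with a placeholder bucket name:

    const express = require('express');
    const AWS = require('aws-sdk');
    const busboy = require('busboy');
    const s3Stream = require('s3-upload-stream')(new AWS.S3());

    const app = express();

    app.post('/upload', (req, res) => {
      const bb = busboy({ headers: req.headers });

      bb.on('file', (name, file, info) => {
        // Start a multipart upload and pipe the incoming file stream straight into it,
        // so the file is never buffered fully in memory or written to disk
        const upload = s3Stream.upload({
          Bucket: 'my-example-bucket',   // placeholder
          Key: info.filename,
          ContentType: info.mimeType
        });

        upload.on('uploaded', (details) => res.json({ location: details.Location }));
        upload.on('error', (err) => res.status(500).json({ error: err.message }));

        file.pipe(upload);
      });

      req.pipe(bb);
    });

    app.listen(3000);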