2
votes

I am trying to install Sqoop on Amazon EMR cluster, as per the steps described in Kyle Mulka's blog "http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/#comments".

After uploading the required files on the S3 location, I tried to run the following EMR job through CLI.

./elastic-mapreduce --create --name SQOOP-INSTALL --jar s3://<YOUR-REGION>.elasticmapreduce/libs/script-runner/script-runner.jar --arg s3://<YOUR-BUCKET>/sqoop-install/install_sqoop.sh.

I can see an EMR job with name SQOOP-INSTALL is running on the cluster, but after some time the job is getting cancelled automatically. I tried to go through the logs that gets generated during the EMR job running, but there are no error messages. Also, the logs generating are not giving enough information about the job flow.

Requesting you to help me in installing the SQOOP on EMR cluster.

Thanks in Advance.

Avinash

1

1 Answers

1
votes

After doing lot of trial and errors I came to know about some problems that I was facing. Following are the required steps that you need to do if you are running a job in a VPC. 1. Add the subnet in the job creation as follows {./elasticmapreduce --create --submet <YOUR-SUBNETID} This will create a job on the emr cluster. 2. Get the JOB id and then add step to the created job as

./elastic-mapreduce -j <JOBFLOW-ID --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --arg s3://<YOURBUCKET>/install-sqoop.sh

Hope this will help to those who are facing this kind of problem