I am running Sqoop 1.4.7 on AWS EMR 5.21.1 and am trying to import data from a database. I have been able to do this manually on an EMR instance that I created via the EMR Console with Sqoop installed.

Here are the preliminary steps I performed in order to run Sqoop on EMR:

  1. Download the JDBC Driver
  2. Move the JDBC driver to the /usr/lib/sqoop/lib directory

I was able to run a Sqoop import successfully while SSH'd into an EMR cluster, using these commands:

wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar
sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/

When I try to run these commands from an EMR bootstrap script, however, I get the error:

usr/lib/sqoop/lib/ No such file or directory

After some investigation, I realized this is because "Bootstrap actions execute before core services, such as Hadoop or Spark, are installed", as found here.

So the /usr/lib/sqoop/lib directory doesn't exist yet when my bootstrap script runs.

Here are some solutions that work, but they feel like workarounds:

  1. Create the /usr/lib/sqoop/lib directory in my bootstrap script and then place the jar in it
  2. Add the jar to this directory as an EMR step. (It turns out this is the correct approach; see the accepted answer below.)

What is the correct way of installing this JDBC driver on EMR?

I tested solution # 1 and it worked successfully. Still not sure if this is the 'best practice' - vi_ral
We are also running script as EMR step to download the jar for Sqoop, Spark, etc. - Snigdhajyoti
m also having same issue can you please send add steps added via bootstrap should ot be run if or normal ? @vi_ral - Shubham Dangare
sorry, no idea what that means - vi_ral

1 Answer

The 2nd option is the correct way to do it. The documentation explains running bash scripts as an EMR step.

You can also use command-runner.jar as the step's JAR, with the following as its arguments:

bash -c "wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar;sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/"
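
For reference, here is a rough sketch of how that step could be submitted with the AWS CLI instead of the console. The file name sqoop-driver-step.json, the step name, the CANCEL_AND_WAIT failure action, and the CLUSTER_ID variable are my own placeholders, not something from the question. I also joined the two commands with && rather than ; so the mv is skipped if the download fails.

```shell
#!/usr/bin/env bash
# Write the step definition to a JSON file (file name is an arbitrary choice).
# The step runs command-runner.jar with "bash -c <command>" as its arguments.
cat > sqoop-driver-step.json <<'EOF'
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Install MSSQL JDBC driver for Sqoop",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Jar": "command-runner.jar",
    "Args": [
      "bash",
      "-c",
      "wget -O mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/8.4.0.jre8/mssql-jdbc-8.4.0.jre8.jar && sudo mv mssql-jdbc.jar /usr/lib/sqoop/lib/"
    ]
  }
]
EOF

# CLUSTER_ID is a placeholder: set it to your own cluster id before running.
# Without it, the script only writes the JSON file and submits nothing.
if [ -n "${CLUSTER_ID:-}" ]; then
  aws emr add-steps --cluster-id "$CLUSTER_ID" --steps file://sqoop-driver-step.json
else
  echo "CLUSTER_ID is not set; wrote sqoop-driver-step.json without submitting it."
fi
```

Because the step is queued after the applications are installed, /usr/lib/sqoop/lib should already exist by the time it runs, which is exactly what the bootstrap action could not rely on.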