
I am a novice in the Hadoop ecosystem. My intent is to transfer data from MySQL to HDFS using Sqoop. I have configured the various tools correctly (I think), but I have a problem running the job in Sqoop.

My Environment:

  • OS: Kubuntu 15.10 wily 64bit x86_64
  • Hadoop 2.7.1 (stable)
  • MySQL 5.6
  • Sqoop 1.99.6 (Sqoop2)

Following these two guides, Sqoop 5 Minutes & sqoop2-activity-finally, I created two links in this manner:

sqoop:000> show connector
+----+------------------------+---------+------------------------------------------------------+
| Id |          Name          | Version |                         Class                        |
+----+------------------------+---------+------------------------------------------------------+
| 1  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector |
| 2  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        |
| 4  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      |
+----+------------------------+---------+------------------------------------------------------+
sqoop:000>  create link -c 1  
Creating link for connector with id 1
Please fill following values to create new link object
Name: mysqlink

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://localhost:3306/sqooptest
Username: squser
Password: *****
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry# 
New link was successfully created with validation status OK and persistent id 2
sqoop:000>
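
(As a side note: for the generic-jdbc-connector to reach MySQL, the MySQL JDBC driver jar has to be on the Sqoop2 server classpath. A minimal sketch of how I had put it there; the jar version and install paths are just my setup, adjust to yours:)

# copy the MySQL JDBC driver into the Sqoop2 server's lib directory,
# which is picked up by the server's common.loader
cp mysql-connector-java-5.1.38-bin.jar /usr/local/sqoop/server/lib/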

sqoop:000>  create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfslink

Link configuration

HDFS URI: hdfs://localhost:9000/
Hadoop conf directory: /usr/local/hadoop/etc/hadoop 
New link was successfully created with validation status OK and persistent id 3
sqoop:000> 
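
(To double-check that the HDFS URI in this link is actually reachable, a quick test from the shell, assuming $HADOOP_HOME/bin is on the PATH:)

# should list the HDFS root if the NameNode at localhost:9000 is up
hdfs dfs -ls hdfs://localhost:9000/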

sqoop:000> show link --all
2 link(s) to show: 
link with id 2 and name mysqlink (Enabled: true, Created by hduser at 07/01/16 11.52, Updated by hduser at 07/01/16 11.52)
Using Connector generic-jdbc-connector with id 1
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://localhost:3306/sqooptest
    Username: squser
    Password: 
    JDBC Connection Properties: 
      protocol = tcp
link with id 3 and name hdfslink (Enabled: true, Created by hduser at 07/01/16 11.57, Updated by hduser at 07/01/16 11.57)
Using Connector hdfs-connector with id 3
  Link configuration
    HDFS URI: hdfs://localhost:9000/
    Hadoop conf directory: /usr/local/hadoop/etc/hadoop
sqoop:000> show link                 
+----+----------+--------------+------------------------+---------+
| Id |   Name   | Connector Id |     Connector Name     | Enabled |
+----+----------+--------------+------------------------+---------+
| 2  | mysqlink | 1            | generic-jdbc-connector | true    |
| 3  | hdfslink | 3            | hdfs-connector         | true    |
+----+----------+--------------+------------------------+---------+

...and later, I created a job:

sqoop:000> create job --from 2 --to 3
Creating job for links with from id 2 and to id 3
Please fill following values to create new job object
Name: My2Dfs

From database configuration

Schema name: sqooptest
Table name: person
Table SQL statement: 
Table column names: 
Partition column name: id
Null value allowed for the partition column: 
Boundary query: 

Incremental read

Check column: 
Last value: 

To HDFS configuration

Override null value: 
Null value: 
Output format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format: 
Output directory: /usr/local/sqoop/prog
Append mode: 

Throttling resources

Extractors: 
Loaders: 
New job was successfully created with validation status OK  and persistent id 1
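
(Before starting it, the job definition can be reviewed with show job, and once running its progress can be polled with status job; the -j option addresses the job by id, the same way as in start job. Shown here without their output:)

sqoop:000> show job
sqoop:000> status job -j 1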

At this point, when I try to start the job, I get the following error (from Tomcat 6):

sqoop:000> start job -j 1 -s
Exception has occurred during processing command 
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception - <html><head><title>Apache Tomcat/6.0.37 - Error report</title><body><h1>HTTP Status 500 - Servlet execution threw an exception</h1><HR size="1" noshade="noshade"><p><b>type</b> Exception report</p><p><b>message</b> <u>Servlet execution threw an exception</u></p><p><b>description</b> <u>The server encountered an internal error that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>javax.servlet.ServletException: Servlet execution threw an exception
        org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:595)
        org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
        org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
</pre></p><p><b>root cause</b> <pre>java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/util/Apps
        java.lang.ClassLoader.defineClass1(Native Method)
        java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        [...  
  org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
</pre></p><p><b>note</b> <u>The full stack trace of the root cause is available in the Apache Tomcat/6.0.37 logs.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.37</h3></body></html>

sqoop:000> 

Why is Sqoop using Tomcat 6? I already have Tomcat 7. I tried adding the CATALINA_HOME environment variable in /home/hduser/.bashrc, but I still get other errors. What is my problem? How can I resolve it?

Edit:

In /usr/local/sqoop/server/logs/catalina.2016-01-09 there is this error:

INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/local/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Jan 09, 2016 3:05:26 AM org.apache.coyote.http11.Http11Protocol init
SEVERE: Error initializing endpoint
java.net.BindException: Address already in use <null>:12000
    at org.apache.tomcat.util.net.JIoEndpoint.init(JIoEndpoint.java:549)

I solved this problem by changing the value of the SQOOP_ADMIN_PORT environment variable in sqoop/bin/sqoop-sys.sh (a sketch of that change is shown after the log below), and then I added the paths of the other Hadoop libraries to my common.loader. However, I now get this error in sqoop/server/logs/catalina.out when I start the sqoop2-server:

log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by 
log4j:ERROR [org.apache.catalina.loader.StandardClassLoader@4605a23b] whereas object of type 
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [WebappClassLoader
  context: /sqoop
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
org.apache.catalina.loader.StandardClassLoader@4605a23b
].
log4j:ERROR Could not instantiate appender named "stdout".
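
This is the port change mentioned above (a sketch: the variable name comes from the stock sqoop-sys.sh; the new value is arbitrary, any free port works):

# excerpt from sqoop/bin/sqoop-sys.sh
# the stock port collided with something already running on my machine
export SQOOP_ADMIN_PORT=12101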
Always look at the underlying cause: java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/util/Apps. It seems that you are missing a jar in the project. – Jorge Campos

Now you have a conflict between two (or more) implementations of log4j; you will have to check the libraries to leave just the right one. – Jorge Campos

@JorgeCampos Yes, you're right. In fact, I modified the value of SQOOP_ADMIN_PORT and added the other paths for Hadoop's libraries to the common.loader. But now I have this new error; look at my "Edit". – Pierock

@JorgeCampos The log4j library is loaded five times in my common.loader, respectively in (i) common/lib/*.jar, (ii) tools/lib/*.jar, (iii) yarn/lib/*.jar, (iv) mapreduce/lib/*.jar, and (v) hdfs/lib/*.jar. Must I enter the path of each individual library, rather than "*.jar"? – Pierock

No, you should delegate that responsibility to the container. I'm not a Hadoop expert; I just saw the problem right in the exception, so unfortunately I can't provide useful help other than guesses. If it is being loaded five times, I would suggest that you choose one and take out the other four. Don't delete them; do this as a test and move them somewhere else. – Jorge Campos
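
(Following that suggestion, a quick way to enumerate the copies before moving any of them aside; the search root assumes my Hadoop install path:)

# list every log4j jar that the common.loader wildcards can pick up;
# keep one and move the rest somewhere else as a test (do not delete them)
find /usr/local/hadoop/share/hadoop -name 'log4j*.jar'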

1 Answer


The Tomcat exception:

Exception has occurred during processing command 
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception - <html><head><title>Apache Tomcat/6.0.37 - Error report</title><body><h1>HTTP Status 500 - Servlet execution threw an exception</h1><HR size="1" noshade="noshade"><p><b>type</b> Exception report</p><p><b>message</b> <u>Servlet execution threw an exception</u></p><p><b>description</b> <u>The server encountered an internal error that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>javax.servlet.ServletException: Servlet execution threw an exception
        org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:595)
        org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
        org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
</pre></p><p><b>root cause</b> <pre>java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/util/Apps
        java.lang.ClassLoader.defineClass1(Native Method)
        java.lang.ClassLoader.defineClass(ClassLoader.java:800)

is due to the fact that the YARN libraries have not been loaded in the common.loader. Simply add the path /usr/local/hadoop/share/hadoop/yarn/*.jar to the common.loader property (in /usr/local/sqoop/server/conf/catalina.properties).
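
A minimal sketch of the resulting line; the entries before the YARN path are the stock Tomcat ones (they may differ slightly in your build), and only the last entry is the addition:

# /usr/local/sqoop/server/conf/catalina.properties (excerpt)
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,/usr/local/hadoop/share/hadoop/yarn/*.jar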

The Log4j error,

log4j:ERROR A "org.apache.log4j.ConsoleAppender" object is not assignable to a "org.apache.log4j.Appender" variable.
log4j:ERROR The class "org.apache.log4j.Appender" was loaded by 
log4j:ERROR [org.apache.catalina.loader.StandardClassLoader@4605a23b] whereas object of type 
log4j:ERROR "org.apache.log4j.ConsoleAppender" was loaded by [WebappClassLoader
  context: /sqoop
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
org.apache.catalina.loader.StandardClassLoader@4605a23b
].
log4j:ERROR Could not instantiate appender named "stdout".

on the other hand, I believe is a genuine issue, because it is present even if log4j.jar is loaded just once in the common.loader. However, I think it is a minor problem; in fact, after waiting a few seconds, you can see the following message in catalina.out:

Jan 13, 2016 12:02:17 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 57151 ms