
I’m using Sqoop v1.4.2 to do incremental imports with saved jobs. The jobs are defined as follows:

sqoop job --create job_1 -- import --connect <CONNECT_STRING> --username <UNAME> --password <PASSWORD> -m <MAPPER#> --split-by <COLUMN> --target-dir <TARGET_DIR> --table <TABLE> --check-column <COLUMN> --incremental append --last-value 1

NOTES:

  1. Incremental type is append
  2. Job creation succeeds
  3. Job execution succeeds on repeated runs (via sqoop job --exec; see the sketch below)
  4. New rows are visible in HDFS after each run
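
For reference, each saved job is executed with the sqoop job tool. A minimal sketch of the commands involved (the job name is the one defined above; the rest is standard Sqoop 1.4.x usage):

    # Execute the saved job; on success, Sqoop records the updated
    # --last-value in its metastore for the next incremental run.
    sqoop job --exec job_1

    # Inspect saved jobs and their stored parameters (including last-value):
    sqoop job --list
    sqoop job --show job_1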

sqoop job --create job_2 -- import --connect <CONNECT_STRING> --username <UNAME> --password <PASSWORD> -m <MAPPER#> --split-by <COLUMN> --target-dir <TARGET_DIR> --table <TABLE> --check-column <COLUMN> --incremental lastmodified --last-value 1981-01-01

NOTES:

  1. Incremental type is lastmodified
  2. Job creation succeeds; the table name differs from the one used in job_1
  3. Job execution succeeds ONLY THE FIRST TIME
  4. Rows imported by the first execution are visible in HDFS
  5. Every subsequent execution fails with the following error:

    ERROR security.UserGroupInformation: PriviledgedActionException as:<MY_UNIX_USER>(auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <TARGET_DIR_AS_SPECIFIED_IN_job_2> already exists
    ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <TARGET_DIR_AS_SPECIFIED_IN_job_2> already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:202)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:465)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:108)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
        at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
        at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
    
How did you end up resolving this issue? - Ben Smith
To be frank, I moved on to the next project, so I don't know. Big data / Sqoop is not my core competency, so I didn't dig into it further. My apologies, I couldn't help. - lupchiazoem
The issue in this question is that the output directory, --target-dir <TARGET_DIR>, is the same for both jobs. It should be different for each job, or each job should delete the existing output before running. - Bob
Thanks for the pointer, Byron! But, as commented earlier, I'm not Sqooping anymore, so I can't verify. Still, I appreciate your reply! That's the strength of the community and the support of awesome samaritans like you :) - lupchiazoem
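
For anyone who lands here: as Bob's comment suggests, one workaround is to clear the target directory before re-executing the job. A minimal sketch, assuming the same <TARGET_DIR> placeholder as in job_2:

    # Remove the previous output so the MapReduce job can recreate it
    # (use "hadoop fs -rmr" on older Hadoop 1.x shells)
    hadoop fs -rm -r <TARGET_DIR>
    sqoop job --exec job_2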

1 Answer


If you want to execute job_2 again and again, you need to combine --incremental lastmodified with --append:

sqoop job --create job_2 -- import --connect <CONNECT_STRING> --username <UNAME> \
    --password <PASSWORD> --table <TABLE> --incremental lastmodified --append \
    --check-column <COLUMN> --last-value "2017-11-05 02:43:43" \
    --target-dir <TARGET_DIR> -m 1
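
With --append, each subsequent execution writes the newly modified rows as additional files under the existing <TARGET_DIR> instead of failing because the directory already exists. The saved job is then re-run the usual way:

    sqoop job --exec job_2

Note that lastmodified imports also accept --merge-key <COLUMN> as an alternative to --append; it reconciles updated rows with the existing output instead of appending new copies.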