I want to import a big table from an Oracle database to HDFS using Sqoop. Since the table is huge and has a primary key, Sqoop can run multiple mappers in parallel.
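For reference, this is roughly the command I am running; the connection string, username, table, and target directory are just placeholders for my actual values:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username SCOTT \
      -P \
      --table EMPLOYEES \
      --target-dir /user/hadoop/employees \
      -m 4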

I have a couple of questions:

1) Due to a bad record in the Oracle database, one mapper hits an exception while the others run fine. Will the whole job fail, or will all the other mappers still write their data to HDFS (missing only the failed mapper's data)?

2) Is Sqoop intelligent enough to adjust the number of parallel mappers when we give the -m option? If we give -m 4, can Sqoop increase the number of mappers based on the table size, or will it run with exactly 4?

Has anybody come across this kind of scenario?

1 Answer

Based on my knowledge:

  1. If one mapper fails, the Sqoop job will try to kill the other mappers and the whole job fails. The process won't delete the data already written to HDFS, so you can see partial data in your target HDFS location (see the cleanup sketch after this list).

  2. When we specify the number of mappers (using the -m x option), Sqoop will use at most x mappers; it won't increase the count beyond that based on the table size.
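As a rough sketch of the cleanup for point 1, after a failed run you can inspect the target directory for the partial output and remove it before re-running the import, or let Sqoop clear it with --delete-target-dir. The directory and connection details below are just example placeholders:

    # check what the failed job left behind in the target directory
    hdfs dfs -ls /user/hadoop/employees

    # remove the partial output before re-running the import
    hdfs dfs -rm -r /user/hadoop/employees

    # alternatively, have Sqoop delete the existing target directory on the next run
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username SCOTT \
      -P \
      --table EMPLOYEES \
      --target-dir /user/hadoop/employees \
      --delete-target-dir \
      -m 4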