I got the following error when loading data from Impala to Vertica with Sqoop.
Error: java.io.IOException: Can't export data, please check failed map task logs at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.sql.BatchUpdateException: [Vertica]VJDBC One or more rows were rejected by the server. at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:233) at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84) ... 10 more Caused by: java.sql.BatchUpdateException: [Vertica]VJDBC One or more rows were rejected by the server. at com.vertica.jdbc.SStatement.processBatchResults(Unknown Source) at com.vertica.jdbc.SPreparedStatement.executeBatch(Unknown Source) at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:231)
And I was running the following command:
sudo -u impala sqoop export -Dsqoop.export.records.per.statement=xxx --driver com.vertica.jdbc.Driver --connect jdbc:vertica://host:5433/db --username name --password pw --table table --export-dir /some/dir -m 1 --input-fields-terminated-by '\t' --input-lines-terminated-by '\n' --batch
This error was not raised every time. I had several successful tests loading over 2 million rows of data. So I guess there might be some bad data that contains special characters in the rejected rows. This is very annoying because when this error raised, mapreduce job would rollback and retry. In this case, there would be lots of duplicate data in the target table.
Does anyone have idea if there is any sqoop export parameter that can be set to deal with special characters or if there is any way to skip the bad data, which means to disable rollback? Thanks!