
I imported a table from a SQL database into Hive using the sqoop import command. Running select count(*) in Hive gives a row count of

231743

But the actual SQL table has 231742 rows.

Why am I getting one row extra for this table?

I imported 2 other similar tables that have large amounts of data and got the exact count for both. But this particular table gives me an extra row in Hive. Why is that? :-o

PS: I included --hive-drop-import-delims with the sqoop import command

Thanks in advance :)

UPDATE: It seems I have duplicate entries in the Hive table. They were generated during the import. Does anyone have any idea why? :)

Try the same process with a small number of rows; it will then be easier to find the problem if it occurs again - eyossi
There seems to be a duplicate row in hive after the import. I tried the --split-by option too. Any idea how to fix this? - Amit
Are you sure you don't have the duplicated row in the table that you import from (the sql table)? - eyossi
Anyway, create a table with the same structure as the problematic sql table but with far fewer rows, and see if you get a duplicate row in hive. If you do, finding which data is duplicated will help you track down the problem - eyossi
I'm sure that the sql table doesn't have the duplicated row. - Amit
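
One way to pin down which row is duplicated, as the comments suggest, is to look at the imported files directly. The following is only a sketch: it assumes the Hive table is stored as plain delimited text, and the function name find_duplicate_rows and the warehouse path in the usage comment are hypothetical.

```shell
# Print one copy of every line that appears more than once on stdin.
# Assumes one table row per line (plain delimited text files).
find_duplicate_rows() {
  # uniq only compares adjacent lines, so the input is sorted first.
  LC_ALL=C sort | uniq -d
}

# Hypothetical usage against the table's files in HDFS
# (the path below is an assumption, adjust it to your warehouse layout):
#   hdfs dfs -cat /user/hive/warehouse/my_table/part-* | find_duplicate_rows
```

If this prints a row, that exact line exists more than once in the imported data, and the same key can then be checked in the source table.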

1 Answer


Okay, I've solved it.

In the sqoop import command, instead of using --table table-name, I used --query 'SELECT * FROM table-name WHERE $CONDITIONS' (quoted so that the shell does not expand $CONDITIONS). That fixed it.
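
The answer does not show the full command, so here is a rough sketch of what a free-form query import can look like. The function names build_query and run_import and all argument values are hypothetical, and credentials are left out. Two things worth knowing: Sqoop requires --target-dir with --query, and it needs either --split-by or a single mapper; a single mapper is used here.

```shell
# Build the free-form query for a table. The single quotes keep
# $CONDITIONS literal so Sqoop, not the shell, substitutes it.
build_query() {
  printf 'SELECT * FROM %s WHERE $CONDITIONS' "$1"
}

# $1 = JDBC URL, $2 = source table, $3 = staging directory in HDFS
# (all hypothetical; add --username / -P as your database requires).
run_import() {
  sqoop import \
    --connect "$1" \
    --query "$(build_query "$2")" \
    --target-dir "$3" \
    --num-mappers 1 \
    --hive-import --hive-table "$2" \
    --hive-drop-import-delims
}

# Hypothetical usage:
#   run_import "jdbc:mysql://dbhost/mydb" my_table /tmp/sqoop/my_table
```

Using one mapper avoids splitting the table at all, which is slower on large tables but removes split boundaries as a variable when chasing an off-by-one row count.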

Thanks for your comments.