
I'm trying to import a huge table from Oracle 10g to HDFS (actually GCS, since I'm using Sqoop with Google Cloud Dataproc) as Avro. Everything works fine when the table doesn't have any date columns, but when it does, some dates are imported very wrong.

For example: Oracle data -> 30/07/76, HDFS data -> 14976-07-30 20:02:00.0
For example: Oracle data -> 26/03/84, HDFS data -> 10384-03-26 20:32:34.0

I'm already mapping the date fields as String to bring them over like that. I was previously importing the default Sqoop way, which brings the date fields over as epoch integers, but that conversion was incorrect too.

For example: Oracle data -> 01/01/01, HDFS data -> -62135769600000, when it should be 978314400000.

I hope someone can help me fix this issue. Thanks.

Additional information:

Sqoop command that I'm running:

import -Dmapreduce.job.user.classpath.first=true -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect=$JDBC_STR --username=$USER --password=$PASS --target-dir=gs://sqoop-dev-out-files/new/$TABLE --num-mappers=10 --fields-terminated-by="\t" --lines-terminated-by="\n" --null-string='null' --null-non-string='null' --table=$SCHEMA.$TABLE --as-avrodatafile --map-column-java="DATACADASTRO=String,DATAINICIAL=String,DATAFINAL=String"

Sqoop version: 1.4.7
JDBC version: 6


1 Answer


I think your date in Oracle is really 01/01/0001; try to_char(COLUMN,'DD/MM/YYYY').
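
For example, something along these lines (just a sketch, not a tested command: ID as the --split-by column and the column list are placeholders for your real columns, and the other options mirror your original command, since --query cannot be combined with --table):

import -Dmapreduce.job.user.classpath.first=true --connect=$JDBC_STR --username=$USER --password=$PASS --target-dir=gs://sqoop-dev-out-files/new/$TABLE --num-mappers=10 --split-by=ID --as-avrodatafile --query="SELECT ID, TO_CHAR(DATACADASTRO,'DD/MM/YYYY') AS DATACADASTRO, TO_CHAR(DATAINICIAL,'DD/MM/YYYY') AS DATAINICIAL, TO_CHAR(DATAFINAL,'DD/MM/YYYY') AS DATAFINAL FROM $SCHEMA.$TABLE WHERE \$CONDITIONS"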

My issue is that the date really is 01/01/0001, because of user mistyping, and I can't update the column in the source Oracle database.

My other issue is that the Unix conversion should have produced -62135596800000, but instead it comes out as -62135769600000 (30/12/0000).

At first I thought it was a timezone issue, but the difference is two days.
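
For what it's worth, that two-day gap is exactly the offset between the proleptic Gregorian calendar that java.time uses and the Julian calendar that the legacy java.util.GregorianCalendar applies to dates before 1582-10-15. A minimal Java check (my own sketch, independent of Sqoop) reproduces both numbers:

import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class EpochCheck {
    public static void main(String[] args) {
        // java.time: proleptic Gregorian calendar
        long gregorian = LocalDate.of(1, 1, 1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
        System.out.println(gregorian);                // -62135596800000

        // Legacy calendar: Julian rules before 1582-10-15
        GregorianCalendar legacy = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        legacy.clear();
        legacy.set(1, Calendar.JANUARY, 1);
        System.out.println(legacy.getTimeInMillis()); // -62135769600000

        // The same instant rendered in proleptic Gregorian is 30/12/0000
        System.out.println(Instant.ofEpochMilli(legacy.getTimeInMillis())); // 0000-12-30T00:00:00Z
    }
}

So my reading is that the value is being written with the legacy hybrid calendar and then printed back with the proleptic one, which would make this a calendar-system mismatch rather than a timezone shift.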