
I was trying to integrate HCatalog with Sqoop in order to import data from an RDBMS (Oracle) into a data lake (Hive).

sqoop-import --connect <connection-string> --username <username> --password <password> \
    --table <table-name> --hcatalog-database data_extraction --hcatalog-table <table-name> \
    --hcatalog-storage-stanza 'stored as orcfile' -m 1 --verbose

The job executed successfully, but I am not able to find the data. I also checked the location of the table created in HCatalog: no directory had been created there, only a zero-byte marker file named _$folder$.
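For reference, this is roughly how I checked the table location and its contents (a sketch; the table, bucket, and warehouse prefix here are placeholders, not the real names):

hive -e 'DESCRIBE FORMATTED data_extraction.<table-name>;'   # prints the table's S3 location
aws s3 ls s3://<bucket>/<warehouse-prefix>/                  # only a zero-byte <table-name>_$folder$ marker appears, no data files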

Please find the job log below:

19/09/25 17:53:37 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/09/25 17:54:02 DEBUG db.DBConfiguration: Fetching password from job credentials store
19/09/25 17:54:03 INFO db.DBInputFormat: Using read commited transaction isolation
19/09/25 17:54:03 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
19/09/25 17:54:03 INFO mapreduce.JobSubmitter: number of splits:1
19/09/25 17:54:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1569355854349_1231
19/09/25 17:54:04 INFO impl.YarnClientImpl: Submitted application application_1569355854349_1231
19/09/25 17:54:04 INFO mapreduce.Job: The url to track the job: http://<PII-removed-by-me>/application_1569355854349_1231/
19/09/25 17:54:04 INFO mapreduce.Job: Running job: job_1569355854349_1231
19/09/25 17:57:34 INFO hive.metastore: Closed a connection to metastore, current connections: 1
19/09/25 18:02:59 INFO mapreduce.Job: Job job_1569355854349_1231 running in uber mode : false
19/09/25 18:02:59 INFO mapreduce.Job:  map 0% reduce 0%
19/09/25 18:03:16 INFO mapreduce.Job:  map 100% reduce 0%
19/09/25 18:03:18 INFO mapreduce.Job: Job job_1569355854349_1231 completed successfully
19/09/25 18:03:18 INFO mapreduce.Job: Counters: 35
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=425637
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=1
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                S3: Number of bytes read=0
                S3: Number of bytes written=310154
                S3: Number of read operations=0
                S3: Number of large read operations=0
                S3: Number of write operations=0
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=29274
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=14637
                Total vcore-milliseconds taken by all map tasks=14637
                Total megabyte-milliseconds taken by all map tasks=52459008
        Map-Reduce Framework
                Map input records=145608
                Map output records=145608
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=199
                CPU time spent (ms)=4390
                Physical memory (bytes) snapshot=681046016
                Virtual memory (bytes) snapshot=5230788608
                Total committed heap usage (bytes)=1483210752
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
19/09/25 18:03:18 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 582.8069 seconds (0 bytes/sec)
19/09/25 18:03:18 INFO mapreduce.ImportJobBase: Retrieved 145608 records.
19/09/25 18:03:18 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table null
19/09/25 18:03:19 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@1d548a08
We have our Hive warehouse on S3. – Himanshu Aggarwal

1 Answer


Solved it. We are using AWS EMR (a managed Hadoop service), and this behavior is already documented on the AWS site:

When you use Sqoop to write output to an HCatalog table in Amazon S3, disable Amazon EMR direct write by setting the mapred.output.direct.NativeS3FileSystem and mapred.output.direct.EmrFileSystem properties to false. For more information, see Using HCatalog. You can pass these as Hadoop generic options: -D mapred.output.direct.NativeS3FileSystem=false and -D mapred.output.direct.EmrFileSystem=false.

If you don't disable direct write, no error occurs, but the table is created in Amazon S3 and no data is written.
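Applied to the import above, the command looks like this (a sketch using the same placeholders as before; note that Sqoop requires the generic -D options to come before the tool-specific arguments):

sqoop-import -D mapred.output.direct.NativeS3FileSystem=false \
    -D mapred.output.direct.EmrFileSystem=false \
    --connect <connection-string> --username <username> --password <password> \
    --table <table-name> --hcatalog-database data_extraction --hcatalog-table <table-name> \
    --hcatalog-storage-stanza 'stored as orcfile' -m 1 --verbose

With direct write disabled, the map tasks write through HDFS staging instead of writing directly to S3, and the data files show up under the table's S3 location.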

Source: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-sqoop-considerations.html