
The table retail_db.categories has 58 rows.

$ pig -useHCatalog
grunt> pcategories = LOAD 'retail_db.categories' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> b = limit pcategories 100;
grunt> dump b;

This returns all the records. But when I try to dump the original relation:

grunt> dump pcategories;

I get this error:

2018-04-15 16:27:46,444 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:46,723 [main] INFO org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
2018-04-15 16:27:47,170 [main] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is MYSQL
2018-04-15 16:27:47,171 [main] INFO org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
2018-04-15 16:27:47,171 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,171 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,184 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,184 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,219 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,244 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,244 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,247 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=departments
2018-04-15 16:27:47,247 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=departments
2018-04-15 16:27:47,261 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,284 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,284 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,286 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,286 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,386 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,388 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-04-15 16:27:47,397 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,397 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-04-15 16:27:47,397 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-04-15 16:27:47,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-04-15 16:27:47,399 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-04-15 16:27:47,399 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-04-15 16:27:47,406 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,407 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:27:47,409 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-04-15 16:27:47,409 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-04-15 16:27:47,435 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,435 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,437 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,437 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,458 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-04-15 16:27:48,419 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/hive-metastore-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp122824794/hive-metastore-2.3.2.jar
2018-04-15 16:27:48,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/libthrift-0.9.3.jar to DistributedCache through /tmp/temp-1113251818/tmp1608619006/libthrift-0.9.3.jar
2018-04-15 16:27:49,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/hive-exec-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp1023486409/hive-exec-2.3.2.jar
2018-04-15 16:27:50,352 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/libfb303-0.9.3.jar to DistributedCache through /tmp/temp-1113251818/tmp-207303388/libfb303-0.9.3.jar
2018-04-15 16:27:51,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp-1113251818/tmp120570913/jdo-api-3.0.1.jar
2018-04-15 16:27:51,497 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/slf4j-api-1.7.25.jar to DistributedCache through /tmp/temp-1113251818/tmp1251741235/slf4j-api-1.7.25.jar
2018-04-15 16:27:51,786 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/hive-hbase-handler-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp1351750668/hive-hbase-handler-2.3.2.jar
2018-04-15 16:27:52,653 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1113251818/tmp1548980484/pig-0.17.0-core-h2.jar
2018-04-15 16:27:53,042 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp-2078279932/hive-hcatalog-pig-adapter-2.3.2.jar
2018-04-15 16:27:53,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1113251818/tmp1231439146/automaton-1.11-8.jar
2018-04-15 16:27:53,875 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/antlr-runtime-3.5.2.jar to DistributedCache through /tmp/temp-1113251818/tmp970518288/antlr-runtime-3.5.2.jar
2018-04-15 16:27:53,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-04-15 16:27:53,920 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-04-15 16:27:53,922 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:27:54,152 [JobControl] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/jay/.staging/job_1523787662857_0004
2018-04-15 16:27:54,197 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-04-15 16:27:54,232 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input files to process : 1
2018-04-15 16:27:54,232 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-04-15 16:27:54,631 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-04-15 16:27:55,247 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1523787662857_0004
2018-04-15 16:27:55,247 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
2018-04-15 16:27:55,253 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-04-15 16:27:55,503 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1523787662857_0004
2018-04-15 16:27:55,733 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://jay-Lenovo-Z50-70:8088/proxy/application_1523787662857_0004/
2018-04-15 16:27:55,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1523787662857_0004
2018-04-15 16:27:55,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases pcategories
2018-04-15 16:27:55,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: pcategories[3,14] C: R:
2018-04-15 16:27:55,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-04-15 16:27:55,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1523787662857_0004]
2018-04-15 16:28:27,422 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-04-15 16:28:27,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1523787662857_0004 has failed! Stop running all dependent jobs
2018-04-15 16:28:27,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-04-15 16:28:27,424 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:28:27,580 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:28:27,827 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-04-15 16:28:27,827 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion   UserId   StartedAt             FinishedAt            Features
3.0.0           0.17.0       jay      2018-04-15 16:27:47   2018-04-15 16:28:27   UNKNOWN

Failed!

Failed Jobs:
JobId                    Alias         Feature    Message                Outputs
job_1523787662857_0004   pcategories   MAP_ONLY   Message: Job failed!   hdfs://localhost:9000/tmp/temp-1113251818/tmp-83503168,

Input(s): Failed to read data from "retail_db.categories"

Output(s): Failed to produce result in "hdfs://localhost:9000/tmp/temp-1113251818/tmp-83503168"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1523787662857_0004

2018-04-15 16:28:27,828 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2018-04-15 16:28:27,836 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias pcategories
Details at logfile: /home/jay/pig_1523787729987.log
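ERROR 1066 is only Pig's generic wrapper; the actual failure reason lives in the YARN container logs for the job id printed above. A minimal sketch of fetching them with the standard YARN CLI (the `yarn logs` line itself has to run on the cluster, with log aggregation enabled):

```shell
# Derive the YARN application id from the job id Pig printed,
# then fetch the container logs with the standard YARN CLI.
JOB_ID='job_1523787662857_0004'
APP_ID=$(printf '%s\n' "$JOB_ID" | sed 's/^job_/application_/')
echo "$APP_ID"   # application_1523787662857_0004
# On the cluster (requires log aggregation / a running ResourceManager):
# yarn logs -applicationId "$APP_ID" | less
```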

AM Container for appattempt_1523799060075_0001_000002 exited with exitCode: 1
Failing this attempt. Diagnostics: [2018-04-15 19:02:58.344] Exception from container-launch.
Container id: container_1523799060075_0001_02_000001
Exit code: 1
[2018-04-15 19:02:58.348]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2018-04-15 19:02:58.348]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://jay-Lenovo-Z50-70:8088/cluster/app/application_1523799060075_0001 Then click on links to logs of each attempt.

This is what I get after clicking that link.
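On Hadoop 3, an AM container that exits with code 1 while its stderr shows nothing but log4j warnings from MRAppMaster is often a MapReduce classpath problem: the AM cannot find the MR framework classes. One possible fix, assuming that is the cause here (the logs above do not prove it), is to set HADOOP_MAPRED_HOME for the AM and tasks in mapred-site.xml; ${HADOOP_HOME} below is a placeholder for your actual Hadoop install path:

```xml
<!-- mapred-site.xml: a hedged sketch of a possible fix, assuming the AM
     is missing the MR framework classes; ${HADOOP_HOME} is a placeholder. -->
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
```

After changing mapred-site.xml, restart YARN and re-run the dump.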
Where it says "The url to track the job"... your actual error output should exist there, in YARN. – OneCricketeer

1 Answer


It worked fine for me. I ran the commands below:

$ pig -useHCatalog
grunt> pcategories = LOAD 'hive_testing.address' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> dump pcategories;

Here I created a dummy address table in my database.

Output

(101,india,xxx)

So the issue could be with your dataset, not with the commands you are running.
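To test that theory, one sketch is to confirm Hive itself can scan the table (a COUNT forces a real read, not just a metastore lookup) and then inspect the files backing it. The warehouse path below is the Hive default and may differ on your setup:

```shell
# Run on the cluster: force a full scan through Hive, then list the
# table's backing files. The warehouse path is an assumption (the
# default hive.metastore.warehouse.dir location).
hive -e 'SELECT COUNT(*) FROM retail_db.categories;'
hdfs dfs -ls /user/hive/warehouse/retail_db.db/categories
```

If the COUNT query also fails, the problem is in the table's data or storage format rather than in Pig or HCatalog.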