0
votes

We have a small GPDB cluster. When I try to read an external table that uses the 'gphdfs' protocol from the GPDB master, the query fails.

Environment:

Product Version: Pivotal Greenplum (GPDB) 4.3.8.2
OS: CentOS 6.5

I get the following error:

prod=# select * from ext_table;
ERROR:  external table gphdfs protocol command ended with error. 16/10/05 14:42:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable  (seg0 slice1 host.domain.com:40000 pid=25491)
DETAIL:

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://path/to/hdfs
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
        at com.
Command: 'gphdfs://path/to/hdfs'
External table tablename, file gphdfs://path/to/hdfs

We tried: following the steps in this article on the Greenplum master machine: https://discuss.pivotal.io/hc/en-us/articles/219403388-How-to-eliminate-error-message-WARN-util-NativeCodeLoader-Unable-to-load-native-hadoop-library-for-your-platform-with-gphdfs

Result:

It did not work after changing the contents of "hadoop-env.sh" as suggested in the link; it still throws the same error. Do I need to restart GPDB for the "hadoop-env.sh" changes to take effect?
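For reference, the fix in articles like that one is usually a change along the following lines. This is a sketch, not your exact file: the native-library path is an assumed common location and depends on your Hadoop install. Note also that gphdfs launches a JVM on each segment host per query, so the file likely needs to be updated on every segment host, not only on the master; for the same reason a GPDB restart should not normally be required.

```shell
# In hadoop-env.sh (e.g. $HADOOP_HOME/etc/hadoop/hadoop-env.sh) on the
# master AND every segment host. The native-lib path below is an assumed
# location; adjust it to wherever your Hadoop native libraries live.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/lib/hadoop/lib/native"
```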

Or

Is there an alternate way to handle the gphdfs protocol error?

Any help would be much appreciated.

Attached is the DDL for the failing external table:

create external table schemaname.exttablename(
"ID" INTEGER,
time timestamp without time zone,
"SalesOrder" char(6),
"NextDetailLine" decimal(6),
"OrderStatus" char(1)
)
location('gphdfs://hadoopmster.com:8020/devgpdb/filename.txt') FORMAT 'text';
Loading native libs is just a warning; it shouldn't stop things working. The more worrying message in the exception, in my view, is: Input path does not exist: hdfs://path/to/hdfs - Binary Nerd
Thanks for the direction. I am going to check the path. - NEO
I just observed one symptom (discuss.pivotal.io/hc/en-us/articles/…): when I followed the steps there, I was able to read data from the Hadoop /tmp directory. But when I create a new dir /dev on Hadoop, it complains: Input path does not exist: hdfs://path/to/hdfs - NEO
Why does it work for the Hadoop /tmp dir and not for a newly created dir on Hadoop? - NEO
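Since the exception says the input path does not exist, a quick way to compare the working /tmp case with the failing new directory is to inspect both from the command line. The host, port, and paths below are taken from the DDL above and are placeholders for your actual values:

```shell
# Run as gpadmin on the GPDB master (and ideally on a segment host too,
# since the segments are what actually read from HDFS).
hdfs dfs -ls hdfs://hadoopmster.com:8020/tmp        # the directory that works
hdfs dfs -ls hdfs://hadoopmster.com:8020/devgpdb    # the failing directory
hdfs dfs -stat "%u %g" /devgpdb                     # owner and group of the dir
```

If the second `ls` fails, the path genuinely does not exist (or is not visible to the user running the command), which matches the exception. /tmp is typically world-writable in HDFS, while a newly created directory inherits restrictive ownership, which would explain the difference the comments describe.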

2 Answers

2
votes

Could you please provide the external table DDL that is failing? Also, please make sure the gpadmin user has permission to read and write the HDFS path. Thanks, Pratheesh Nair
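A minimal sketch of the permission fix this answer suggests, assuming /devgpdb (from the DDL in the question) is the target directory and your HDFS superuser is named hdfs; adjust names and modes to your environment:

```shell
# Run as the HDFS superuser (often 'hdfs'); /devgpdb is the directory
# referenced in the question's DDL.
sudo -u hdfs hdfs dfs -mkdir -p /devgpdb
sudo -u hdfs hdfs dfs -chown -R gpadmin:gpadmin /devgpdb
sudo -u hdfs hdfs dfs -chmod -R 755 /devgpdb   # widen if other users must write

# Verify ownership is now gpadmin:
hdfs dfs -ls /
```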