
I'm trying to move my daily Apache access log files into a Hive external table by copying each day's log file to the relevant HDFS folder for its month. I tried to use a wildcard, but it seems that hdfs dfs doesn't support it? (The documentation seems to say that it should.)

Copying individual files works:

$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-20150102.bz2" /user/myuser/prod/apache_log/2015/01/

But all of the following attempts throw "No such file or directory":

$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put "/mnt/prod-old/apache/log/access_log-201501*.bz2" /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*.bz2': No such file or directory

$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /mnt/prod-old/apache/log/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: `/mnt/prod-old/apache/log/access_log-201501*': No such file or directory

The environment is Hadoop 2.3.0-cdh5.1.3.


1 Answer


I'm going to answer my own question. hdfs dfs -put does work with wildcards; the problem is that the input directory is not a local directory but a mounted SSHFS (FUSE) drive. It seems SSHFS is what can't handle the wildcard characters.
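As a workaround (a sketch, assuming the per-file put from the question keeps working over the SSHFS mount), you can let find enumerate the files itself, since find matches names against the directory entries it reads rather than relying on glob expansion, and upload them one at a time:

$ find /mnt/prod-old/apache/log -name 'access_log-201501*.bz2' -exec \
    sudo HADOOP_USER_NAME=myuser hdfs dfs -put {} /user/myuser/prod/apache_log/2015/01/ \;

This starts one hdfs client per file, so it's slower than a single wildcard put, but it sidesteps glob expansion on the mount entirely.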

Below is proof that hdfs dfs -put works just fine with wildcards when reading from the local filesystem rather than the mounted drive:

$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put /tmp/access_log-201501* /user/myuser/prod/apache_log/2015/01/
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150101.bz2': File exists
put: '/user/myuser/prod/apache_log/2015/01/access_log-20150102.bz2': File exists
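The "File exists" messages just mean those files had already been uploaded by the earlier per-file put. If you need to re-upload and overwrite, -put accepts an -f flag in Hadoop 2.x (a sketch; I'm assuming the flag is available in this CDH build):

$ sudo HADOOP_USER_NAME=myuser hdfs dfs -put -f /tmp/access_log-201501* /user/myuser/prod/apache_log/2015/01/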