2
votes

I am trying to move data from a local file system to the Hadoop distributed file system, but I am not able to do it through Oozie. Can we move or copy data from a local filesystem to HDFS using Oozie?


4 Answers

6
votes

I found a workaround for this problem. The ssh action always executes from the Oozie server, so if your files are located on the local file system of the Oozie server, you will be able to copy them to HDFS. The ssh action is always executed by the 'oozie' user, so the host in your ssh action should look like this: myUser@oozie-server-ip, where myUser is a user with read rights on the files on the Oozie server. Next, you need to set up passwordless ssh between the 'oozie' user and myUser on the Oozie server: generate a public key for the 'oozie' user and copy it into the authorized_keys file of 'myUser'. This is the command for generating the rsa key:

ssh-keygen -t rsa

When generating the key, you need to be logged in as the 'oozie' user. On a typical Hadoop cluster this user has its home in /var/lib/oozie, and the public key will be generated in /var/lib/oozie/.ssh/id_rsa.pub. Next, copy this key into the authorized_keys file of 'myUser'; you will find it in that user's home, in the .ssh folder. Now that passwordless ssh is set up, it's time to set up the ssh Oozie action. This action will execute the command 'hadoop' and will have as arguments 'fs', '-copyFromLocal', '${local_file_path}' and '${hdfs_file_path}'.
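For illustration, a minimal sketch of such an ssh action (the action and kill-node names are placeholders; local_file_path and hdfs_file_path are assumed to be defined in your job properties):

<action name="copy-to-hdfs">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>myUser@oozie-server-ip</host>
        <!-- runs: hadoop fs -copyFromLocal ${local_file_path} ${hdfs_file_path} -->
        <command>hadoop</command>
        <args>fs</args>
        <args>-copyFromLocal</args>
        <args>${local_file_path}</args>
        <args>${hdfs_file_path}</args>
    </ssh>
    <ok to="end"/>
    <error to="kill"/>
</action>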

1
votes

No, Oozie isn't aware of a local filesystem, because it runs on the Map-Reduce cluster nodes. You should use Apache Flume to move data from a local filesystem to HDFS.
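If you go that way, a minimal Flume sketch would be an agent with a spooling-directory source and an HDFS sink (the agent name and both paths below are placeholders to adapt):

# flume.conf -- ship files dropped into a local directory to HDFS
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# watch a local spool directory for new files
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /path/to/local/dir
agent1.sources.src1.channels = ch1

# buffer events in memory between source and sink
agent1.channels.ch1.type = memory

# write events to HDFS as plain data
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/path/to/target
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel = ch1

Start it with: flume-ng agent --conf-file flume.conf --name agent1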

0
votes

Oozie does not support a copy action from local to HDFS or vice versa, but you can call a Java program to do the same. A shell action will also work, but if you have more than one node in the cluster, then all the nodes must have the said local mount point available, or mounted with read/write access.
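A minimal sketch of such a Java program, using the Hadoop FileSystem API (the class name is a placeholder; the two command-line arguments are the local source and the HDFS destination):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        // picks up the cluster settings from the classpath (core-site.xml etc.)
        FileSystem fs = FileSystem.get(new Configuration());
        // false = keep the local source file after copying
        fs.copyFromLocalFile(false, new Path(args[0]), new Path(args[1]));
        fs.close();
    }
}

The same caveat applies if you run this from an Oozie java action: the local path must exist on whichever node the action lands on.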

-2
votes

You can do this using an Oozie shell action by putting the copy command in a shell script.

https://oozie.apache.org/docs/3.3.0/DG_ShellActionExtension.html#Shell_Action

Example:

<workflow-app name="reputation" xmlns="uri:oozie:workflow:0.4">
<start to="shell"/>
<action name="shell">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>run.sh</exec>
        <file>run.sh#run.sh</file>
        <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>
<kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

In your run.sh you can use the hadoop fs -copyFromLocal command.
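For example, run.sh could be as simple as this (both paths are placeholders, and the local one must exist on whichever node the shell action is scheduled on):

#!/bin/bash
# copy a local file into HDFS; fails if the local path is missing
# on the node that runs this shell action
hadoop fs -copyFromLocal /path/to/local/file /user/myUser/target/dir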