My problem is as follows:
1) I want to back up our CDH Hadoop cluster to S3.
2) We have an EMR cluster running.
3) I am trying to run s3distcp from the EMR cluster, giving the HDFS URL of the remote CDH cluster as the source and S3 as the destination.
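For illustration, the invocation described in step 3 has roughly the following shape. This is only a sketch: the NameNode host and port, the source path, and the bucket name are placeholders (assumptions), not the real values from our clusters.

```shell
# Placeholder values (assumptions); replace with the real NameNode address and bucket.
SRC_NAMENODE="cdh-namenode.example.com:8020"   # hypothetical RPC address of the remote CDH NameNode
SRC_PATH="/data/to/backup"                     # hypothetical HDFS path to back up
DEST="s3://my-backup-bucket/cdh-backup/"       # hypothetical S3 destination

# Build the command line; --src and --dest are the source and destination options of s3-dist-cp.
CMD="s3-dist-cp --src hdfs://${SRC_NAMENODE}${SRC_PATH} --dest ${DEST}"

# Print the command for review. To actually run it on the EMR master node, execute: $CMD
echo "$CMD"
```

The source is a fully qualified hdfs:// URL pointing at the remote cluster rather than at the EMR cluster's own HDFS.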
I get the following error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=mapred, access=READ_EXECUTE, inode="/tmp/hadoop-mapred/mapred/staging"
I have the following questions after going through the documentation here:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html
1) Is this doable? I can see from the s3distcp documentation that any HDFS URL can be given, but I can't find any documentation on how it would work with an external cluster.
2) The documentation mentions a staging directory (it says that s3distcp copies data to this directory before copying it to S3). Where is this directory created: on the remote cluster or on the EMR cluster?