
To be clear, I'm not asking about setting permissions in HDFS, but rather in ext3 or whatever filesystem is being used on the individual datanode machines that HDFS is running on top of.

I know that we set sudo chown hduser:hadoop /app/hadoop/tmp, so user hduser is the file owner, but I'd like to know guidelines for the permission bits (chmod) on these files.
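For reference, a quick sketch of how the directory is set up at the moment and how I'm checking the current mode (path and user/group as above):

sudo chown hduser:hadoop /app/hadoop/tmp   # owner hduser, group hadoop
ls -ld /app/hadoop/tmp                     # shows the permission bits I'm asking about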


2 Answers

2
votes

If you set the permissions to 755 (or worse, 777), the files in the underlying filesystem can be read by anyone, which is clearly a security issue. A restrictive configuration such as 700 makes more sense: it prevents an unauthorized user from simply opening and reading the files from the local disk rather than going through the HDFS API.

In a securely configured cluster, as of Hadoop versions 0.22 and 0.23, the permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) default to 0700. Upon startup, the datanode automatically changes the permissions of those directories to match the configured value.
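If you want a different mode, the value can be set explicitly. A minimal sketch of the corresponding hdfs-site.xml entry, using the property named above and the 700 default being described:

<!-- hdfs-site.xml: mode the datanode sets/expects on its data directories -->
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>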

In 1.0 the datanode checks that the on-disk permissions match the configured value and refuses to start if they differ. You might see warnings such as the following if the permissions on the existing data directories do not match what Hadoop is configured to expect.

WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /disk1/datanode, expected: rwxr-xr-x, while actual: rwxrwxr-x
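In that case you can either change the on-disk mode to match the configuration, or change dfs.datanode.data.dir.perm to match what is already on disk. A quick sketch of the first option, using the directory from the warning above:

# the warning expects rwxr-xr-x (755) but found rwxrwxr-x (775)
sudo chmod 755 /disk1/datanode
# then restart the datanode so it re-checks the directory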

I'm not quite sure what happens in other versions, though; you might want to have a look yourself.

1
votes

I don't know if I have correctly understood your question, but here is some information:

Setting permissions on the local filesystem

Setting permissions is clearly relevant and needed. These permissions strengthen your cluster's security by preventing anyone other than the owner from modifying your data. Even if you will probably use looser permissions while setting up your cluster, you can tighten them after the installation for safer operation.

By the way, setting your permissions to 777 is almost never a good solution, even if it does not cause immediate trouble.
[EDIT]: The right approach is to grant as few rights as possible. So, as long as Hadoop keeps working, try lowering the access rights (720 would be best). But I cannot guarantee that Hadoop works with permissions other than 755, as that is the default value.

almost off-topic


for the chmod bits
The mode is made of three fields: Owner (user), Group, and Others. Each field has 3 capabilities: Read, Write, Execute (in that order). Since each capability can be set to true (1) or false (0), each field results in an octal digit given by its binary value.

For instance:
for the Owner you want all rights, so rwx => 111 = 4 + 2 + 1 = 7
for the Group only read and execute, so r-x => 101 = 4 + 0 + 1 = 5
and the same for Others, r-x => 101 = 4 + 0 + 1 = 5

So you have to do a chmod 755 file
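As a concrete sketch, applying this to the directory from the question (path and user/group as in the question, the mode being the 755 derived above):

sudo chown -R hduser:hadoop /app/hadoop/tmp   # owner hduser, group hadoop
sudo chmod 755 /app/hadoop/tmp                # owner rwx, group r-x, others r-x
ls -ld /app/hadoop/tmp                        # should now show drwxr-xr-x hduser hadoop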

for HDFS rights on the host filesystem

Hadoop is very sensitive to the access rights on the files and directories it uses on the local filesystem. If you have not set them correctly, it can throw exceptions and even prevent the namenode or datanodes from starting.
As far as I know, some of the directories have to be owned by hadoop:hadoop or hdfs:hadoop and some others by mapred:hadoop (depending on your hdfs and mapred users and groups), as in the sketch below.
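A minimal sketch of what that ownership split can look like; the paths here (/data/1/dfs/dn and /data/1/mapred/local) are only examples, substitute the directories actually configured in your hdfs-site.xml and mapred-site.xml:

# HDFS datanode storage: owned by the hdfs (or hadoop) user, group hadoop
sudo chown -R hdfs:hadoop /data/1/dfs/dn
sudo chmod 700 /data/1/dfs/dn            # matches the 0700 default of dfs.datanode.data.dir.perm

# MapReduce local directories: owned by the mapred user, group hadoop
sudo chown -R mapred:hadoop /data/1/mapred/local
sudo chmod 755 /data/1/mapred/local      # example mode, adjust to your distribution's guidelines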