I'm new to Hadoop.

I have installed my HBase setup using Cloudera (normal installation) on 5 servers. I created some tables and loaded some data.

Now I am monitoring the HDFS folder sizes. I can see that the DataNode usage stays consistent, but my NameNode and JournalNode disk usage increases each time I check.

Though I have only a small amount of data, the JournalNode and NameNode directories are growing by roughly 30 MB per day.

Am I missing something in the configuration?

You should go dig around the folders these files reside in with du -sh and figure out what is taking up the space. It could be logs, image/edits files, or a whole slew of things. – Donald Miner

I'd agree with Donald; check the size of the logs generated each day. – Chris White

Logs are going to a separate folder. I can see that it's the active NameNode server and the JournalNodes that are adding up the space. I did some research and found that whenever there are edits on the NameNode, the journal picks them up and then the secondary NameNode reads them from the journal. But here I'm not doing any operations on my HBase. – Bijesh CHandran

When I checked the folders, the edit logs are showing up every 2 minutes, which is expected. But how does the NameNode produce edit logs when I'm not doing any operations on it? – Bijesh CHandran

I checked the folders and found that edit logs are created every 2 minutes in both the NameNode and JournalNode folders. One thing I don't understand: if my application is not performing any operations, who is editing the namespace metadata? I can also see edit logs from the day I created these instances. Is there any Hadoop configuration to clear these logs after some time, or do I need to delete them manually? – Bijesh CHandran
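
Following the du -sh suggestion in the comments, here is a minimal shell sketch for checking where the space is actually going. The directory paths below are assumptions (the real locations are set by dfs.namenode.name.dir and dfs.journalnode.edits.dir in hdfs-site.xml), so substitute your own. The 2-minute cadence is likely the HA log roll: the standby NameNode asks the active one to roll its edit log periodically (dfs.ha.log-roll.period, default 120 seconds), so new edit segments appear even on an idle cluster.

```sh
# Paths are assumptions; check dfs.namenode.name.dir and
# dfs.journalnode.edits.dir in hdfs-site.xml for the real locations.
NN_DIR=/dfs/nn    # hypothetical NameNode metadata directory
JN_DIR=/dfs/jn    # hypothetical JournalNode edits directory

# Total size of each directory
du -sh "$NN_DIR" "$JN_DIR"

# Break the usage down to see whether fsimage or edits files dominate
du -h --max-depth=2 "$NN_DIR" "$JN_DIR" | sort -h

# Newest edit segments; expect a fresh one roughly every log-roll period
ls -lt "$NN_DIR/current" | head -20
```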

1 Answer


After some research I found out why the edit logs are not being cleared. It's a setting in Hadoop:

dfs.namenode.num.extra.edits.retained = 1000000

This is the default value. Reference
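
Here is a hedged sketch of how one could check the effective value and override it. The value shown is only an illustrative assumption, not a tuned recommendation, and on a Cloudera-managed cluster the override would normally be made through Cloudera Manager rather than by editing hdfs-site.xml directly.

```sh
# Print the value currently in effect (hdfs getconf ships with stock Hadoop)
hdfs getconf -confKey dfs.namenode.num.extra.edits.retained

# To retain fewer old edit transactions, override the property in
# hdfs-site.xml and restart the NameNodes, e.g.:
#
#   <property>
#     <name>dfs.namenode.num.extra.edits.retained</name>
#     <value>100000</value>   <!-- illustrative value, not a recommendation -->
#   </property>
```

Edit segments older than the retention window should then be purged automatically during checkpointing, so there is no need to delete them by hand.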