
I have the datanode, namenode, and secondary namenode folders (with all the data and metadata inside) from a different Hadoop installation.

My question is: how can I see what is in there, or add it to my local HDFS to view the data?

There must be a way to restore it, but I can't find any information about it.

The folder tree is like this:

For Namenode & SecondaryNamenode:

data/dfs/name
├── current
│   ├── VERSION
│   ├── edits_0000000000000000001-0000000000000000007
│   ├── edits_0000000000000000008-0000000000000000015
│   ├── edits_0000000000000000016-0000000000000000022
│   ├── edits_0000000000000000023-0000000000000000029
│   ├── edits_0000000000000000030-0000000000000000030
│   ├── edits_0000000000000000031-0000000000000000031
│   ├── edits_inprogress_0000000000000000032
│   ├── fsimage_0000000000000000030
│   ├── fsimage_0000000000000000030.md5
│   ├── fsimage_0000000000000000031
│   ├── fsimage_0000000000000000031.md5
│   └── seen_txid

And for Datanode:

data/dfs/data/
├── current
│   ├── BP-1079595417-192.168.2.45-1412613236271
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir1
│   │   │   │           ├── blk_1073741825
│   │   │   │           └── blk_1073741825_1001.meta
│   │   │   ├── lazyPersist
│   │   │   └── rbw
│   │   ├── dncp_block_verification.log.curr
│   │   ├── dncp_block_verification.log.prev
│   │   └── tmp
│   └── VERSION

Thanks in advance.


1 Answer


The standard solution for copying data between different Hadoop clusters is to run DistCp, which performs a distributed copy of the desired files from source to destination.
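For example, if both clusters were still online, a copy might look like this (the NameNode host names, ports, and paths are hypothetical; adjust them to your clusters):

# Distributed copy from the old cluster to the new one.
hadoop distcp hdfs://old-namenode:8020/user/data hdfs://new-namenode:8020/user/data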

Assuming that the other cluster is no longer running, and you only have these backup files, then it's possible to restore by copying the files that you have into the directories used by the new Hadoop cluster. These locations will be specified in configuration properties in hdfs-site.xml: dfs.namenode.name.dir for the NameNode (your data/dfs/name directory) and dfs.datanode.data.dir for the DataNode (your data/dfs/data directory).
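A minimal restore sketch, assuming the new cluster's hdfs-site.xml points dfs.namenode.name.dir at /var/hadoop/data/dfs/name and dfs.datanode.data.dir at /var/hadoop/data/dfs/data (both target paths are illustrative, and backup/ is wherever you placed the recovered folders):

# Stop HDFS before touching the storage directories.
stop-dfs.sh

# Copy the old NameNode metadata into dfs.namenode.name.dir,
# and the old block data into dfs.datanode.data.dir.
cp -r backup/data/dfs/name/* /var/hadoop/data/dfs/name/
cp -r backup/data/dfs/data/* /var/hadoop/data/dfs/data/

start-dfs.sh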

Please note that this will likely only work if you run the same version of Hadoop as the prior deployment. Otherwise, there could be a compatibility problem: if you attempt to run an older version, the NameNode will fail to start; if you attempt to run a newer version, you may need to go through an upgrade process first by running hdfs namenode -upgrade.
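If you do move to a newer release, the upgrade is started and later finalized from the command line:

# Start the NameNode once in upgrade mode, after placing the old
# metadata in dfs.namenode.name.dir.
hdfs namenode -upgrade

# After verifying the cluster is healthy, make the upgrade permanent.
hdfs dfsadmin -finalizeUpgrade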

One other option, if you just need to look at the file system metadata, is to use the Offline Image Viewer (hdfs oiv) and Offline Edits Viewer (hdfs oev) commands, which can decode and browse the fsimage and edits files, respectively.
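For example, using the file names from your listing (the output file names are arbitrary):

# Dump the newest checkpoint to readable XML with the Offline Image Viewer.
hdfs oiv -p XML -i fsimage_0000000000000000031 -o fsimage.xml

# Decode an edits segment the same way with the Offline Edits Viewer.
hdfs oev -p xml -i edits_0000000000000000001-0000000000000000007 -o edits.xml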