1
votes

Given that Hive uses a metastore alongside HDFS, is it feasible to restore an HDFS snapshot, taken from a running Hadoop-Hive cluster, onto a new Hadoop-Hive cluster?

One step that I think would be mandatory is to create the tables again in Hive, but would these tables automatically be wired to the snapshot files?

One link on this topic is at Apache Mail Archives. I was hoping there is a newer or better answer to this.

1 Answer

3
votes

Hive works with two things: the metadata (in the metastore) and the warehouse data (in HDFS).

Try something like this (please note that I haven't tested it):

1) Copy the Hive warehouse data from the current Hadoop-Hive cluster to the new Hadoop-Hive cluster using distcp:

$ hadoop distcp hftp://old-cluster:50070/user/hive/warehouse hdfs://new-cluster/user/hive/warehouse

2) Assuming your metadata is stored in MySQL (not in the default Derby database), point the new Hive installation at the old metastore's MySQL server (in the hive-site.xml of the new cluster). That way you don't need to create the schemas/tables again.
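
For step 2, the relevant part of hive-site.xml on the new cluster might look roughly like the fragment below. This is only a sketch: the host name old-metastore-host, the database name hive_metastore, and the user name and password are placeholders I made up, so substitute your own values. The property names are the standard JDO connection settings Hive uses for its metastore.

```xml
<!-- Sketch only: host, database name, user and password are placeholders -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://old-metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```

One thing to watch for: the metastore records each table's location as a full URI, so the existing entries will probably still name the old cluster's NameNode. If so, those locations would have to be updated to the new cluster before the tables resolve to the copied files.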