0 votes

A new folder is created under a base HDFS directory every time a job runs, and each folder contains .dat files.

I need to copy the .dat files to the base directory using Scala and archive the sub-directories.

For example:

Base directory: /user/srav/
Sub-directories: /user/srav/20190101, /user/srav/20180101

The sub-directories contain .dat files, e.g. /user/srav/20190101/test1.dat and /user/srav/20180101/test2.dat. I need to copy these files under /user/srav/ and archive the 20190101 and 20180101 folders. Please suggest how this could be implemented with Spark/Scala (Spark 2.0).


1 Answer

0 votes

You can use the Hadoop FileSystem API directly from Spark. Note that moveFromLocalFile only moves files from the local filesystem into HDFS; for an HDFS-to-HDFS move you want rename. Something like this:

  import org.apache.hadoop.fs._

  // Reuse the Hadoop configuration Spark already carries
  val conf2 = spark.sparkContext.hadoopConfiguration
  val fs = FileSystem.get(conf2)

  val srcs = Array("/user/srav/20190101", "/user/srav/20180101").map(new Path(_))
  val dst = new Path("/user/srav/")

  // List each sub-directory and move its .dat files into the base
  // directory (rename is a metadata-only move within the same filesystem)
  srcs.foreach { dir =>
    fs.listStatus(dir)
      .filter(_.getPath.getName.endsWith(".dat"))
      .foreach(f => fs.rename(f.getPath, new Path(dst, f.getPath.getName)))
  }
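For the archive part of the question, one option is to move the dated sub-directories out of the way once their .dat files have been pulled up. A minimal sketch, assuming an archive location of /user/srav/archive (that path is an assumption, adjust it to your layout):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())

// Assumed archive location -- not from the question, pick your own
val archive = new Path("/user/srav/archive")
if (!fs.exists(archive)) fs.mkdirs(archive)

// Move each dated sub-directory under the archive directory,
// e.g. /user/srav/20190101 -> /user/srav/archive/20190101
Array("/user/srav/20190101", "/user/srav/20180101")
  .map(new Path(_))
  .foreach(d => fs.rename(d, new Path(archive, d.getName)))
```

If "archive" instead means a packed Hadoop archive (.har), that is produced with the `hadoop archive` command-line tool rather than the FileSystem API.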