I am trying to read files from a directory that contains many subdirectories. The data is in S3, and I am trying to do this:
val rdd = sc.newAPIHadoopFile(data_loc,
  classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat],
  classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat],
  classOf[org.apache.hadoop.io.NullWritable])
This does not seem to work.
I'd appreciate any help.
textFile("s3n://<root_dir>/*")? - Soumya Simanta
s3n://bucket/*/*/*. - Nick Chammas
s3n://bucket/root_dir/*/*/* for year, month, date. But does something like this work: s3n://bucket/root_dir/*/data/*/*/*, basically a directory in every subdirectory? - venuktan
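For reference, here is a minimal sketch of how the suggestions above might fit together. It is not a confirmed fix. It assumes the files are plain text, that the job is launched through spark-submit (so the master is supplied externally), and that the Hadoop S3 filesystem expands a `*` in any path component, including before a literal directory such as `data`. The object name `ReadNestedS3` and the helper `globAtDepth` are made up for illustration. Note that `newAPIHadoopFile` takes the input format class, then the key class, then the value class; for `TextInputFormat` these are `LongWritable` and `Text`, whereas the call in the question passes `TextInputFormat` twice, followed by `NullWritable`.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object ReadNestedS3 {
  // Hypothetical helper: appends one "*" per directory level below `root`,
  // e.g. three levels for year/month/date.
  def globAtDepth(root: String, depth: Int): String =
    (root.stripSuffix("/") +: Seq.fill(depth)("*")).mkString("/")

  def main(args: Array[String]): Unit = {
    // Assumes the master URL is provided by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("read-nested-s3"))

    // Pattern from the last comment: a literal "data" directory inside
    // every first-level subdirectory, then three more wildcard levels.
    val dataLoc = "s3n://bucket/root_dir/*/data/*/*/*"

    // TextInputFormat produces (byte offset, line) records,
    // so the key class is LongWritable and the value class is Text.
    val rdd = sc.newAPIHadoopFile(
      dataLoc,
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // Copy each Text into a String right away, since Hadoop reuses
    // the same Writable instance across records.
    val lines = rdd.map { case (_, text) => text.toString }
    println(lines.count())

    sc.stop()
  }
}
```

If only the lines are needed, `sc.textFile(dataLoc)` should accept the same glob and return the lines as strings directly. `globAtDepth("s3n://bucket/root_dir", 3)` builds the `s3n://bucket/root_dir/*/*/*` pattern mentioned for year, month, date.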