1
votes

I have the following string, which I pass as the path parameter to the sparkContext.textFile method:

s3://bucket/prefix1/{file1.txt,file2.special.txt},s3://bucket/prefix2/{file3.txt,file4.special.txt},s3://bucket/prefix3/{file5.special.txt,file6.txt}

I would like to do some parsing and manipulation (for example, keep only the files that have ".special." in their name), but I would prefer not to implement the parsing myself.

Which class does the underlying parsing of such a URI?

1 Answer

0
votes

Try this:

val path = new org.apache.hadoop.fs.Path("s3://bucket/prefix1/{file1.txt,file2.special.txt},s3://bucket/prefix2/{file3.txt,file4.special.txt},s3://bucket/prefix3/{file5.special.txt,file6.txt}")

https://hadoop.apache.org/docs/r2.8.2/api/org/apache/hadoop/fs/Path.html
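
One caveat: as far as I can tell, a single Path built from the whole string is treated as one path, so it will not give you the individual files. My understanding is that textFile hands the string to Hadoop's FileInputFormat.setInputPaths, which splits it on commas that sit outside curly braces, and that the {a,b} globs are only expanded later against the file system (FileSystem.globStatus). If you just need the ".special." entries without touching S3, here is a minimal pure-Scala sketch of the same idea. The names PathListSketch, splitOutsideBraces and expandBraces are my own and not part of Hadoop or Spark, and the expansion only handles non-nested braces.

```scala
object PathListSketch {

  // Split on commas that are NOT inside {...}, similar in spirit to what
  // Hadoop does with a comma-separated path list (hypothetical helper).
  def splitOutsideBraces(s: String): List[String] = {
    val parts = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0
    for (c <- s) {
      if (c == ',' && depth == 0) {
        parts += current.toString
        current.clear()
      } else {
        if (c == '{') depth += 1
        if (c == '}') depth -= 1
        current.append(c)
      }
    }
    if (current.nonEmpty) parts += current.toString
    parts.toList
  }

  // Matches the first non-nested {a,b,...} group in a pattern.
  private val Braces = """^(.*?)\{([^{}]*)\}(.*)$""".r

  // Expand {a,b} alternatives into concrete strings.
  // Assumption: braces are not nested; other glob characters are left alone.
  def expandBraces(pattern: String): List[String] = pattern match {
    case Braces(prefix, alternatives, suffix) =>
      alternatives.split(",").toList
        .flatMap(alt => expandBraces(prefix + alt + suffix))
    case other => List(other)
  }

  def main(args: Array[String]): Unit = {
    val input =
      "s3://bucket/prefix1/{file1.txt,file2.special.txt}," +
      "s3://bucket/prefix2/{file3.txt,file4.special.txt}," +
      "s3://bucket/prefix3/{file5.special.txt,file6.txt}"

    val special = splitOutsideBraces(input)
      .flatMap(expandBraces)
      .filter(_.contains(".special."))

    special.foreach(println)
    // prints:
    // s3://bucket/prefix1/file2.special.txt
    // s3://bucket/prefix2/file4.special.txt
    // s3://bucket/prefix3/file5.special.txt
  }
}
```

Since textFile accepts a comma-separated list of paths, the filtered list can be joined back with mkString(",") and passed to it.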