If you want to do this programmatically, you can use the FileSystem and FileStatus classes of the Hadoop Java API to:
- list the contents of the target directory,
- check whether each entry of that directory is a file or another directory, and
- write the name of each file as a new line to a file stored locally.
The code for such an application can look like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.File;
import java.io.PrintWriter;

public class Dir_ls
{
    public static void main(String[] args) throws Exception
    {
        // get the input directory as a command-line argument
        Path inputDir = new Path(args[0]);
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        if (fs.exists(inputDir))
        {
            // list the directory's contents
            FileStatus[] fileList = fs.listStatus(inputDir);

            // create the local output file and its writer
            PrintWriter pw = new PrintWriter(new File("output.txt"));

            // scan each entry of the input directory
            for (FileStatus file : fileList)
            {
                if (!file.isDirectory()) // only take files into account
                {
                    System.out.println(file.getPath().getName());
                    pw.write(file.getPath().getName() + "\n");
                }
            }
            pw.close();
        }
        else
        {
            System.out.println("Directory named \"" + args[0] + "\" doesn't exist.");
        }
    }
}
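For reference, here is one way to compile and run the class on a machine that has the Hadoop client installed, using the hadoop classpath command to pull in the required jars and configuration (the target directory argument here is just an example):

javac -cp $(hadoop classpath) Dir_ls.java
java -cp $(hadoop classpath):. Dir_ls .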
So if we run the application against the root (.) directory of HDFS, whose contents are a mix of directories and text files, the names of the files (and only the files) are printed to the command line, and the same names are written, one per line, to the output.txt file stored locally; the directories are skipped.
Alternatively, as mazaneicha points out in the comments, you can get such a listing straight from the shell, since the -C option of hdfs dfs -ls prints just the paths of the entries it finds:

hdfs dfs -ls -C hdfs/path/you/want/files/from > file_list.out
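Note that both the application above and that ls only look at the direct children of the target directory. If you also need files from its subdirectories, a minimal sketch of a recursive variant (reusing the fs and inputDir variables from the code above) can rely on FileSystem.listFiles, which walks the tree when its second argument is true and returns only files, never directories:

// recursively iterate over every file under inputDir;
// directories are traversed but not returned by listFiles itself
RemoteIterator<LocatedFileStatus> files = fs.listFiles(inputDir, true);
while (files.hasNext())
{
    System.out.println(files.next().getPath().getName());
}

Both RemoteIterator and LocatedFileStatus live in org.apache.hadoop.fs, so the wildcard import above already covers them.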