0
votes

I have a Windows directory containing tens of thousands of files. I need a list of all the unique file extensions, stored in a variable.

In a Java program, I would like to scan through that directory (recursively, including all subdirectories) and retrieve a list of unique file extensions that can be put into a variable.

Example: the directory contains File1.txt, File2.doc, File3.doc, File4.doc, and File5.ppt. I would like to retrieve "txt,doc,ppt" and put that into a String variable. (The extensions do not have to be separated into an array of any type, although that would work. I only need to end up with a string like the one above.)

Is there any way I can do this? Possibly by accessing the command line or using a regex?

3
Please share your code and let us know where you are stuck. There are two things involved: 1) walking all directories and subdirectories, and 2) a regex to match file names. – Braj
Have you tried the String.endsWith() method? – Braj

3 Answers

1
votes

Here is a Java 8 example that produces a String:

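        // Needs java.nio.file.Files, Path, Paths and java.util.stream.Collectors.
        final String extensions = Files.walk(Paths.get(""))
                .filter(Files::isRegularFile)                // skip directories
                .map(path -> path.getFileName().toString())  // name only, so dots in directory names are ignored
                .filter(name -> name.contains("."))
                .map(name -> name.substring(name.lastIndexOf('.') + 1))
                .distinct()
                .collect(Collectors.joining(","));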

        System.out.println(extensions);

As an array:


        final String[] extensions = Files.walk(Paths.get(""))
                .filter(Files::isRegularFile)                // skip directories
                .map(path -> path.getFileName().toString())  // name only, so dots in directory names are ignored
                .filter(name -> name.contains("."))
                .map(name -> name.substring(name.lastIndexOf('.') + 1))
                .distinct()
                .toArray(String[]::new);

        System.out.println(Arrays.toString(extensions));
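
The snippets above leave the stream returned by Files.walk open and treat "TXT" and "txt" as different extensions. A minimal self-contained sketch that handles both might look like the following. The class name ExtensionLister and its helper methods are made up for illustration, and lower-casing plus sorting the result is an assumption about what is wanted on Windows, not something the question asks for.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
final class ExtensionLister {

    // Returns the extension without the dot, lower-cased, or an empty
    // string when the file name has no usable extension.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot <= 0 covers "no dot" and dot-files such as ".gitignore";
        // a trailing dot ("name.") has nothing after it.
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Walks root recursively and joins the distinct extensions with commas.
    static String listExtensions(Path root) throws IOException {
        // try-with-resources closes the directory handles held by Files.walk.
        try (Stream<Path> paths = Files.walk(root)) {
            TreeSet<String> unique = paths
                    .filter(Files::isRegularFile)
                    .map(path -> extensionOf(path.getFileName().toString()))
                    .filter(ext -> !ext.isEmpty())
                    .collect(Collectors.toCollection(TreeSet::new)); // unique and sorted
            return String.join(",", unique);
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the root, or the current directory if none is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "");
        System.out.println(listExtensions(root));
    }
}
```

Because a TreeSet is used, the extensions come out in alphabetical order rather than in the order they were found; swapping in a LinkedHashSet would keep encounter order instead.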
0
votes

Here is a batch solution:

del %temp%\x.x 2>nul
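rem find does a substring match, so .doc is skipped if .docx has already been recorded
for /f "tokens=*" %%i in ('dir /s /b /a-d *') do (find "%%~xi" "%temp%\x.x" >nul 2>&1 || <nul set /p ".=%%~xi">>"%temp%\x.x")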
set /p ext=<%temp%\x.x
set ext=%ext:.=,%
set ext=%ext:~1%
echo %ext%
0
votes

For a cmd (batch file) solution:

@echo off

    setlocal enableextensions
    for /r "%cd%" %%a in (*) do if not defined "\%%~xa\" (echo(%%~xa&set ""\%%~xa\"=1")
    endlocal

This uses the environment to record which extensions have been seen, by setting a variable for each one. If the variable is not yet defined, this is the first time the extension has been found, so it is echoed to the console.
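
Since the question asks about Java, the same first-seen idea can be sketched there with a Set standing in for the environment variables. This is only an illustration of the technique, not a translation of the batch file: the class FirstSeenExtensions and the method firstSeen are hypothetical names, and the input is a plain list of file names rather than a directory walk.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the "mark each extension as seen" technique.
final class FirstSeenExtensions {

    // Returns each extension (with its leading dot, like %%~xa) the first
    // time it appears, preserving encounter order.
    static List<String> firstSeen(List<String> fileNames) {
        Set<String> seen = new HashSet<>();       // plays the role of the per-extension variables
        List<String> output = new ArrayList<>();  // plays the role of the console output
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            String ext = dot >= 0 ? name.substring(dot) : "";
            // Environment variable names are case-insensitive in cmd, so the
            // lookup key is lower-cased to mimic that behaviour.
            String key = ext.toLowerCase(Locale.ROOT);
            // Set.add returns true only when the key was not already present.
            if (seen.add(key)) {
                output.add(ext);
            }
        }
        return output;
    }
}
```

As in the batch version, a file with no extension contributes one empty entry the first time it is encountered.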

Edited to address the comments and the part of the question I had misread: the output needs to be on a single line.

@echo off

    setlocal enableextensions
    for /r "%cd%" %%a in (*.*) do if not defined "\%%~xa\" (
        set ""\%%~xa\"=1" 
        if not defined "\" (set ""\"=1" ) else (<nul set /p ".=,")
        <nul set /p ".=%%~xa"
    )
    endlocal

This works the same way as the previous code, but the output is kept on one line, with commas added where needed to separate the extensions in the list.

Edited to format the output properly: remove the dots from the extensions and store the data in a variable.

@echo off

    setlocal enableextensions disabledelayedexpansion 

    for /f "delims=" %%z in ('cmd /e:on /v:off /q /c "for /r "%cd%" %%a in (*.*) do if not defined "\%%~xa\" (set ""\%%~xa\"^=1" & if not defined "\" (set ""\"^=1" ) else (<nul set /p ".^=^,") & <nul set /p ".^=%%~xa" )"') do set "extensionList=%%z"
    set "extensionList=%extensionList:.=%"
    echo(%extensionList%

    endlocal

This is still the same logic, but to get the data into a variable it has been moved inside a for /f command, so the list produced by the previous version can be assigned to a variable. The dots are then removed from that variable to produce the required output.