2
votes

I have a folder full of zip files. Those zip files sometimes contain zip files, that sometimes contain zip files within them, and so on. I am trying to write a batch file that I can paste into the top folder containing all the zips, and when it runs it will unzip all the nested zip files, and within sub-directories, all the way down, and delete the zips once they have been successfully extracted. The full file paths need to be preserved. If there is an error and a file cannot be extracted then it should not be deleted and the file and file path need to be printed to a text file.

So far I have this:

@ECHO ON

SET source=%cd%
FOR /F "TOKENS=*" %%F IN ('DIR /S /B "%source%\*.zip"') DO "C:\Program Files\7-Zip\7z.exe" x "%%~fF" -o"%%~pF\"
EXIT

Which I can drop into a folder and run, it will unzip the first level of zips but none of the nested zips inside. That's the first hurdle.

The next hurdle would be to delete the successfully extracted zips. And last, not to delete any zips that could not be extracted and print their name and/or path to a text file.

Any suggestions or chunks of code are appreciated. Or if there's a better way to do this entirely.

**** UPDATED ****

Mofi posted an answer that looks like it's working except for one piece:

When a ZIP is extracted, it needs to be extracted to a folder with the same name, so I can still follow the structure.

Starting Example:

[Top Level Folder Holding Zips] (folder)
--ExampleZip.zip
---FileInZip.txt
---FileinZip2.txt
--ExampleZip2.zip
---Folder1 (folder)
----ExampleZip3.zip
-----FileinZip3.txt
-----FileinZip4.txt
---ExampleZip4.zip
----FileinZip5.txt
----FileinZip6.txt

Needs to become this:

[Top Level Folder Holding Zips] (folder)
--ExampleZip (folder)
---FileInZip.txt
---FileinZip2.txt
--ExampleZip2 (folder)
---Folder1 (folder)
----ExampleZip3 (folder)
-----FileinZip3.txt
-----FileinZip4.txt
---ExampleZip4 (folder)
----FileinZip5.txt
----FileinZip6.txt

So the full structure is still visible.

I think the top answer in this question shows what I need to include: Extract zip contents into directory with same name as zip file, retain directory structure

This part:


SET "filename=%~1"
SET dirName=%filename:~0,-4%

7z x -o"%dirName%" "%filename%"

Needs to be smashed in there somewhere. Or it seems like there should be a switch for 7Zip that does it, since you can do this from the context menu with "Extract to *" I thought that's what the "extract with full paths" command does but that must have something to do with the -o switch, specifying output path? How do I specify the output path to be a folder with the same name as the input zip? Or merge the answer from that question I linked with Mofi's answer?

*** UPDATED AGAIN ***

I thought there was an issue with the batch file ignoring ZIP files with underscores in the name, but that was a coincidence and it was actually ignoring ZIP files without the Archive file attribute set.

Mofi suggested another fix for that which worked, but the batch file is not extracting nested zips that needed the Archive file attribute set.

This does kind of work, in that I can manually execute the batch file a few times and it will work it's way through everything in the folder, but the loop calculation does not seem to work, or is calculating/terminating before the batch file sets the Archive attribute for all zip files?

Here is the current version I'm working with:

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "ErrorOutput="
set "LoopCount=20"

rem The current directory is used on batch file being called without
rem a base folder path or with just one or more double quotes.
set "BaseFolder=%~1"
if defined BaseFolder set "BaseFolder=%BaseFolder:"=%"
if not defined BaseFolder set "BaseFolder=%CD%" & goto VerifyFolderPath

rem Make sure the folder path contains backslashes and not forward slashes
rem and does not contain wildcard characters or redirection operators or a
rem horizontal tab character after removing all double quotes.
set "BaseFolder=%BaseFolder:/=\%"
for /F "delims=*?|<>    " %%I in ("%BaseFolder%") do if not "%BaseFolder%" == "%%I" (
    echo ERROR: %~nx0 must be called with a valid folder path.
    echo        "%~1" is not a valid folder path.
    set "ErrorOutput=1"
    goto EndBatch
)

rem Get full folder path in case of the folder was specified with
rem a relative path. If the folder path references the root of a
rem drive like on using "C:\" or just "\", redefine the folder
rem path with full path for root of the (current) drive.
for %%I in ("%BaseFolder%") do set "BaseFolder=%%~fI"

:VerifyFolderPath
rem The base folder path must end with a backslash for verification.
if not "%BaseFolder:~-1%" == "\" set "BaseFolder=%BaseFolder%\"

rem Verify the existence of the folder. The code above processed also
rem folder paths of folders not existing at all and also invalid folder
rem paths containing for example a colon not (only) after drive letter.
if not exist "%BaseFolder%" (
    echo ERROR: Folder "%BaseFolder%" does not exist.
    set "ErrorOutput=1"
    goto EndBatch
)

rem Make sure to process all ZIP files existing in base folder and all
rem its subfolders by setting archive file attribute on all ZIP files.
%SystemRoot%\System32\attrib.exe +A /S "%BaseFolder%*.zip"

rem Process all *.zip files found in base folder and all its subfolders
rem which have the archive file attribute set. *.zip files with archive
rem file attribute not set are ignored to avoid an endless running loop
rem if a ZIP archive file cannot be extracted successfully with reason(s)
rem output by 7-Zip or if the ZIP file cannot be deleted after successful
rem extraction of the archive. The archive extraction loop runs are limited
rem additionally by a loop counter as defined at top of the batch file for
rem 100% safety on prevention of an endless loop execution.

:ExtractArchives
set "ArchiveProcessed="
for /F "delims=" %%I in ('dir "%BaseFolder%*.zip" /AA-D /B /S 2^>nul') do (
    set "ArchiveProcessed=1"
    echo Extracting archive: "%%I"
    "%ProgramFiles%\7-Zip\7z.exe" x -bd -bso0 -o"%%~dpnI\" -spd -y -- "%%I"
@pause
    if errorlevel 255 set "ErrorOutput=1" & goto EndBatch
    if errorlevel 1 (
        set "ErrorOutput=1"
        %SystemRoot%\System32\attrib.exe -A "%%I"
    ) else (
        del /A /F "%%I"
        if exist "%%I" (
            echo ERROR: Failed to delete: "%%I"
            set "ErrorOutput=1"
            %SystemRoot%\System32\attrib.exe -A "%%I"
        )
    )
)
if not defined ArchiveProcessed goto EndBatch
set /A LoopCount-=1
if not LoopCount == 0 goto ExtractArchives

:EndBatch
if defined ErrorOutput echo/& pause
endlocal
echo[
echo[
echo If no errors are displayed above, everything extracted successfully. Remember to delete the batch file once you are done.
@pause

It is rare that there would be maybe 10 or 20 layers of nested zips, so a quick and dirty fix may be just somehow looping the whole batch file 10 or 20 times, unless that is a bad idea or there is a more elegant way to do it.

3

3 Answers

2
votes

The task to recursively extract all ZIP archives including nested ZIP archives inside a ZIP archive can be achieved by running the ZIP archive file extraction process in a loop until no ZIP file exists anymore. But there must be at least two use cases taken into account to avoid an endless running archive extraction loop:

  1. The extraction of a ZIP archive file fails for whatever reason. 7-Zip outputs information about the error reason(s). Such a ZIP file should not be processed a second time.
  2. The deletion of a successfully extracted ZIP file fails for whatever reason. The ZIP file should not be processed once again.

The solution is processing only ZIP files with archive file attribute set as done automatically by Windows on creating, renaming or modifying a file and remove the archive file attribute on every ZIP file on which the extraction process or the deletion of the file failed to avoid processing the ZIP file again.

The archive file attribute is set on all *.zip files on directory tree to process before starting the archive files extraction process to make sure that really all existing *.zip files are processed at least once. The archive file attribute is also set on all *.zip files in output directory of a completely successfully processed ZIP archive file to make sure that even *.zip files inside a ZIP file with archive file attribute not set after extraction are processed also on next archive file extraction loop run.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "ErrorOutput="
set "LoopCount=20"

rem The current directory is used on batch file being called without
rem a base folder path or with just one or more double quotes.
set "BaseFolder=%~1"
if defined BaseFolder set "BaseFolder=%BaseFolder:"=%"
if not defined BaseFolder set "BaseFolder=%CD%" & goto VerifyFolderPath

rem Make sure the folder path contains backslashes and not forward slashes
rem and does not contain wildcard characters or redirection operators or a
rem horizontal tab character after removing all double quotes.
set "BaseFolder=%BaseFolder:/=\%"
for /F "delims=*?|<>    " %%I in ("%BaseFolder%") do if not "%BaseFolder%" == "%%I" (
    echo ERROR: %~nx0 must be called with a valid folder path.
    echo        "%~1" is not a valid folder path.
    set "ErrorOutput=1"
    goto EndBatch
)

rem Get full folder path in case of the folder was specified with
rem a relative path. If the folder path references the root of a
rem drive like on using "C:\" or just "\", redefine the folder
rem path with full path for root of the (current) drive.
for %%I in ("%BaseFolder%") do set "BaseFolder=%%~fI"

:VerifyFolderPath
rem The base folder path must end with a backslash for verification.
if not "%BaseFolder:~-1%" == "\" set "BaseFolder=%BaseFolder%\"

rem Verify the existence of the folder. The code above processed also
rem folder paths of folders not existing at all and also invalid folder
rem paths containing for example a colon not (only) after drive letter.
if not exist "%BaseFolder%" (
    echo ERROR: Folder "%BaseFolder%" does not exist.
    set "ErrorOutput=1"
    goto EndBatch
)

rem Make sure to process all ZIP files existing in base folder and all
rem its subfolders by setting archive file attribute on all ZIP files.
%SystemRoot%\System32\attrib.exe +A /S "%BaseFolder%*.zip" >nul

rem Process all *.zip files found in base folder and all its subfolders
rem which have the archive file attribute set. *.zip files with archive
rem file attribute not set are ignored to avoid an endless running loop
rem if a ZIP archive file cannot be extracted successfully with reason(s)
rem output by 7-Zip or if the ZIP file cannot be deleted after successful
rem extraction of the archive. The archive extraction loop runs are limited
rem additionally by a loop counter as defined at top of the batch file for
rem 100% safety on prevention of an endless loop execution.

:ExtractArchives
set "ArchiveProcessed="
for /F "delims=" %%I in ('dir "%BaseFolder%*.zip" /AA-D /B /S 2^>nul') do (
    set "ArchiveProcessed=1"
    echo Extracting archive: "%%I"
    "%ProgramFiles%\7-Zip\7z.exe" x -bd -bso0 -o"%%~dpI" -spd -y -- "%%I"
    if errorlevel 255 set "ErrorOutput=1" & goto EndBatch
    if errorlevel 1 (
        set "ErrorOutput=1"
        %SystemRoot%\System32\attrib.exe -A "%%I"
    ) else (
        %SystemRoot%\System32\attrib.exe +A /S "%%~dpnI\*.zip" >nul
        del /A /F "%%I"
        if exist "%%I" (
            echo ERROR: Failed to delete: "%%I"
            set "ErrorOutput=1"
            %SystemRoot%\System32\attrib.exe -A "%%I"
        )
    )
)
if not defined ArchiveProcessed goto EndBatch
set /A LoopCount-=1
if not LoopCount == 0 goto ExtractArchives

:EndBatch
if defined ErrorOutput echo/& pause
endlocal

Note: There must be one horizontal tab character after "delims=*?|<> and " on line 16 of the batch file code and not a series of space characters as there will be after copying the code from browser window and pasting the code into a text editor window.

The batch file is commented with lines with command REM (remark). These comments should be read for understanding the code and then can be removed for a more efficient execution of the batch file by Windows command processor.

The 7-Zip switches used in code are explained by help of 7-Zip opened by double clicking on file 7-zip.chm or opening Help from within GUI window of started 7-Zip. On help tab Contents expand the list item Command Line Version and click on list item Switches to get displayed the help page Command Line Switches with all switches supported by currently used version of 7-Zip.

The batch file can be executed with a folder path as argument to process all ZIP files in this folder and all its subfolders. So it is possible to add to Send to context menu of Windows File Explorer a shortcut file which runs the batch file with the folder path passed by Windows File Explorer to the batch file as first argument. It would be also possible to registry the batch file as context menu option for Directory in Windows registry to be able to run the batch file easily from within any application supporting the Windows context menu handlers for a directory.

Edit after question edited: The command line running 7-Zip can be modified to:

"%ProgramFiles%\7-Zip\7z.exe" x -bd -bso0 -o"%%~dpnI\" -spe -spd -y -- "%%I"

Each ZIP file is extracted with this command line into a subfolder in folder of the ZIP file with name of the ZIP file because of replacing -o"%%~dpI" by -o"%%~dpnI\". The additional 7-Zip switch -spe avoids duplicating the folder name if the ZIP file contains at top level a folder with same name as the ZIP file. So if Example3.zip contains at top level the folder Example3, the files are extracted to folder Example3 and not to folder Example3\Example3 as it would occur without usage of option -spe.

For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.

  • attrib /?
  • call /?
  • dir /?
  • echo /?
  • endlocal /?
  • for /?
  • goto /?
  • if /?
  • rem /?
  • set /?
  • setlocal /?

Read the Microsoft documentation about Using command redirection operators for an explanation of 2>nul. The redirection operator > must be escaped with caret character ^ on FOR command line to be interpreted as literal character when Windows command interpreter processes this command line before executing command FOR which executes the embedded dir command line in a separate command process started in background.

0
votes

Using Groovy, or Ant

This would be a lot easier using Apache Ant or, better still, the Groovy AntBuilder.

e.g. this Groovy script will unzip all the top leval zip files then delete them:

new AntBuilder().with {

  def sourceRoot = '.'

  // Unzip all .zip files in / underneath sourceRoot
  unzip( dest: 'some-folder' ) {
    fileset( dir: sourceRoot ) {
      include name: "**/*.zip"
    }
  }

  // Unzip throws an exception on failure.
  // Delete all .zip files in / underneath sourceRoot
  delete {
    fileset( dir: sourceRoot, includes: '**/*.zip' )
  }
}

You'll need to keep scanning the destination folder for zips, and repeating the above process, until everythings unzipped. You may also find it useful to use a FileScanner.

AntBuilder throws an exception if anything fails, so you can avoid deleting archives that fail to unzip. AntBuilder will also log it's progress, using the standard Java logging mechanisms. You can tell it the level of detail you want, or supress it completely

The full AntBuilder documentation is here:

Using a fileScanner

Example from the Groovy AntBuilder documentation:

// let's create a scanner of filesets
def scanner = ant.fileScanner {
    fileset(dir:"src/test") {
        include(name:"**/My*.groovy")
    }
}

// now let's iterate over
def found = false
for (f in scanner) {
    println("Found file $f")
    found = true
    assert f instanceof File
    assert f.name.endsWith(".groovy")
}
assert found

Putting it together

It's not a huge leap to combine a filesScanner with an AntBuilder to get the job done. I suspect it will be a lot easier than doing it with a batch script.

0
votes

Finally managed to write a batch file that can unzip nested zips, keeping the archive file structure intact!

logic is that, run recursively until all the zip files are unzipped. Number of iterations default is 5, and can be passed as cmd arg "extract.bat 3". may be changed to a while loop until hit file not found exception. And most importantly delete the archive file after extraction, so, we don't get into endless loop! But follow the rules below

  1. it uses 7z, make sure in the cmd window 7z can be run, that is in the path
  2. zip file names cannot have spaces. make sure of that and ext is zip
  3. copy the zip file to a directory where there are no other zip files
  4. And only .zip ext, you may change that to rar or anything in the batch file

Here is the batch file


Rem Nested unzip - @sivakd
echo off
if  "%1"=="" (set iter=5) else (set iter=%1)
echo Running  %iter% iterations
for /l %%x in (1, 1, %iter%) do (
    dir *.zip /s /b > ziplist.txt
    for /F %%f in (ziplist.txt) do (
        7z x %%f -o%%~dpnf -y & del /f %%f
    )
    del ziplist.txt
)