4
votes

Goal: Use a script to go through 5 to 10 million XML files, check each file's date, and delete the file if it is older than 90 days. The script would run daily.

Problem: Using PowerShell's Get-ChildItem -Recurse causes the script to lock up and fail to delete any files. I assume this is because Get-ChildItem needs to build the whole array before any action can be taken on any file.

Solution?: After a lot of research I found that [System.IO.Directory]::EnumerateFiles returns file names lazily, so the script can act on each item before the whole collection has been built, which should make things more efficient (https://msdn.microsoft.com/library/dd383458%28v=vs.100%29.aspx). After more testing I found that foreach ($1 in $2) is more efficient than $1 | % {}. Before I run this new code and potentially crash the server again, can anyone suggest adjustments that would make the script more efficient?

For testing I created 15,000 txt files of 0.02 KB each, filled with random data, in 15,000 directories and ran the code below. For the test I set the $date variable to 90 seconds instead of 90 days. It took 6 seconds to delete all the txt files.

$getfiles = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories")
$date = [System.DateTime]::Now.AddSeconds(-90)   # 90 seconds for the test only
foreach ($file in $getfiles) {
    # Delete every file last written at or before the cutoff
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    }
}
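
The test tree described before the script (one small file of random data per directory) could be recreated with something like the sketch below. New-TestTree, its parameters, and the dir/file naming scheme are hypothetical; 20 bytes stands in for the 0.02 KB mentioned in the question.

```powershell
# Hypothetical helper: builds $Count directories under $Root,
# each holding one 20-byte file of random data.
function New-TestTree {
    param(
        [Parameter(Mandatory = $true)][string]$Root,
        [int]$Count = 15000
    )

    $rng   = New-Object System.Random
    $bytes = New-Object byte[] 20          # roughly 0.02 KB

    for ($i = 0; $i -lt $Count; $i++) {
        $dir = [System.IO.Path]::Combine($Root, "dir$i")
        [void][System.IO.Directory]::CreateDirectory($dir)

        $rng.NextBytes($bytes)             # refill the buffer with random data
        $path = [System.IO.Path]::Combine($dir, "file$i.txt")
        [System.IO.File]::WriteAllBytes($path, $bytes)
    }
}
```

For example, New-TestTree -Root C:\temp -Count 15000 would produce a layout similar to the one used in the test.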
3
If you haven't done so already, it might help to disable 8.3 filename generation. Read all warnings first. Make a backup or two. Use at your own risk. – Andrew Morton
As long as you simply save the output from EnumerateFiles to a variable, you're not getting any benefit from the IEnumerable, as PS will wait for the line to finish before continuing (it's not an async method). You need to use it directly in a loop, pipeline or something similar. – Frode F.
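
Following the suggestion in the comment above, here is a minimal sketch that hands the enumerable straight to foreach and keeps one failed delete from aborting the run. Remove-OldFile and its parameter names are hypothetical, and the *.xml default is only an assumption based on the question's goal.

```powershell
# Hypothetical helper: deletes files last written at or before the cutoff
# and returns how many were removed.
function Remove-OldFile {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [string]$Filter = '*.xml',
        [int]$Days = 90
    )

    $cutoff  = [System.DateTime]::Now.AddDays(-$Days)
    $option  = [System.IO.SearchOption]::AllDirectories
    $deleted = 0

    # The enumerable is consumed directly by foreach rather than stored first.
    foreach ($file in [System.IO.Directory]::EnumerateFiles($Path, $Filter, $option)) {
        try {
            if ([System.IO.File]::GetLastWriteTime($file) -le $cutoff) {
                [System.IO.File]::Delete($file)
                $deleted++
            }
        }
        catch {
            # A locked or read-only file is reported and skipped.
            Write-Warning "Could not delete ${file}: $($_.Exception.Message)"
        }
    }

    $deleted
}
```

A call such as Remove-OldFile -Path C:\temp -Days 90 would return the number of files deleted. An inaccessible directory hit during enumeration may still end the loop, because that exception is raised by the enumerator rather than inside the try block.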

3 Answers

7
votes

PowerShell one-liner that takes the first 100,000 files and deletes those more than 90 days old.

[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
select -first 100000 | where { [IO.File]::GetLastWriteTime($_) -lt
(Get-Date).AddDays(-90) } | foreach { rm $_ }

or with progress shown:

[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
    select -first 100000 |
    where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } |
    foreach { $c = 0 } {
        Write-Progress -Activity "Delete Files" -CurrentOperation $_ -PercentComplete ((++$c/100000)*100)
        rm $_
    }

This works on folders that have a very large number of files. Thanks to my co-worker Doug!

4
votes

You may be able to tweak it a little by filtering the $getfiles array completely before starting to delete files.

In PowerShell 4.0 and newer you can do this without using the pipeline (which does add some overhead), by using the .Where({}) method:

$date  = (Get-Date).AddDays(-90)
$files = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date})
foreach($file in $files)
{
    [System.IO.File]::Delete($file)
}

Since you don't seem to care about it anyway, a final minuscule optimization may be had by waiving error handling completely and just calling the Windows API directly:

$Kernel32Util = Add-Type -MemberDefinition @'
[DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool DeleteFile(string filePath);
'@ -Name 'Kernel32Util' -Namespace 'NativeCode' -PassThru

And then do the same as above with your new external function wrapper instead of [File]::Delete():

foreach($file in $files)
{
    [void]$Kernel32Util::DeleteFile($file)
}
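
The loop above throws away DeleteFile's Boolean result. If a count of failures is still wanted, a sketch along these lines keeps the direct API call and tallies the files that could not be removed. The type name, namespace, and Remove-FileNative function are hypothetical (chosen so they do not clash with the definition above), and the call only works on Windows.

```powershell
# Same P/Invoke signature as above, under hypothetical names.
$NativeDelete = Add-Type -MemberDefinition @'
[DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool DeleteFile(string filePath);
'@ -Name 'DeleteFileSketch' -Namespace 'NativeCodeSketch' -PassThru

# Hypothetical helper: deletes each path and returns the number of failures.
function Remove-FileNative {
    param([string[]]$Paths)

    $failed = 0
    foreach ($p in $Paths) {
        # DeleteFile returns $false when the file could not be deleted
        if (-not $NativeDelete::DeleteFile($p)) { $failed++ }
    }
    $failed
}
```

A nonzero return value only says that some deletions failed, not why; finding the reason would need additional error handling that this sketch leaves out.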

At this point though, I would probably take a step back and ask the question:

"Am I using the right tool for the job?"

My (personal) answer would be: "Probably not" - time to write a small utility in a compiled language (C#, F#, VB.NET) instead.

PowerShell is super powerful and useful, but at the cost of performance - that's not a bad thing - it's just something worth taking into account when deciding on what tool to use for a specific task :)

1
votes

I ended up with slightly different code for different versions of PowerShell:

# PowerShell 4.0 and newer (.Where() method available)
$date = [System.DateTime]::Now.AddDays(-30)
foreach ($file in ([System.IO.Directory]::EnumerateFiles("D:\Folder to cleanup", "*.*", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date}))) {
    [System.IO.File]::Delete($file)
}

# PowerShell 3.0 (EnumerateFiles needs .NET 4, but .Where() is not available yet)
$date = [System.DateTime]::Now.AddDays(-30)
foreach ($file in ([System.IO.Directory]::EnumerateFiles("D:\Folder to cleanup", "*.*", "AllDirectories"))) {
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    }
}

# PowerShell 2.0 (no EnumerateFiles by default, so fall back to GetFiles)
$date = [System.DateTime]::Now.AddDays(-30)
foreach ($file in ([System.IO.Directory]::GetFiles("D:\Folder to cleanup", "*.*", "AllDirectories"))) {
    if ([System.IO.File]::GetLastWriteTime($file) -le $date) {
        [System.IO.File]::Delete($file)
    }
}
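
If one script has to run on several machines, the choice between these variants could be made at run time from $PSVersionTable. The sketch below only picks a label; Select-CleanupVariant and the three labels are hypothetical names, and the version boundaries are the assumption that .Where() needs 4.0 and EnumerateFiles needs 3.0.

```powershell
# Hypothetical helper: maps a PowerShell major version to one of the variants above.
function Select-CleanupVariant {
    param([int]$Major = $PSVersionTable.PSVersion.Major)

    if ($Major -ge 4)     { 'Where' }            # EnumerateFiles plus .Where()
    elseif ($Major -eq 3) { 'EnumerateFiles' }   # EnumerateFiles plus foreach/if
    else                  { 'GetFiles' }         # GetFiles plus foreach/if
}
```

The result can then drive a switch statement, for example switch (Select-CleanupVariant) { 'Where' { ... } 'EnumerateFiles' { ... } 'GetFiles' { ... } }, with each branch holding the matching loop.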