7
votes

We have encountered an unexpected performance issue when traversing directories looking for files using a wildcard pattern.

We have 180 folders, each containing 10,000 files. A command-line search using dir <pattern> /s completes almost instantly (under 0.25 seconds). However, the same search from our application takes 3 to 4 seconds.

We initially tried using System.IO.DirectoryInfo.GetFiles() with SearchOption.AllDirectories and have now tried the Win32 API calls FindFirstFile() and FindNextFile().

Profiling our code indicates that the vast majority of execution time is spent in these calls.

Our code is based on the following blog post:

http://codebetter.com/blogs/matthew.podwysocki/archive/2008/10/16/functional-net-fighting-friction-in-the-bcl-with-directory-getfiles.aspx

We found this to be slow, so we updated the GetFiles function to take a string search pattern rather than a predicate.

Can anyone shed any light on what might be wrong with our approach?

3
What are you using to do the search from the command line? Could it be that it is using the Windows search indexes to do the query rather than stepping through every file? - Matt Breckon
@Matt we're just doing a dir /s (have updated my post accordingly). - Richard Ev
Sounds suspicious. I seriously doubt that "dir" uses anything other than FindFirstFile/FindNextFile as well. Maybe you are misusing them. Could you provide a snippet illustrating how you use them? - sharptooth
@sharptooth: I have added a link to a post that contains the source code we used. - Richard Ev
@Matt dir does not use the index service. - Sheng Jiang 蒋晟

3 Answers

11
votes

In my tests, using FindFirstFileEx with FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH is much faster than plain FindFirstFile.

Scanning 20 folders with ~300,000 files took 661 seconds with FindFirstFile and 11 seconds with FindFirstFileEx. Subsequent calls to the same folders took less than a second.

HANDLE h = FindFirstFileEx(search.c_str(), FindExInfoBasic, &data, FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH);
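
For context, here is a rough sketch of how that call might sit inside a recursive scan comparable to dir <pattern> /s. It is only an illustration, not the code used for the timings above: the names count_matches and is_dot_entry are made up for this example, and the two-pass layout (one enumeration with the pattern, one with "*" to find subfolders) is just one possible design. FindExInfoBasic and FIND_FIRST_EX_LARGE_FETCH need Windows 7 / Server 2008 R2 or later.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for the "." and ".." pseudo-entries that
// directory enumerations return and that must not be recursed into.
inline bool is_dot_entry(const std::wstring& name) {
    return name == L"." || name == L"..";
}

#ifdef _WIN32  // everything below is Windows-only
#include <windows.h>

// Hypothetical function: counts files under `dir` (recursively) whose
// names match `pattern`, e.g. L"*.log".
std::size_t count_matches(const std::wstring& dir, const std::wstring& pattern) {
    std::size_t count = 0;
    WIN32_FIND_DATAW data;

    // Pass 1: let the file system apply the wildcard pattern.
    HANDLE h = FindFirstFileExW((dir + L"\\" + pattern).c_str(),
                                FindExInfoBasic,   // do not fetch 8.3 short names
                                &data, FindExSearchNameMatch, NULL,
                                FIND_FIRST_EX_LARGE_FETCH);  // larger directory reads
    if (h != INVALID_HANDLE_VALUE) {
        do {
            if (!(data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
                ++count;
        } while (FindNextFileW(h, &data));
        FindClose(h);
    }

    // Pass 2: enumerate everything to discover subdirectories, because a
    // pattern such as *.log would not return directories that do not match it.
    h = FindFirstFileExW((dir + L"\\*").c_str(), FindExInfoBasic, &data,
                         FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH);
    if (h != INVALID_HANDLE_VALUE) {
        do {
            const bool is_dir = (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
            // Skip reparse points (junctions, symlinks) to avoid cycles.
            const bool is_link = (data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) != 0;
            if (is_dir && !is_link && !is_dot_entry(data.cFileName))
                count += count_matches(dir + L"\\" + data.cFileName, pattern);
        } while (FindNextFileW(h, &data));
        FindClose(h);
    }
    return count;
}
#endif  // _WIN32
```

A call would look like count_matches(L"C:\\data", L"*.log"), where the path is a placeholder. The second pass reads each folder again, so for folders with many files and few subfolders it may be worth enumerating "*" once and matching names in-process instead; which is faster would need measuring.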
3
votes

You can try with an implementation of FindFirstFile and FindNextFile I once blogged about.

0
votes

Try IShellFolder::EnumObjects with SHGetDataFromIDList/IShellFolder::GetAttributesOf.

Pros and cons here.