Quickest way in C# to find a file in a directory with over 20,000 files


Question

I have a job that runs every night to pull xml files from a directory that has over 20,000 subfolders under the root. Here is what the structure looks like:

rootFolder/someFolder/someSubFolder/xml/myFile.xml
rootFolder/someFolder/someSubFolder1/xml/myFile1.xml
rootFolder/someFolder/someSubFolderN/xml/myFile2.xml
rootFolder/someFolder1
rootFolder/someFolderN

So looking at the above, the structure is always the same - a root folder, then two subfolders, then an xml directory, and then the xml file. Only the name of the rootFolder and the xml directory are known to me.

The code below traverses through all the directories and is extremely slow. Any recommendations on how I can optimize the search especially if the directory structure is known?

string[] files = Directory.GetFiles(@"\\somenetworkpath\rootFolder", "*.xml", SearchOption.AllDirectories);
1
18
4/3/2009 2:21:21 PM

Accepted Answer

Rather than doing GetFiles and doing a brute force search you could most likely use GetDirectories, first to get a list of the "First sub folder", loop through those directories, then repeat the process for the sub folder, looping through them, lastly look for the xml folder, and finally searching for .xml files.

Now, as for performance the speed of this will vary, but searching for directories first, THEN getting to files should help a lot!

Update

Ok, I did a quick bit of testing and you can actually optimize it much further than I thought.

The following code snippet will search a directory structure and find ALL "xml" folders inside the entire directory tree.

string startPath = @"C:\Testing\Testing\bin\Debug";
string[] oDirectories = Directory.GetDirectories(startPath, "xml", SearchOption.AllDirectories);
Console.WriteLine(oDirectories.Length.ToString());
foreach (string oCurrent in oDirectories)
    Console.WriteLine(oCurrent);
Console.ReadLine();

If you drop that into a test console app you will see it output the results.

Now, once you have this, just look in each of the found directories for you .xml files.

16
4/3/2009 2:36:09 PM

I created a recursive method GetFolders using a Parallel.ForEach to find all the folders named as the variable yourKeyword

List<string> returnFolders = new List<string>();
object locker = new object();

Parallel.ForEach(subFolders, subFolder =>
{
    if (subFolder.ToUpper().EndsWith(yourKeyword))
    {
        lock (locker)
        {
            returnFolders.Add(subFolder);
        }
    }
    else
    {
        lock (locker)
        {
            returnFolders.AddRange(GetFolders(Directory.GetDirectories(subFolder)));
        }
    }
});

return returnFolders;

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon