An application we use saves its connection settings as XML under the tag <connectionStrings>. I'm writing a backup script that needs to know which DB those settings point to. I've found this string in different files on different machines, depending on how the application was installed.

To start with, I tried simply using Win10's Search in Explorer. This fails to find the string no matter what options I try. I tried walking up and down the directory tree, even selecting the folder the text file is in, and it still can't find it. I have all the search options turned on. Any ideas?

But my main question is whether there is now a canonical solution for finding strings in files within .NET. I find many examples here, but they generally use external shelled utilities, or just read every file and search. One interesting solution used an external indexer, but that's outside the scope of this project.

1 Answer

The short answer is no. There is no prescribed way to do this in C#, because the right approach depends on your use case. However, there are plenty of options for performing this type of operation.

To start, let's consider that if we want to search the contents of a file, at some point something has to open the file and look at its contents. You mentioned in your last paragraph the idea of using an external indexer, which would do exactly this. Funnily enough, that is the same thing Windows Search does, so let's start by looking at that.

When you perform a Windows Search, it uses the Search Index to look up files. If you aren't finding the files you are searching for, there are a few possible reasons:

  1. Search Indexing has been disabled entirely.
  2. Search Indexing is not running on the folder containing your config files.
  3. Search Indexing is not configured to scan files with your configuration's extension.

Assuming all of these things are configured correctly, you should see results when performing a search. However, when I searched for connectionStrings on my machine, I didn't get any of my expected web.config files back. Digging a little deeper, I found that Windows Search is configured to Index Properties Only for .config files rather than Index Properties and File Contents. There is probably a good security reason not to index the contents of these files, but I will leave that for another post.

Overall, though, I think trying to use Windows Search or another library for this is overkill for such a basic task. I assume the following:

  1. You know the general location (or parent folder) where all of these config files reside. Even if they are nested, you are probably within 2-3 levels of each config file.
  2. You know the extension(s) of the config files you are searching for and can add them to a whitelist.

Assuming you know these two things, finding and searching the files should be efficient enough. You would want to follow the pattern outlined below.

  1. Select the root folder
  2. List the files in the current folder. Select any files with extension(s) that match your whitelist.
  3. Read the file contents and look for your string. You could do this using buffers, but then you need slightly more complex logic for edge cases (where your search term straddles the boundary between two buffers). This shouldn't be necessary unless the files are much larger than typical config files.
  4. Perform your required action on any file found to contain your search term.
  5. Now, scan the current folder for other folders. Recursively process each folder, looping back to Step #2 each time.
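
The steps above could be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not a canonical API: `ConfigScanner`, `FindFiles`, and the extension whitelist are names made up for illustration. It reads each candidate file fully into memory (reasonable for small config files), and it walks folders with an explicit stack instead of recursion so that folders it cannot read are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ConfigScanner
{
    // Hypothetical whitelist; adjust to the extensions your application uses.
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".config", ".xml" };

    // Lazily yields every whitelisted file under root whose text contains term.
    public static IEnumerable<string> FindFiles(string root, string term)
    {
        var pending = new Stack<string>();
        pending.Push(root);                       // Step 1: start at the root folder

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files, subDirs;
            try
            {
                files = Directory.GetFiles(dir);          // Step 2: files in this folder
                subDirs = Directory.GetDirectories(dir);  // Step 5: folders to visit later
            }
            catch (UnauthorizedAccessException) { continue; } // no permission: skip folder
            catch (IOException) { continue; }                 // folder vanished or unreadable

            foreach (string file in files)
            {
                if (!Extensions.Contains(Path.GetExtension(file))) continue;
                if (ContainsTerm(file, term)) yield return file; // Steps 3 and 4
            }

            foreach (string sub in subDirs) pending.Push(sub);
        }
    }

    // Reads the whole file and does a case-insensitive substring search.
    private static bool ContainsTerm(string path, string term)
    {
        try
        {
            return File.ReadAllText(path)
                       .IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
        }
        catch (IOException) { return false; }                 // locked or unreadable file
        catch (UnauthorizedAccessException) { return false; }
    }
}
```

Calling `ConfigScanner.FindFiles(root, "<connectionStrings")`, with `root` set to your install folder, yields matching paths one at a time, so the backup action can run as results arrive.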

To enhance this solution, you could use a temporary cache that keeps track of each matching file that needs to be backed up. You could save this cache to a file (along with a timestamp) and only re-scan after a set period of time. That way, on subsequent runs of your backup utility you don't have to search the file system at all; you simply back up the cached files.
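
One way such a cache might look is sketched below. The file layout (a UTC timestamp on the first line, then one path per line) and the names `ScanCache`, `Save`, and `TryLoad` are assumptions made for this example, not an established format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class ScanCache
{
    // Writes the scan time (round-trip "o" format) followed by one matching path per line.
    public static void Save(string cachePath, IEnumerable<string> matches, DateTime scannedUtc)
    {
        var lines = new List<string> { scannedUtc.ToString("o", CultureInfo.InvariantCulture) };
        lines.AddRange(matches);
        File.WriteAllLines(cachePath, lines);
    }

    // Returns true and the cached paths if the cache exists and is no older than maxAge.
    // Returns false (and an empty list) when a fresh scan is needed.
    public static bool TryLoad(string cachePath, TimeSpan maxAge, DateTime nowUtc,
                               out List<string> matches)
    {
        matches = new List<string>();
        if (!File.Exists(cachePath)) return false;

        string[] lines = File.ReadAllLines(cachePath);
        if (lines.Length == 0) return false;

        DateTime scannedUtc;
        if (!DateTime.TryParse(lines[0], CultureInfo.InvariantCulture,
                               DateTimeStyles.RoundtripKind, out scannedUtc))
            return false;                          // corrupt header: treat as no cache

        if (nowUtc - scannedUtc > maxAge) return false; // cache too old: re-scan

        // Drop paths that have disappeared since the last scan.
        matches = lines.Skip(1).Where(p => File.Exists(p)).ToList();
        return true;
    }
}
```

A backup run would call `TryLoad` first and fall back to a full scan followed by `Save` only when it returns false. Passing the current time in as a parameter keeps the expiry logic easy to test.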

Another option would be to store a blacklist of all "false positive" files (files that matched your extension whitelist but did not contain the search term) so you don't search their contents again on future runs.

I hope this helps. If you have any questions, please let me know.