1
votes

I have a huge file, and I want to blow away everything in the file except for what matches my regex. I know I can get matches and just extract those, but I want to keep my file and get rid of everything else.

Here's my regex:

"Id":\d+

How do I say "Match everything except "Id":\d+"? Something along the lines of

!("Id":\d+) (pseudo regex) ?

I want to use it with a Regex Replace function. In English, I want to say:

Get all text that isn't "Id":\d+ and replace it with an empty string.

4
When you say throw away everything else, do you mean keep lines containing the regex or just keep the strings which match the regex? - Rohith
Are you saying you want a regex that matches everything except what your regex matches? - Fred Foo
Your question sounds like a logical mind trap. ;-) - splash

4 Answers

2
votes

Try this:

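using System.IO;
using System.Text.RegularExpressions;

string path = @"c:\temp.txt"; // your file here
// Group 1 captures each Id:Number (plus an optional trailing space);
// the ".+" alternative swallows text that contains no Id.
// For the quoted form in the question, write ""Id"" in place of Id.
string pattern = @".*?(Id:\d+\s?).*?|.+";
Regex rx = new Regex(pattern);

// Read everything first, because CreateText truncates the same file.
var lines = File.ReadAllLines(path);
using (var writer = File.CreateText(path))
{
    foreach (string line in lines)
    {
        // Replace each match with group 1, i.e. keep only the Id parts.
        string result = rx.Replace(line, "$1");
        if (result == "")
            continue; // nothing to keep on this line

        writer.WriteLine(result);
    }
}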

The pattern will preserve spaces between multiple Id:Number occurrences on the same line. If you only have one Id per line you can remove the \s? from the pattern. File.CreateText will open and overwrite your existing file. If a replacement results in an empty string it will be skipped over. Otherwise the result will be written to the file.

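The first alternative of the pattern matches each Id:Number occurrence together with any text in front of it. The second alternative, .+, matches whatever is left over: lines where Id:Number does not appear, and trailing text after the last Id on a line. The replacement uses $1 to replace the match with the contents of the first group, which is the actual Id part: (Id:\d+\s?). Where only .+ matched, the group is empty, so that text is removed.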
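To see what the pattern does to a single line, here is a minimal sketch. It is written in Java only because another answer in this thread uses Java; as far as I know java.util.regex handles this particular pattern, and an unset $1, the same way .NET does, but treat that as an assumption. The class name IdPatternDemo and the sample lines are made up for illustration.

```java
import java.util.regex.Pattern;

final class IdPatternDemo {
    // Same pattern as in the answer, written as a Java string literal.
    private static final Pattern RX = Pattern.compile(".*?(Id:\\d+\\s?).*?|.+");

    // Replaces every match with group 1. Where only ".+" matched,
    // group 1 is unset and contributes nothing to the result.
    static String keepIds(String line) {
        return RX.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Brackets make the trailing space kept by \s? visible.
        System.out.println("[" + keepIds("abc Id:12 def Id:34 xyz") + "]"); // prints [Id:12 Id:34 ]
        System.out.println("[" + keepIds("no ids on this line") + "]");     // prints []
    }
}
```

The second call returns an empty string, which is the case the loop above skips with continue.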

1
votes

Well, the opposite of \d is \D in Perl-style regexes. Does .NET have something similar?

1
votes

Sorry, but I totally don't get what your problem is. Shouldn't it be easy to grep the matches into a new file?

Yoo wrote:

Get all text that isn't "Id":\d+ and replace it with an empty string.

A logical equivalent would be:

Get all text that matches "Id":\d+ and place it in a new file. Replace the old file with the new one.
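A rough sketch of that "new file, then swap" idea, in Java. It streams the input line by line so a huge file never has to fit in memory. The class name KeepMatches, the one-match-per-output-line layout, and the UTF-8 encoding are my assumptions, not something the question specifies.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeepMatches {
    // The regex from the question: "Id":\d+
    private static final Pattern ID = Pattern.compile("\"Id\":\\d+");

    // Copies every match into a temporary file in the same directory,
    // one match per line, then moves that file over the original.
    static void rewrite(Path file) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "ids", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file);   // assumes UTF-8
             BufferedWriter out = Files.newBufferedWriter(tmp)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = ID.matcher(line);
                while (m.find()) {
                    out.write(m.group());
                    out.newLine();
                }
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        rewrite(Paths.get(args[0])); // path of the file to rewrite in place
    }
}
```

Because the original is only replaced after the temporary file has been written completely, a failure halfway through leaves the original untouched.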

0
votes

I haven't used .NET before, but the following works in Java:

System.out.println("abcd Id:12351abcdf".replaceAll(".*(Id:\\d+).*","$1"));

This produces the output:

Id:12351

Strictly speaking, this doesn't match "everything except Id:\d+", but it does the job.
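One caveat with the greedy .* version: when a line contains several Ids, only the last one survives, because the leading .* consumes as much as it can. A small sketch of an alternative that collects every occurrence with Matcher.find follows. The class name AllIds and the choice of a single space as separator are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AllIds {
    private static final Pattern ID = Pattern.compile("Id:\\d+");

    // Returns every Id:<digits> occurrence in the input, joined by single spaces.
    static String keepAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ID.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return String.join(" ", found);
    }

    public static void main(String[] args) {
        String line = "abcd Id:1 efgh Id:2 ijkl";
        System.out.println(line.replaceAll(".*(Id:\\d+).*", "$1")); // prints Id:2
        System.out.println(keepAll(line));                           // prints Id:1 Id:2
    }
}
```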