2
votes

I want to write a regular expression that can extract file types from a string.

The string looks like this:

Text Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF Files (.pdf)|.pdf|Excel Files (.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)

Expected result, e.g.:

.prn

3
Why not use the built-in classes? - Matt Ellen
@Matt Ellen, perhaps because it doesn't fulfill the requirements at all... The OP is not trying to extract the extension from a file name - Thomas Levesque

3 Answers

1
votes

What you have is the file dialog filter format.

The extensions already appear twice (the first appearance, inside the description, is unreliable), and if you try to handle this with a regex directly you'll have to think about descriptions like

 Text.Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|

etc.

It looks safer to follow the known structure:

string filter = "Text Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF Files (.pdf)|.pdf|Excel Files (.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)";

string[] filterParts = filter.Split('|');

// go through the odd sections
for (int i = 1; i < filterParts.Length; i += 2)
{
    // approximate; you may want some validation here first
    string filterPart = filterParts[i];

    string[] fileTypes = filterPart.Split(';');
    // add to collection
}

This (only) requires that the filter string has the correct syntax.
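
As a rough, self-contained sketch of the same idea: the class name FilterParser and the method GetExtensions below are hypothetical names chosen for illustration, and the trimming and duplicate check are assumptions about what "add to collection" might involve.

```csharp
using System;
using System.Collections.Generic;

public static class FilterParser
{
    // Hypothetical helper: collects the entries from the pattern sections
    // (odd indexes) of a "description|patterns|description|patterns" filter.
    public static List<string> GetExtensions(string filter)
    {
        var extensions = new List<string>();
        string[] parts = filter.Split('|');

        // Even indexes are descriptions, odd indexes are pattern lists.
        for (int i = 1; i < parts.Length; i += 2)
        {
            foreach (string fileType in parts[i].Split(';'))
            {
                string trimmed = fileType.Trim();

                // Assumption: skip empty entries and keep each entry once.
                if (trimmed.Length > 0 && !extensions.Contains(trimmed))
                {
                    extensions.Add(trimmed);
                }
            }
        }

        return extensions;
    }
}
```

Note that the sample string from the question ends with a description ("Excel Files (...)") that has no pattern section after it, so this approach would not pick up the Excel extensions unless the filter is completed.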

0
votes

// text holds the filter string; Regex and Match are in System.Text.RegularExpressions
Regex extensionRegex = new Regex(@"\.\w+");
foreach (Match m in extensionRegex.Matches(text))
{
    Console.WriteLine(m.Value);
}

0
votes

If the string format is fairly fixed, then the following should work (the | is excluded as well, so that a match cannot run across a section boundary into the next description):

\.[^.;)|]+
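
A minimal sketch of how a pattern of this kind might be applied in C#. The class name ExtensionMatcher and the method Extract are hypothetical; the character class below also excludes |, and duplicates are dropped because each extension appears in both the description and the pattern list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExtensionMatcher
{
    // Assumption: '|' is excluded in addition to '.', ';' and ')',
    // so a match stops at the end of a pattern section.
    private static readonly Regex ExtensionRegex = new Regex(@"\.[^.;)|]+");

    // Hypothetical helper: returns each matched extension once,
    // in order of first appearance, ignoring case.
    public static List<string> Extract(string filter)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (Match m in ExtensionRegex.Matches(filter))
        {
            // HashSet.Add returns false if the value was already present.
            if (seen.Add(m.Value))
            {
                result.Add(m.Value);
            }
        }

        return result;
    }
}
```

This still relies on the descriptions not containing a stray dot; a description such as "Text.Files" would produce a bogus match.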