Extract variables from text using RegEx and C#

Question

I have a possibly simple task ahead of me, but my RegEx skills are poor. Can anyone help me, or point me in the right direction? :-)

Below is an example of the text I'm parsing. I would like to do a foreach over the results, where for each match I can get the "URL" variable and the text in between the tags:

Lorem ipsum dolor sit amet, consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing elit. Nullam interdum eleifend mauris, nec condimentum nisi lacinia sit amet. Mauris faucibus, orci ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis volutpat, dolor libero sollicitudin quam, id feugiat magna orci[URL=/Default.aspx?ID=333333]lorem ipsum dolor[/URL] quis augue. Integer nec euismod sem.

This may be some help: regular-expressions.info/tutorial.html — Purplegoldfish
How about using the String.IndexOf() API to find the URL value, and then reading from that index up to where the next URL string is found? Hope you're getting the idea. — Zenwalker
When you feel comfortable enough you can take a look at this gem: shop.oreilly.com/product/9781565922570.do — FailedDev

Michael Low · Accepted Answer · 2011-10-19T10:31:36

This should do it for you:

using System.Text.RegularExpressions;

// Group 1 captures everything after "[URL=" up to the closing "]";
// group 2 captures the text between the opening tag and "[/URL]".
Regex theRegex = new Regex(@"\[URL=([^\]]+)\]([^\[]+)\[/URL\]");
string text = "Lorem ipsum dolor sit amet, consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing elit. Nullam interdum eleifend mauris, nec condimentum nisi lacinia sit amet. Mauris faucibus, orci ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis volutpat, dolor libero sollicitudin quam, id feugiat magna orci[URL=/Default.aspx?ID=333333]lorem ipsum dolor[/URL] quis augue. Integer nec euismod sem.";
MatchCollection matches = theRegex.Matches(text);
foreach (Match thisMatch in matches)
{
    // thisMatch.Groups[0].Value is the whole match, e.g. "[URL=/test.aspx?ID=12345]lorem ipsum[/URL]"
    // thisMatch.Groups[1].Value is the URL, e.g. "/test.aspx?ID=12345"
    // thisMatch.Groups[2].Value is the text in between, e.g. "lorem ipsum"
}
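
If numbered groups feel easy to mix up, the same pattern can be written with named groups and wrapped in a small helper. The following is only a minimal sketch: the class name UrlTagParser, the method Extract, and the group names url and text are illustrative assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlTagParser
{
    // Same pattern as above, with named groups (the names are illustrative).
    private static readonly Regex UrlTag = new Regex(
        @"\[URL=(?<url>[^\]]+)\](?<text>[^\[]+)\[/URL\]");

    // Returns one (url, text) pair per [URL=...]...[/URL] tag, in order of appearance.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in UrlTag.Matches(input))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["text"].Value));
        }
        return results;
    }

    public static void Main()
    {
        string sample = "ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis";
        foreach (var pair in Extract(sample))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
        // prints: /Default.aspx?ID=222222 -> lorem
    }
}
```

Note that, like the original pattern, this sketch will not match a tag whose inner text contains a "[" character or whose URL contains a "]".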
