Eliminate html tags from values

Question

I'm trying to eliminate HTML tags from a value displayed in an SSRS report.

The expression I came up with is: =(new System.Text.RegularExpressions.Regex("<[^>]*>")).Replace((new System.Text.RegularExpressions.Regex("<STYLE>.*</STYLE>")).Replace(Fields!activitypointer1_description.Value,""),"")

The problem is that the inner expression ("<STYLE>.*</STYLE>"), which should run first, has no effect. The result still contains the style rules from the HTML, only without the surrounding tags.

I'm out of ideas.

Comments

The second expression is not executing first. What made you think it executes first? — DonkeyMaster
What made me think that? 1st thing that pops in mind is the language. I mean, if I take the code and execute it in a C# program, it does what it should. What makes you think it executes second? — Cosmin
oops, my bad. You're right, the STYLE regex does execute before the HTML one. — DonkeyMaster

DonkeyMaster · Accepted Answer · 2009-05-26T13:28:52

You need to add RegexOptions.Singleline. By default, the "." metacharacter matches any character except a newline, so "<STYLE>.*</STYLE>" cannot match a style block that spans several lines. Here's a console program you can run to verify it:

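using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample input whose text and style block both span several lines.
        string description = @"<b>this is some 
text</b><style>and 
this is style</style>";
        // Singleline makes '.' match newlines; IgnoreCase matches <style> as well as <STYLE>.
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        // The inner Replace removes the style block first; the outer Replace strips the remaining tags.
        Console.WriteLine(
            new Regex("<[^>]*>", options).Replace(
                new Regex("<STYLE>.*</STYLE>", options).Replace(description, ""),
                ""));
    }
}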
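
Two things can still trip up these patterns on real HTML: with Singleline, a greedy .* removes everything between the first <style> and the last </style>, and a literal <STYLE> does not match an opening tag that carries attributes such as type="text/css". Below is a minimal sketch of a helper that handles both. The class name HtmlStripper and the method StripHtml are hypothetical names chosen for illustration, not part of the original post, and a regex is only a rough approximation of HTML parsing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Singleline lets '.' match '\n'; IgnoreCase covers <STYLE>, <style>, <Style>.
    private const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Hypothetical helper (name is an assumption, not from the original post).
    public static string StripHtml(string html)
    {
        // Step 1: drop each <style ...>...</style> block together with its contents.
        // [^>]* tolerates attributes; the lazy .*? stops at the nearest </style>.
        string withoutStyles =
            Regex.Replace(html, "<style[^>]*>.*?</style>", "", Options);

        // Step 2: drop every remaining tag but keep the text between tags.
        return Regex.Replace(withoutStyles, "<[^>]*>", "", Options);
    }

    public static void Main()
    {
        string description =
            "<b>this is some \ntext</b><style>and \nthis is style</style>";

        // Without Singleline the style pattern cannot cross the newline,
        // so the input comes back unchanged.
        string naive = Regex.Replace(
            description, "<style>.*</style>", "", RegexOptions.IgnoreCase);
        Console.WriteLine(naive == description); // prints True

        // With both options only the visible text remains.
        Console.WriteLine(StripHtml(description));
    }
}
```

The static Regex.Replace(input, pattern, replacement, options) overload avoids constructing Regex objects by hand. The same call should carry over to the report expression, assuming the fully qualified names System.Text.RegularExpressions.Regex and System.Text.RegularExpressions.RegexOptions are used and the flags are combined with Or rather than |, since SSRS expressions are written in Visual Basic.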
