2
votes

I have some text that goes like this:

Lorem ipsum dolor <code>sit amet, consectetuer adipiscing elit,</code> sed diam nonummy nibh euismod tincidunt ut <code>laoreet dolore magna</code> aliquam erat volutpat.

I am trying to remove everything between each pair of "code" tags. I wrote a function that works well when there is only one pair of tags per cell, but it doesn't take care of multiple instances. Here is the desired output:

Lorem ipsum dolor <code></code> sed diam nonummy nibh euismod tincidunt ut <code></code> aliquam erat volutpat.

How would you advise me to do?

3
Do you mean you're trying to remove everything between the "code" tags?gtr1971
Yes, that's exactly what I am trying to do.user1029296
What is your desired output? Please add it to your post.Excellll
I just did it. Is it clearer?user1029296

3 Answers

0
votes

Based on macro recorder:

Sub Test()
    'working for selection replacing all <*> sections
    Selection.Replace What:="<*>", Replacement:="", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
End Sub

Edit attempt 2, after comments from OP:

Sub Attempt_second()
    'working for selection replacing all <*> sections
    Selection.Replace What:="<*code>*<*/*code>", Replacement:="<code></code>", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
End Sub

It will replace text into <code></code> removing additional spaces in between.

1
votes

This VBA function can be used to strip out open and close HTML tags and what they enclose. It uses regex, which should be OK in this limited usage (but beware using regex to parse HTML).

Function stripEnclosed(strIn As String) As String
Dim re As VBScript_RegExp_55.RegExp, AllMatches As VBScript_RegExp_55.MatchCollection, M As VBScript_RegExp_55.Match
Dim closeIndex As Long
tmpstr = strIn
Set re = New VBScript_RegExp_55.RegExp
re.Global = True
re.Pattern = "<[^/>]+>"
Set AllMatches = re.Execute(tmpstr)
For Each M In AllMatches
    closeIndex = InStr(tmpstr, Replace(M.Value, "<", "</"))
    If closeIndex <> 0 Then tmpstr = Left(tmpstr, InStr(tmpstr, M.Value) - 1) & Mid(tmpstr, closeIndex + Len(M.Value) + 1)
Next M
stripEnclosed = tmpstr
End Function

Note: you'll have to add the 'Microsoft VBScript Regular Expressions 5.5' reference to your VBA project.

If you only want to remove a certain tag (e.g. <CODE> and </CODE>) just replace the re.Pattern = "<[^/>]+>" line of the code with the following:

re.Pattern = "<CODE>"
-1
votes

KazJaw's answer is simple, elegant and seems to meet your needs.

I took a completely different approach:

Public Function StripHTML(str As String) As String

Dim RegEx As Object
Set RegEx = CreateObject("vbscript.regexp")
With RegEx
    .Global = True
    .IgnoreCase = True
    .MultiLine = True
    .Pattern = "<[^>]+>"
End With

StripHTML = RegEx.Replace(str, "")
Set RegEx = Nothing

End Function