1
votes

I have a Word document and I need to copy some paragraph of it into a string in VBA. When doing so, the text formatting must be converted to HTML tags. For example if my paragraph looks like this:

Hello I am Alice.

I want to get a string that contains:

Hello I am <b>Alice</b>

(And it would be great if it also work for bulleted list and other kind of formatting).

I am using Microsoft Visual Basic for Applications 7.0. I am new to VBA and a lot of code I found on Internet does not work for me because my version is old. Unfortunately, downloading a more recent version is not an option in my case.

Here is a code sample that works to convert a paragraph to a string without formatting:

Dim pParagraph As Paragraph
'... at some point, pParagraph is set to a paragraph of the document

Dim pRange As Range
Dim pString As String
Set pRange = ActiveDocument.Range(Start:=pParagraph.Range.Start, End:=pParagraph.Range.End - 1)
pString = Trim(pRange.Text)

I did some research on Internet and found the advise to copy the Range to the clipboard and to use Clipboard.getText. Unfortunately Clipboard.getText does not even compile for me.

3
The only way Word can turn its formatting into HTML tags is using a converter. One way is to save a file to HTML, the other is to copy onto the Clipboard. The problem with the latter, which is what you're trying, is that VBA can only get the text from the Clipboard - it won't contain the HTML. VBA does not have Clipboard.getText which is why that won't compile. It does have DataObject which belongs to the MSForms library. DataObject can only retrieve text, however, and won't deliver the HTML or RTF.Cindy Meister
What you found is probably .NET code which isn't part of Office or VBA - doesn't matter how new or old your version is. Saving to HTML in Word probably also won't be satisfactory as it doesn't save to "simple" HTML. You may need to use Find/Replace to search for formatting and append the required tags to the "found" text.Cindy Meister

3 Answers

0
votes

One way I know to get formatting in Word turned into html tags is to use Access. If you create an Access table with a field that has Long Text data type and Rich Text as the Text Format and import your Word text into it, when you query Access to put the Text back into Word it comes out as html tagged text.

0
votes

You could use code like the following as a starting point. Obviously, though, you'll have to extend it to handle all the tags you're concerned with.

Sub ApplyHTML()
Application.ScreenUpdating = False
With ActiveDocument.Range
  '.ListFormat.ConvertNumbersToText
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Format = True
    .Forward = True
    .MatchWildcards = True
    .Wrap = wdFindContinue
    .Font.Underline = True
    .Text = ""
    .Replacement.Text = "<u>^&</u>"
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .Font.Bold = True
    .Replacement.Text = "<b>^&</b>"
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .Font.Italic = True
    .Replacement.Text = "<i>^&</i>"
    .Execute Replace:=wdReplaceAll
    .ClearFormatting
    .Highlight = True
    .Replacement.Text = "<h>^&</h>"
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
0
votes

Just a couple of functions i usually use to create HTMLBody in outlook. It may help someone in the future. This process will check by character so it may take a little bit of time. I am using this in a pre-formatted cell in excel but should also work on word document.

Function Convert2HTML(myCell As Range) As String
    Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn, phaTagOn As Boolean
    Dim i, chrCount, spaceCount As Integer
    Dim chrCol, chrLastCol, htmlTxt As String
    
    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    phaTagOn = False
    chrCol = "NONE"
    htmlTxt = "<div>"
    chrCount = myCell.Characters.Count
    spaceCount = 0
    For i = 1 To chrCount
        With myCell.Characters(i, 1)
        
            If myCell.Characters(i, 4).Text = "    " And Not phaTagOn Then
                htmlTxt = htmlTxt & "<p style='text-indent: 40px'>"
                phaTagOn = True
            Else
                If myCell.Characters(i, 4).Text = "    " And phaTagOn Then
                    htmlTxt = htmlTxt & "</p><p style='text-indent: 40px'>"
                    phaTagOn = True
                End If
            End If
                
            If (.Font.Color) Then
                chrCol = GetCol(.Font.Color)
                If Not colTagOn Then
                    htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
                    colTagOn = True
                Else
                    If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
                End If
            Else
                chrCol = "NONE"
                If colTagOn Then
                    htmlTxt = htmlTxt & "</font>"
                    colTagOn = False
                End If
            End If
            chrLastCol = chrCol
            
            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<b>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    htmlTxt = htmlTxt & "</b>"
                    bldTagOn = False
                End If
            End If
    
            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<i>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    htmlTxt = htmlTxt & "</i>"
                    itlTagOn = False
                End If
            End If
    
            If .Font.Underline > 0 Then
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    htmlTxt = htmlTxt & "</u>"
                    ulnTagOn = False
                End If
            End If
            
            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & "<br>"
            Else
                htmlTxt = htmlTxt & .Text
            End If
        End With
    Next
    
    If colTagOn Then
        htmlTxt = htmlTxt & "</font>"
        colTagOn = False
    End If
    If bldTagOn Then
        htmlTxt = htmlTxt & "</b>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</i>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    If phaTagOn Then
        htmlTxt = htmlTxt & "</p>"
        phaTagOn = False
    End If
    htmlTxt = htmlTxt & "</div>"
    fnConvert2HTML = htmlTxt
End Function

Function GetCol(strCol As String) As String
    Dim rVal, gVal, bVal As String
    strCol = Right("000000" & Hex(strCol), 6)
    bVal = Left(strCol, 2)
    gVal = Mid(strCol, 3, 2)
    rVal = Right(strCol, 2)
    GetCol = rVal & gVal & bVal
End Function