I'm trying to manipulate some text from an MS Word document that includes hyperlinks. However, I'm tripping up at understanding exactly what Range.Start and Range.End return. I typed a few random words into an empty document and added some hyperlinks. Then I wrote the following macro...

Sub ExtractHyperlinks()

    Dim rHyperlink As Range
    Dim rEverything As Range
    Dim wdHyperlink As Hyperlink

    For Each wdHyperlink In ActiveDocument.Hyperlinks
        Set rHyperlink = wdHyperlink.Range
        Set rEverything = ActiveDocument.Range
        rEverything.TextRetrievalMode.IncludeFieldCodes = True
        Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start) & "#" & vbCrLf
    Next

End Sub

However, the output between the #s does not quite match up with the hyperlinks, and is more than a character or two out. So if the .Start and .End do not return char positions, what do they return?

3 Answers

0
votes

Range.Start returns the character offset from the beginning of the document to the start of the range; Range.End returns the offset to the end of the range.

But visible characters are not the only things that get counted, and therein lies the problem.

Examples of "hidden" things that are counted, but not visible:

  • "control characters" associated with content controls
  • "control characters" associated with fields (which also means hyperlinks), which can be seen if the display is toggled from field results to field codes using Alt+F9
  • table structures (ANSI 07 and ANSI 13)
  • text with the font formatting "hidden"

For this reason, using Range.Start and Range.End to get a "real" position in the document is neither reliable nor recommended. The properties are useful, for example, to set the position of one range relative to the position of another.
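
As a minimal sketch of that relative use (the procedure name PrintTextAfterHyperlinks is made up for illustration, and it assumes text hyperlinks that each sit inside a single paragraph), the following builds a range running from the end of each hyperlink to the end of its paragraph. The offsets are only compared with each other, never used as indexes into a string:

```vba
Sub PrintTextAfterHyperlinks()

    Dim wdHyperlink As Hyperlink
    Dim rAfter As Range

    For Each wdHyperlink In ActiveDocument.Hyperlinks
        ' Start from a copy of the paragraph that contains the hyperlink
        Set rAfter = wdHyperlink.Range.Paragraphs(1).Range.Duplicate
        ' Move the start of the copy to the end of the hyperlink;
        ' both offsets belong to the same document, so they are comparable
        rAfter.Start = wdHyperlink.Range.End
        Debug.Print "#" & rAfter.Text & "#"
    Next

End Sub
```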

You can get a somewhat more accurate result using the Boolean properties IncludeHiddenText and IncludeFieldCodes of Range.TextRetrievalMode. But these don't affect the structural elements involved with content controls and tables.
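
A rough sketch of how those properties could be tried out on the hyperlink ranges themselves, rather than on the whole document (the procedure name PrintHyperlinkText is hypothetical, and what each line prints will depend on your document):

```vba
Sub PrintHyperlinkText()

    Dim wdHyperlink As Hyperlink
    Dim rHyperlink As Range

    For Each wdHyperlink In ActiveDocument.Hyperlinks
        Set rHyperlink = wdHyperlink.Range

        ' Ask for the text without field codes
        rHyperlink.TextRetrievalMode.IncludeFieldCodes = False
        Debug.Print "codes off: #" & rHyperlink.Text & "#"

        ' Ask for the text of the same range with field codes
        rHyperlink.TextRetrievalMode.IncludeFieldCodes = True
        Debug.Print "codes on:  #" & rHyperlink.Text & "#"
    Next

End Sub
```

Reading the Text of the hyperlink's own range avoids indexing into the document text with Start and End at all, which is where the original macro goes wrong.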

0
votes

Thank you both so much for pointing out this approach was doomed but that I could still use .Start/.End for relative positions. What I was ultimately trying to do was turn a passed paragraph into HTML, with the hyperlinks.

I'll post what worked here in case anyone else has a use for it.

Function ExtractHyperlinks(rParagraph As Range) As String

    Dim rHyperlink As Range
    Dim wdHyperlink As Hyperlink
    ' Long rather than Integer, so long paragraphs cannot overflow the counters
    Dim iCaretHold As Long, iCaretMove As Long, rCaret As Range
    Dim s As String

    iCaretHold = 1
    iCaretMove = 1
    For Each wdHyperlink In rParagraph.Hyperlinks
        Set rHyperlink = wdHyperlink.Range
        Do
            Set rCaret = ActiveDocument.Range(rParagraph.Characters(iCaretMove).Start, rParagraph.Characters(iCaretMove).End)
            If RangeContains(rHyperlink, rCaret) Then
                s = s & Mid(rParagraph.Text, iCaretHold, iCaretMove - iCaretHold) & "<a href=" & Chr(34) & wdHyperlink.Address & Chr(34) & ">" & IIf(wdHyperlink.TextToDisplay <> "", wdHyperlink.TextToDisplay, wdHyperlink.Address) & "</a>"
                iCaretHold = iCaretMove + Len(wdHyperlink.TextToDisplay)
                iCaretMove = iCaretHold
                Exit Do
            Else
                iCaretMove = iCaretMove + 1
            End If
        Loop Until iCaretMove > Len(rParagraph.Text)
    Next
    If iCaretMove < Len(rParagraph.Text) Then
        s = s & Mid(rParagraph.Text, iCaretMove)
    End If

    ExtractHyperlinks = "<p>" & s & "</p>"

End Function

Function RangeContains(rParent As Range, rChild As Range) As Boolean

    ' True when rChild lies entirely within rParent
    RangeContains = (rChild.Start >= rParent.Start And rChild.End <= rParent.End)

End Function
0
votes

This is a bit of a simplification, but it's because the positions in rEverything count everything before the hyperlink, then all the characters in the hyperlink field code (including 1 character for each of the opening and closing field code braces), then all the characters in the hyperlink field result, then all the characters after the field.

However, the character count in the range (e.g. rEverything.Characters.Count or Len(rEverything.Text)) only includes the field result if TextRetrievalMode.IncludeFieldCodes is set to False and only includes the field code if TextRetrievalMode.IncludeFieldCodes is set to True.

So when the range contains a field, the character count is always smaller than Range.End - Range.Start.
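
One way to check this on your own document is to print the three numbers side by side. This is only a sketch (the procedure name CompareSpanWithTextLength is made up here), and the actual values depend on what the document contains:

```vba
Sub CompareSpanWithTextLength()

    Dim r As Range

    Set r = ActiveDocument.Range

    ' Number of positions the range spans in the document
    Debug.Print "End - Start:            " & (r.End - r.Start)

    ' Length of the retrieved text without field codes
    r.TextRetrievalMode.IncludeFieldCodes = False
    Debug.Print "Len(Text), codes off:   " & Len(r.Text)

    ' Length of the retrieved text with field codes
    r.TextRetrievalMode.IncludeFieldCodes = True
    Debug.Print "Len(Text), codes on:    " & Len(r.Text)

End Sub
```

If the document contains at least one hyperlink, the two text lengths should differ from each other and from the span.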

In this case if you change your Debug expression to something like

    Debug.Print "#" & Mid(rEverything.Text, rHyperlink.Start, rHyperlink.End - rHyperlink.Start - (rEverything.End - rEverything.Start - 1 - Len(rEverything))) & "#" & vbCrLf

you may see results more along the lines you expect.

Another way to visualise what is going on is as follows:

Create a very short document with a piece of text followed by a short hyperlink field with short result, followed by a piece of text. Put the following code in a module:

Sub Select1()

    Dim i As Long

    With ActiveDocument
        ' Step a collapsed selection through every position in the document
        For i = .Range.Start To .Range.End
            .Range(i, i).Select
        Next
    End With

End Sub

Insert a breakpoint on the "Next" line.

Then run the code once with the field codes displayed and once with the field results displayed. You should see the progress of the selection "pause" either at the beginning or the end of the field, as the Select keeps "selecting" something that you cannot actually see.