1
votes

I have a bunch of word documents that each contain a single table, some of which hold an exorbitant amount of data (20,000+ rows perhaps) and hence can stretch over hundreds of pages long.

With that being said, I found a VBA word macro that can display all row indices that start every page. For example, the macro will display 100 integers for a table that stretches for 100 pages. This is exactly what I need but for various reasons, the macro runs very slow. Furthermore, it runs even slower when I adapted the code and embedded it into an excel macro (to use on a word object).

So my question is - can this macro be somehow optimized? I suppose the looping is causing the problem. Many thanks for your input!

Sub TableRowData()
    'define meaningful names to use for array's first dimension
    Const pgnum = 1
    Const startrow = 2
    Const endrow = 3

    Dim data() As Long  ' array to hold data
    Dim rw As Row       ' current row of table
    Dim rownum As Long  ' the index of rw in table's rows
    Dim datarow As Long ' current value of array's second dimension
    Dim rg As Range     ' a range object for finding the page where rw starts

    'initialization
    ReDim data(3, 1)
    Set rw = ActiveDocument.Tables(1).Rows(1)
    rownum = 1
    datarow = 1

    'store the page number and row number for the first row of the table
    Set rg = rw.Range
    rg.Collapse wdCollapseStart
    data(pgnum, datarow) = rg.Information(wdActiveEndAdjustedPageNumber)
    data(startrow, datarow) = rownum

    'Step through the remaining rows of the table.
    'Each time the page number changes, store the preceding row as the
    'last row on the previous page; then expand the array and store the
    'page number and row number for the new row.
    While rownum < ActiveDocument.Tables(1).Rows.Count
        Set rw = rw.Next
        rownum = rownum + 1
        Set rg = rw.Range
        rg.Collapse wdCollapseStart
        If rg.Information(wdActiveEndAdjustedPageNumber) > data(pgnum, datarow) Then
            data(endrow, datarow) = rownum - 1
       
            ReDim Preserve data(3, datarow + 1)
            datarow = datarow + 1
            data(pgnum, datarow) = rg.Information(wdActiveEndAdjustedPageNumber)
            data(startrow, datarow) = rownum
        End If
    Wend

    'finish up with the last row of the table
    data(endrow, datarow) = rownum
     
    Dim msg As String
    Dim i As Long
    For i = 1 To UBound(data, 2)
        msg = msg & data(startrow, i) & vbCr
    Next i
    MsgBox msg
End Sub
3
If the code in your question is being run from Excel you need to change the declarations for rw and rg as these are ambiguous. Both Excel and Word contain Range and Row objects, so you need to prefix the datatype with the library name, i.e. Word.Range, Word.Row - Timothy Rylatt
Thank you Tim. Yes, in my excel adaption, I did add the prefix - otherwise it wouldn't even run. I also included a reference to the Microsoft Word object model to make life easier. However despite doing all of this, it just runs too slow for the time being. Forget about the 500+ page word document, anything around 100+ pages will take roughly 30-60 minutes. Just a little too long. - user3624032

3 Answers

3
votes

Try something based on:

Sub TableRowData()
Dim Doc As Document, Rng As Range, Data() As Long, i As Long, j As Long, p As Long, r As Long, x As Long
Set Doc = ActiveDocument
With Doc
  With .Tables(1).Range
    i = .Cells(1).Range.Characters.First.Information(wdActiveEndAdjustedPageNumber)
    j = .Cells(.Cells.Count).Range.Characters.Last.Information(wdActiveEndAdjustedPageNumber)
    ReDim Data(3, j - i)
    For p = i To j
      Set Rng = Doc.Range.GoTo(What:=wdGoToPage, Name:=p)
      Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\page")
      r = Rng.Cells(1).RowIndex
      x = p - i: Data(1, x) = x: Data(2, x) = p: Data(3, x) = r
    Next
  End With
End With
End Sub
1
votes

Processing tables row by row is notoriously slow and there is little you can do to speed things up.

One thing that will help is to turn off screen updating. At the start of your routine add Application.ScreenUpdating = False and at the end Application.ScreenUpdating = True.

The other thing you can experiment with is using a For Each loop. There is some disagreement as to whether or not this method is faster. Having a large table to process will give you a pretty good idea of which is the faster method, but don't expect miracles. Whichever method you adopt you are going to need patience.

Sub TableRowData()
    Application.ScreenUpdating = False
    'define meaningful names to use for array's first dimension
    Const pgnum = 1
    Const startrow = 2
    Const endrow = 3

    Dim data() As Long  ' array to hold data
    Dim rw As Row       ' current row of table
    Dim rownum As Long  ' the index of rw in table's rows
    Dim datarow As Long ' current value of array's second dimension
    'Dim rg As Range     ' a range object for finding the page where rw starts

    'initialization
    ReDim data(3, 1)
    Set rw = ActiveDocument.Tables(1).Rows(1)
    rownum = 1
    datarow = 1

    'store the page number and row number for the first row of the table
    Set rg = rw.Range
    rg.Collapse wdCollapseStart
    data(pgnum, datarow) = rg.Information(wdActiveEndAdjustedPageNumber)
    data(startrow, datarow) = rownum

    'Step through the remaining rows of the table.
    'Each time the page number changes, store the preceding row as the
    'last row on the previous page; then expand the array and store the
    'page number and row number for the new row.
    'While rownum < ActiveDocument.Tables(1).Rows.Count
    For Each rw In ActiveDocument.Tables(1).Rows
        'Set rw = rw.Next
        rownum = rownum + 1
        'Set rg = rw.Range
        'rg.Collapse wdCollapseStart
        If rw.Range.Information(wdActiveEndAdjustedPageNumber) > data(pgnum, datarow) Then
            data(endrow, datarow) = rownum - 1
       
            ReDim Preserve data(3, datarow + 1)
            datarow = datarow + 1
            data(pgnum, datarow) = rw.Range.Information(wdActiveEndAdjustedPageNumber)
            data(startrow, datarow) = rownum
        End If
    Next rw
    'Wend

    'finish up with the last row of the table
    data(endrow, datarow) = rownum
     
    Dim msg As String
    Dim i As Long
    For i = 1 To UBound(data, 2)
        msg = msg & data(startrow, i) & vbCr
    Next i
    MsgBox msg
    Application.ScreenUpdating = True
End Sub
1
votes

How about looping through the pages and getting the row number?

Would that work?

Dim doc As Document
Dim rng As Range
Dim pg As Long

    Application.ScreenUpdating = False

    Set doc = ThisDocument
    
    
    For pg = 1 To doc.Range.Information(wdNumberOfPagesInDocument)
        Set rng = doc.GoTo(wdGoToPage, wdGoToAbsolute, pg)
        Debug.Print rng.Information(wdEndOfRangeRowNumber)
    Next pg