1
votes

I have found some lovely code on your site from a posting a few years ago, that gives me the specific table, row and column information to export to Excel, and it works well. (Thanks to the original posters).

However, I have now been asked to grab another table cell, and this one has several paragraphs within it, which have been auto-numbered as a list within the cell (or in another, bullet pointed). I don't always know how many items will be in the list, but I need the full cell contents.

The issue I have is that when the data exports to Excel via the coding, it looses the numbering, and the carriage returns, and basically all runs together without any break to the previous line's data.

eg -

  1. P&ID 111222
  2. DWG 111-5456
  3. DOC 512BC-1234

becomes on export:
P&ID 111222DWG 111-5456DOC512BC-1234

Could anyone please advise how to adjust the code to stop the data from running together? I would be happy to get the data in the one Excel cell, or in several if that is the way it has to be.

Thanks in advance, Wendy

  Sub wordScrape()

Dim wrdDoc As Object
Dim objFiles As Object
Dim fso As Object
Dim wordApp As Object
Dim sh1 As Worksheet
Dim x As Integer

' Change this to the folder containing your word documents
FolderName = "Y:\120\TEST"

Set sh1 = ThisWorkbook.Sheets(1)
Set fso = CreateObject("Scripting.FileSystemObject")
Set wordApp = CreateObject("Word.application")
Set objFiles = fso.GetFolder(FolderName).Files

x = 1
For Each wd In objFiles
    If InStr(wd, ".docx") And InStr(wd, "~") = 0 Then
        Set wrdDoc = wordApp.Documents.Open(wd.Path, ReadOnly = True)
        
        'word document file name
        sh1.Cells(x, 1) = wd.Name
        
        'document number - Table 1, Row 2, Column 1
        sh1.Cells(x, 2) = Application.WorksheetFunction.Clean(wrdDoc.Tables(1).Cell(Row:=2, Column:=1).Range)
        
        'document title - Table 1, Row 3, Column 1
        sh1.Cells(x, 3) = Application.WorksheetFunction.Clean(wrdDoc.Tables(1).Cell(Row:=3, Column:=1).Range)
        
        'cell for tags for document - Table 1, Row 9, Column 2
        ' note - if more than 1 line, and automatic numbering in WORD doc, when exported, will remove numbering and line breaks - runs everything together
        sh1.Cells(x, 4) = Application.WorksheetFunction.Clean(wrdDoc.Tables(1).Cell(Row:=9, Column:=2).Range)
        
        'cell that notes frequency for doc - Table 1, Row 16, Column 2
        sh1.Cells(x, 5) = Application.WorksheetFunction.Clean(wrdDoc.Tables(1).Cell(Row:=16, Column:=2).Range)
        
               
        'sh1.Cells(x, 3) = ....more extracted data....
        
        x = x + 1
    wrdDoc.Close
    End If

Next wd
wordApp.Quit
End Sub
1

1 Answers

2
votes

This will extract regular or bulleted text from a Word table cell, formatted for use in an Excel cell.

It will add "bullets" or numbers if the text in word is formatted as a list (but note the numbers will be off if the cell has mixed formatting)

'get the text from a table cell
Function CellContent(wdCell) As String
    Dim s As String, i As Long, pc As Long, p As Object
    pc = wdCell.Range.Paragraphs.Count
    'loop over paragraphs in cell (could just be 1)
    For i = 1 To pc
        s = s & IIf(i > 1, Chr(10), "") 'line break if not first para

        Set p = wdCell.Range.Paragraphs(i)
        'any list format applied ?
        Select Case p.Range.listformat.listtype
            Case 2: s = s & "* " 'bullet
            Case 3: s = s & i & ". " 'numbered
        End Select
        s = s & p.Range.Text
    Next i

    CellContent = Left(s, Len(s) - 1) 'trim off end-of-cell mark from Word
End Function

Here's how you'd call it from your current Sub:

sh1.Cells(x, 4) = CellContent( wrdDoc.Tables(1).Cell(9, 2) )