
I am trying to recursively search a directory structure for Word documents and extract the hyperlinks from each one. When the script runs, the File Name column is filled in but the Hyperlink column is empty:

processing 2 docs

File Name                Hyperlink
---------                ---------
C:\temp\doc1.docx
C:\temp\doc1.docx
C:\temp\folder\doc2.docx
C:\temp\folder\doc2.docx

Nothing I have tried populates the Hyperlink column. So far I have tried:

  • "Hyperlink" = $_Address
  • "Hyperlink" = $thisDoc.Address
  • "Hyperlink" = $thisDoc.Hyperlink.Address

Here is the script:

Clear-Host

$parentFolder = "C:\temp"

$ourDocs = Get-ChildItem -Recurse -LiteralPath $parentFolder -file -include *.doc*
"processing {0} docs" -f $ourDocs.Count


$word = New-Object -ComObject word.application

$word.Visible = $false
$word.ScreenUpdating = $false


$array = New-Object System.Collections.ArrayList

$ourDocs | ForEach-Object{

    $thisDoc = $word.Documents.Open($_.FullName)

    $thisDoc.Hyperlinks | ForEach-Object {

        $array.Add([pscustomobject]@{
        
            "File Name" = $thisDoc.FullName
            "Hyperlink" = $_Address}) | Out-null
        
    }
    $thisDoc.Close()
                
}

$Word.Quit()

$array

# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

1 Answer


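The mistake is in how you reference the property. `$_Address` is parsed as a variable named `_Address`, which was never assigned, so it evaluates to $null. To read the Address property of the current pipeline item you need the dot: `$_.Address` (or the equivalent `$PSItem.Address`). The other two attempts fail because `$thisDoc` is the document, not a hyperlink, so it has no Address property.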
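To see the difference in isolation, here is a minimal sketch that does not need Word at all. The `$links` objects and their `Address` values are made up for illustration; they only stand in for the hyperlink objects that Word would return.

```powershell
# Hypothetical stand-ins for Word hyperlink objects (not real COM objects)
$links = @(
    [pscustomobject]@{ Address = 'https://example.com/a' }
    [pscustomobject]@{ Address = 'https://example.com/b' }
)

# $_Address is a variable named "_Address"; it was never set, so it is $null
$wrong = $links | ForEach-Object { [pscustomobject]@{ Hyperlink = $_Address } }

# $_.Address reads the Address property of the current pipeline item
$right = $links | ForEach-Object { [pscustomobject]@{ Hyperlink = $_.Address } }

$wrong | Format-Table   # Hyperlink column is empty
$right | Format-Table   # Hyperlink column holds the two addresses
```

Running the script under `Set-StrictMode -Version Latest` would likely have surfaced this sooner, since referencing an unassigned variable then raises an error instead of silently yielding $null.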

Try this refactored version:

Clear-Host

$parentFolder = "D:\temp\Word"

$ourDocs = Get-ChildItem -Recurse -LiteralPath $parentFolder -file -include '*.doc*'
"processing {0} docs" -f $ourDocs.Count


$word                = New-Object -ComObject word.application
$word.Visible        = $false
$word.ScreenUpdating = $false

# The ArrayList is not needed for your posted use case:
# objects emitted inside the pipeline are written to the output directly.
# $array = New-Object System.Collections.ArrayList

$ourDocs | 
ForEach-Object{
    $thisDoc = $word.Documents.Open($PSItem.FullName)

    @($thisDoc.Hyperlinks) | 
    ForEach-Object {
        [pscustomobject]@{
            FileName  = $thisDoc.FullName
            HyperLink = $PSitem.Address
        }
    }
    $thisDoc.Close()
}

$Word.Quit()


# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

# Results
<#
processing 4 docs

FileName                     HyperLink                                        
--------                     ---------                                        
D:\temp\Word\WES - Copy.docx http://stackoverfow.com/                         
D:\temp\Word\WES - Copy.docx https://superuser.com/questions/tagged/powershell
#>

Update, following your comment about exporting to CSV and my reply to it: pipe the objects to Export-Csv inside the loop.

...

$ourDocs | 
ForEach-Object{
    $thisDoc = $word.Documents.Open($PSItem.FullName)

    @($thisDoc.Hyperlinks) | 
    ForEach-Object {
        [pscustomobject]@{
            FileName  = $thisDoc.FullName
            HyperLink = $PSitem.Address
        }
    } | 
    Export-Csv -Path 'D:\Temp\WordHyperLinkReport.csv' -Append -NoTypeInformation
    $thisDoc.Close()
}

...

Import-Csv -Path 'D:\Temp\WordHyperLinkReport.csv'
# Results
<#
FileName                     HyperLink                                        
--------                     ---------                                        
D:\temp\Word\WES - Copy.docx http://stackoverfow.com/                         
D:\temp\Word\WES - Copy.docx https://superuser.com/questions/tagged/powershell
#>
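One thing to keep in mind with `-Append` is that running the script a second time adds the same rows to the existing file again. A possible alternative, sketched here with made-up rows and a hypothetical report path in the temp folder, is to collect the objects first and export once without `-Append`, which overwrites the file on each run.

```powershell
# Made-up rows standing in for the per-document hyperlink objects
$rows = @(
    [pscustomobject]@{ FileName = 'C:\temp\doc1.docx';        HyperLink = 'https://example.com/a' }
    [pscustomobject]@{ FileName = 'C:\temp\folder\doc2.docx'; HyperLink = 'https://example.com/b' }
)

# Hypothetical report path; adjust to wherever the report should live
$report = Join-Path ([System.IO.Path]::GetTempPath()) 'WordHyperLinkReport-demo.csv'

# A single export without -Append replaces any earlier report,
# so rerunning the script does not pile up duplicate rows
$rows | Export-Csv -Path $report -NoTypeInformation
$rows | Export-Csv -Path $report -NoTypeInformation   # second run, same result

# Read the report back to check what was written
$back = @(Import-Csv -Path $report)
$back | Format-Table
```

In the real script, `$rows` would be the output of the `$ourDocs | ForEach-Object { ... }` pipeline assigned to a variable.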