I am converting a series of 3,500 HTML documents to Word for a documentation repository. We've run into a problem where some hyperlinks come out of the conversion broken for no apparent reason. I want to generate a list of filenames and the links contained in each so I can spot any patterns and adjust my conversion program accordingly. Unfortunately, searches that include PowerShell and hyperlinks mostly turn up posts about how to ADD hyperlinks using PowerShell, and none of those situations apply to my needs.

I used this link and this link as my starting point, with this code:

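# Start Word through COM and open the test document
$word = New-Object -ComObject Word.Application
$document = $word.Documents.Open("C:\users\administrator\desktop\TEST.docx")
# List the document's hyperlinks
$document.Hyperlinks
# Demo: casting to [uri] percent-encodes the spaces in a link
([uri]"http://domain.com/This is a bad link").AbsoluteUri
# Snapshot the collection into an array before modifying it
$hyperlinks = @($document.Hyperlinks)
$hyperlinks | ForEach-Object {
    # Only touch addresses that contain whitespace
    If ($_.Address -match "\s") {
        $newURI = ([uri]$_.Address).AbsoluteUri
        Write-Verbose ("Updating {0} to {1}" -f $_.Address,$newURI) -Verbose
        $_.Address = $newURI
    }
}
$document.Save()
$word.Quit()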

I've been trying to craft something that will meet my needs. I can duplicate the above script's results, but have not been able to get a successful run iterating through all the documents in a directory with a ForEach command. I'm trying to change all links from html to doc, but the second I insert this code:

If ($.Address. -match ".\.doc") {
    $newExt = ".doc" ;
    $newURI = ([uri]$$_.address).BaseName.$newExt.

I get out-of-bounds and command-failure errors at runtime. This link helped, and this link answers my question for VBA/VBScript, but not for PowerShell. Does anyone have a PowerShell solution for this?

2 Answers

0
votes

Someone asked a similar question for Excel a while ago: Excel & Powershell: Bulk Find and replace URL's used in formulas

So, once you have the hyperlinks, you can simply replace .html with .doc using the -replace operator. For example:

$hyperlinks | % { $_.TextToDisplay = $_.Address = $_.Address -replace '\.html$', '.doc' }

Note that if you do not change TextToDisplay, the hyperlink address will change but the document will still show the old text.
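
One caveat worth illustrating: -replace treats its pattern as a regular expression, so an unescaped dot matches any character. The sketch below works on plain strings rather than Word objects, and Convert-LinkExtension is a hypothetical helper name, not something from the question.

```powershell
# Hypothetical helper (not from the original post): swap a trailing .html for .doc.
function Convert-LinkExtension {
    param([string]$Address)
    # '\.' matches a literal dot and '$' anchors the match to the end of the string.
    $Address -replace '\.html$', '.doc'
}

Convert-LinkExtension 'guide/intro.html'   # guide/intro.doc
Convert-LinkExtension 'myhtml-notes.doc'   # unchanged: myhtml-notes.doc

# With the unescaped pattern, '.' matches any character, so 'yhtml' is replaced too.
'myhtml-notes.doc' -replace '.html', '.doc'   # m.doc-notes.doc
```

Anchoring to the end of the string assumes the address ends with the extension; a link with a trailing query string would need a different pattern.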

0
votes

The errors might have something to do with the stray characters marked below:

If ($.Address. -match ".\.doc") {
             ^
    $newExt = ".doc" ;
    $newURI = ([uri]$$_.address).BaseName.$newExt.
                     ^                           ^

Why not rewrite it into something like this? (You'll need to find the right types, like Hyperlink, yourself.)

$document.Hyperlinks | ? { $_.Address.EndsWith('.html') } | % { $_.Address = $_.Address -replace '\.html$', '.doc' }
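
For the original goal of listing each file's links across a whole directory, a minimal sketch might look like the following. It assumes Word is installed and that the converted files are .docx; Get-WordHyperlinkReport and the folder path in the usage line are hypothetical names, and the documents are opened read-only so nothing is modified.

```powershell
# Sketch only: assumes Word is installed; Get-WordHyperlinkReport is a hypothetical name.
function Get-WordHyperlinkReport {
    param(
        [Parameter(Mandatory)][string]$Folder   # folder holding the converted documents
    )
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        foreach ($file in Get-ChildItem -Path $Folder -Filter *.docx -File) {
            # Arguments: FileName, ConfirmConversions, ReadOnly
            $doc = $word.Documents.Open($file.FullName, $false, $true)
            try {
                # Emit one object per hyperlink so the output can be filtered or exported.
                foreach ($link in @($doc.Hyperlinks)) {
                    [pscustomobject]@{
                        File       = $file.Name
                        Address    = $link.Address
                        SubAddress = $link.SubAddress
                    }
                }
            }
            finally {
                $doc.Close($false)   # close without saving
            }
        }
    }
    finally {
        # Always shut Word down, even if a document fails to open.
        $word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
    }
}

# Example usage (hypothetical path):
# Get-WordHyperlinkReport -Folder 'C:\converted' | Export-Csv links.csv -NoTypeInformation
```

Returning objects instead of editing in place keeps the inventory step separate from the fix, which makes it easier to look for patterns in the broken links before changing anything.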