I am trying to figure out how to write a script that goes through a folder, grabs all Word documents in it, searches each one for a hyperlink, and changes that link to another link. Then I want to save each Word document and also create a PDF version of it.

How can I adjust the script below to grab all Word documents in a folder, then search all hyperlinks and change "https://www.yahoo.com" to "https://www.google.com"? It needs to loop through each entire document and check ALL of its hyperlinks, then save the document and convert it to a new PDF.

Is this possible?

What I have so far

    $word = New-Object -ComObject word.application
    $document = $word.documents.open("path to folder")
    $hyperlinks = @($document.Hyperlinks) 
    $hyperlinks | ForEach {
        $newURI = ([uri]$_.address).AbsoluteUri
        Write-Verbose ("Updating {0} to {1}" -f $_.Address,$newURI) -Verbose
        $_.address = $newURI
    }
    If (_$.address -eq "https://www.yahoo.com/") {
        $_.Address = "https://www.google.com/"
    } ElseIf ($_.Address -eq "http://def.com/motorists") {
        $_.Address = "http://hij.com/"
    }
    $document.save()
    $word.quit()

    Get-ChildItem -Path $document -Include *.doc, *.docx -Recurse |
        ForEach-Object {
            $doc = $word.Documents.Open($_.Fullname)
            $pdf = $_.FullName -replace $_.Extension, '.pdf'
            $doc.ExportAsFixedFormat($pdf,17)
            $doc.Close()
        }
    $word.Quit()

I am new to PowerShell, so could someone please walk me through these steps? I've heard PowerShell is probably the best language for getting this sort of thing done.

I hadn't done this before, so it was nice to figure it out. We both get to learn today! You were very close; the script just needed a few adjustments and a loop for handling multiple files. I'm sure someone more knowledgeable will drop in, but this should get you the desired result.

$NewDomain1 = "google"
$NewDomain2 = "hij"
$OurDocuments = Get-ChildItem -Path "C:\Apps\testing" -Filter "*.doc*" -Recurse

$Word = New-Object -ComObject Word.Application
$Word.Visible = $false

$OurDocuments | ForEach-Object {
    $Document = $Word.Documents.Open($_.FullName)
    "Processing file: {0}" -f $Document.FullName
    $Document.Hyperlinks | ForEach-Object {
        # -replace uses a regex; anchor it with ^ so only the start of the
        # address changes (otherwise "def" inside a path like /default would change too)
        if ($_.Address -like "https://www.yahoo.com/*") {
            $NewAddress = $_.Address -replace '^https://www\.yahoo\.com/', "https://www.$NewDomain1.com/"
            "Updating {0} to {1}" -f $_.Address, $NewAddress
            $_.Address = $_.TextToDisplay = $NewAddress
        } elseif ($_.Address -like "http://def.com/*") {
            $NewAddress = $_.Address -replace '^http://def\.com/', "http://$NewDomain2.com/"
            "Updating {0} to {1}" -f $_.Address, $NewAddress
            $_.Address = $_.TextToDisplay = $NewAddress
        }
    }

    "Saving changes to {0}" -f $Document.FullName
    $Document.Save()

    # $_ is the FileInfo from Get-ChildItem again; escape the extension and
    # anchor it to the end so only the real extension is swapped for .pdf
    $Pdf = $Document.FullName -replace ([regex]::Escape($_.Extension) + '$'), '.pdf'
    "Saving document {0} as PDF {1}" -f $Document.FullName, $Pdf
    $Document.ExportAsFixedFormat($Pdf, 17)   # 17 = wdExportFormatPDF

    "Completed processing {0} `r`n" -f $Document.FullName
    $Document.Close()
}

$Word.Quit()

Let's walk through it...

We'll first move your replacement values into a couple of variables, so they're easy to reference and to change in the future. You can also put the addresses you're looking for here, replacing the hard-coded strings as needed. The third line uses the filter *.doc* to grab all .doc and .docx files, which we'll iterate over. The wildcard also matches other extensions that begin with .doc, such as .docm. Personally, I would be careful with the -Recurse switch. It also collects files from every subfolder, so you run the risk of making unintended changes to a file deeper in the directory structure.

$NewDomain1 = "google"
$NewDomain2 = "hij"
$OurDocuments = Get-ChildItem -Path "C:\Apps\testing" -Filter "*.doc*" -Recurse
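You can check what -Recurse changes before pointing the script at real documents. The sketch below builds a throwaway folder tree in the temp directory, using made-up folder and file names, and compares Get-ChildItem with and without -Recurse. The files are empty placeholders, so nothing is opened in Word.

```powershell
# Throwaway folder tree with made-up names, created under the temp directory
$Root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$Sub  = Join-Path $Root "archive"
New-Item -ItemType Directory -Path $Sub -Force | Out-Null

# Empty placeholder files: two Word files and a text file at the top,
# plus one older Word file in a subfolder
"a.docx", "b.doc", "c.txt" | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $Root $_) | Out-Null
}
New-Item -ItemType File -Path (Join-Path $Sub "old.docx") | Out-Null

# Same filter as the script: top folder only, then with -Recurse
$TopOnly     = @(Get-ChildItem -Path $Root -Filter "*.doc*")
$WithRecurse = @(Get-ChildItem -Path $Root -Filter "*.doc*" -Recurse)

"Top folder only: {0}" -f ($TopOnly.Name -join ", ")
"With -Recurse:   {0}" -f ($WithRecurse.Name -join ", ")

# Clean up the demo folder
Remove-Item -Path $Root -Recurse -Force
```

The subfolder's old.docx only shows up in the second list. If you only want the top folder, drop -Recurse from the Get-ChildItem line.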

Create our Word COM object and keep it hidden from view.

$Word = New-Object -ComObject word.application
$Word.Visible = $false

Stepping into our ForEach-Object loop...

For each document that we gathered in $OurDocuments, we open it and pipe any hyperlinks into another ForEach-Object, where we check the value of the Address property. If there's a match that we want, we update the property with the new value. You'll notice that we're also updating the TextToDisplay property. This is the text that you see in the document, as opposed to Address which controls where the hyperlink actually goes.
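Assigning TextToDisplay replaces whatever the reader sees. A link that read "Check your mail" would therefore end up showing the raw URL. The sketch below shows one way around that, assuming you only want to change the visible text when it was the old URL itself. It uses plain PowerShell objects with Address and TextToDisplay properties as hypothetical stand-ins for Word's Hyperlink objects. It runs without Word, so it doesn't show how Word itself reacts to the changes.

```powershell
# Hypothetical stand-ins for Word Hyperlink objects (not real COM objects)
$Links = @(
    [pscustomobject]@{ Address = "https://www.yahoo.com/news"; TextToDisplay = "https://www.yahoo.com/news" }
    [pscustomobject]@{ Address = "https://www.yahoo.com/mail"; TextToDisplay = "Check your mail" }
)

foreach ($Link in $Links) {
    if ($Link.Address -like "https://www.yahoo.com/*") {
        $NewAddress = $Link.Address -replace '^https://www\.yahoo\.com/', "https://www.google.com/"
        # Only overwrite the visible text if it was the old URL itself
        if ($Link.TextToDisplay -eq $Link.Address) {
            $Link.TextToDisplay = $NewAddress
        }
        $Link.Address = $NewAddress
    }
}

$Links | Format-Table Address, TextToDisplay
```

Both links now point at google.com, but only the first one's visible text changes.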

This... $_.Address = $_.TextToDisplay = $NewAddress ...is an example of chained assignment. PowerShell assigns $NewAddress to TextToDisplay first and then assigns that same value to Address. Since both properties should end up with the same value, we set them in one statement.
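Here is a quick way to see chained assignment in action, using a made-up object instead of a real hyperlink:

```powershell
# Made-up object with the same two property names as a Word hyperlink
$Link = [pscustomobject]@{ Address = "old"; TextToDisplay = "old" }

# The right-hand assignment runs first; its value then goes to Address
$Link.Address = $Link.TextToDisplay = "https://www.google.com/"

# The same syntax works with plain variables
$a = $b = 5

"{0} | {1} | {2} | {3}" -f $Link.Address, $Link.TextToDisplay, $a, $b
```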

$Document = $Word.Documents.Open($_.FullName)
"Processing file: {0}" -f $Document.FullName
$Document.Hyperlinks | ForEach-Object {
    # Anchor the regex with ^ so only the start of the address is rewritten
    if ($_.Address -like "https://www.yahoo.com/*") {
        $NewAddress = $_.Address -replace '^https://www\.yahoo\.com/', "https://www.$NewDomain1.com/"
        "Updating {0} to {1}" -f $_.Address, $NewAddress
        $_.Address = $_.TextToDisplay = $NewAddress
    } elseif ($_.Address -like "http://def.com/*") {
        $NewAddress = $_.Address -replace '^http://def\.com/', "http://$NewDomain2.com/"
        "Updating {0} to {1}" -f $_.Address, $NewAddress
        $_.Address = $_.TextToDisplay = $NewAddress
    }
}
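If you end up with more than two addresses to swap, the if/elseif chain grows quickly. One alternative keeps old-to-new pairs in an ordered hashtable and looks each address up in it. The sketch below assumes every match is a simple address prefix. The $AddressMap table and the Get-UpdatedAddress function are hypothetical names of my own, not part of Word's object model.

```powershell
# Hypothetical lookup table: old address prefix -> new address prefix.
# Add more pairs here instead of adding more elseif branches.
$AddressMap = [ordered]@{
    "https://www.yahoo.com/" = "https://www.google.com/"
    "http://def.com/"        = "http://hij.com/"
}

# Hypothetical helper: returns the rewritten address, or $null if no prefix matches
function Get-UpdatedAddress {
    param(
        [string]$Address,
        [System.Collections.IDictionary]$Map
    )
    foreach ($Old in $Map.Keys) {
        # Case-insensitive, like -like in the original script
        if ($Address.StartsWith($Old, [System.StringComparison]::OrdinalIgnoreCase)) {
            # Keep whatever follows the prefix (path, query string, ...)
            return $Map[$Old] + $Address.Substring($Old.Length)
        }
    }
    return $null
}

# Inside the document loop it would be used roughly like this:
# $Document.Hyperlinks | ForEach-Object {
#     $NewAddress = Get-UpdatedAddress -Address $_.Address -Map $AddressMap
#     if ($NewAddress) { $_.Address = $_.TextToDisplay = $NewAddress }
# }

Get-UpdatedAddress -Address "https://www.yahoo.com/news" -Map $AddressMap
Get-UpdatedAddress -Address "http://def.com/motorists/default.aspx" -Map $AddressMap
```

Because only the prefix is swapped, a path such as /default.aspx is left alone. Addresses that match no prefix come back as $null, and those hyperlinks are skipped.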

Save any changes made...

"Saving changes to {0}" -f $Document.FullName
$Document.Save()

Here we create the filename for the PDF. Notice $_.Extension in the first line. Once the inner ForEach-Object has finished, $_ refers to the file info object from our Get-ChildItem again, so we can use the pipeline object to get the file extension. The $Document object doesn't have an Extension property, so otherwise you'd have to slice the file name yourself to get the same result. Keep in mind that -replace treats its pattern as a regular expression. The dot in .docx matches any character, and an unanchored pattern can match anywhere in the path, not just at the end.

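# Escape the extension and anchor it to the end so only the real extension changes
$Pdf = $Document.FullName -replace ([regex]::Escape($_.Extension) + '$'), '.pdf'
"Saving document {0} as PDF {1}" -f $Document.FullName, $Pdf
$Document.ExportAsFixedFormat($Pdf, 17)   # 17 = wdExportFormatPDF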
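The difference between an unescaped, unanchored extension and an escaped, anchored one is easiest to see on a concrete path. The path below is made up to show the effect. [System.IO.Path]::ChangeExtension is a .NET alternative that avoids regular expressions altogether.

```powershell
# Made-up path: the file name happens to contain "_docx" before the real extension
$FullName  = "C:\Apps\testing\Q1_docx_review.docx"
$Extension = ".docx"

# Unescaped: "." matches any character, and -replace rewrites every match
$Naive = $FullName -replace $Extension, ".pdf"

# Escaped and anchored to the end of the string: only the real extension changes
$Anchored = $FullName -replace ([regex]::Escape($Extension) + '$'), ".pdf"

# .NET helper that swaps the extension without any regular expression
$ViaDotNet = [System.IO.Path]::ChangeExtension($FullName, ".pdf")

$Naive, $Anchored, $ViaDotNet
```

The naive version turns the name into Q1.pdf_review.pdf. The other two give Q1_docx_review.pdf.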

Close the document up and the loop will move to the next file in $OurDocuments.

"Completed processing {0} `r`n" -f $Document.FullName
$Document.Close()

Once we run through all documents, we close Word.

$Word.Quit()
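The script might be stopped partway through, with Ctrl+C or by a terminating error. In that case $Word.Quit() never runs and a hidden Word process can be left behind. A common safeguard is to quit Word in a finally block. The sketch below uses a made-up stand-in object with a Quit method, so the pattern can run without Word. With the real COM object, you would put the ForEach-Object loop inside the try block. You may also want to call [System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word) after quitting.

```powershell
# Made-up stand-in for the Word COM object so the pattern runs without Word
$FakeWord = [pscustomobject]@{ HasQuit = $false }
$FakeWord | Add-Member -MemberType ScriptMethod -Name Quit -Value { $this.HasQuit = $true }

try {
    # The real script's $OurDocuments | ForEach-Object { ... } loop goes here;
    # this line simulates a failure partway through
    throw "Simulated failure while processing a document"
}
catch {
    "Error: {0}" -f $_.Exception.Message
}
finally {
    # Runs whether or not the try block finished normally
    $FakeWord.Quit()
}

"Word quit: {0}" -f $FakeWord.HasQuit
```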

I hope that all makes sense!