
The following code uses PDFSharp to split the pages of PDF documents into two groups: pages no larger than A4, and pages larger than A4 (which are output at A3):

''' <summary>
''' Process the list of pdfs
''' </summary>
Public Sub ProcessPdfs()

    Dim tempPath As String

    ' Code omitted

    ' Generate a temporary path in case pdfs need to be saved
    If String.IsNullOrEmpty(Me.tempFolder) OrElse Not Directory.Exists(Me.tempFolder) Then
        tempFolder = Path.GetTempPath()
    End If
    tempPath = Path.Combine(Me.tempFolder, Path.GetRandomFileName + ".pdf")

    ' Loop through the pages of the pdfs and process each page in turn. Processing involves
    ' determining the size of the page, then shrinking, adding the footer and then adding to
    ' the appropriate output pdf
    For Each referenceNumber As String In Me.Pdfs.Keys
        For Each pdf As PdfDocument In Me.Pdfs(referenceNumber)
            ' Save the pdf to disk for PDFSharp to be able to read it properly
            If String.IsNullOrEmpty(pdf.FullPath) Then
                pdf.Save(tempPath)
                pdf = PdfReader.Open(tempPath)
            End If
            For Each page As PdfPage In pdf.Pages

                ' Code omitted

                Select Case pageArea
                    Case Is <= a4PageArea
                        Call AddPage(referenceNumber, pdf, page, PageSize.A4)
                    Case Else
                        Call AddPage(referenceNumber, pdf, page, PageSize.A3)
                End Select
            Next
        Next
    Next

    ' Code omitted

    ' Delete temporary pdfs if there are any
    If File.Exists(tempPath) Then
        File.Delete(tempPath)
    End If

End Sub

''' <summary>
''' Add the specified page to the specified output document
''' </summary>
''' <returns>The page which was added to the output pdf</returns>
Private Function AddPage(ByVal ReferenceNumber As String, ByVal ParentPdf As PdfDocument, ByVal ParentPdfPage As PdfPage, ByVal PageSize As PageSize) As PdfPage

    ' Code omitted

    ' Copy the specified page onto the newly created page
    Using parentForm As XPdfForm = XPdfForm.FromFile(ParentPdf.FullPath)
        parentForm.PageIndex = ParentPdf.Pages.Cast(Of PdfPage)().ToList().IndexOf(ParentPdfPage)
        scaleFactor = 1
        ' Create PdfSharp graphics object with which to write onto the page
        Using graphics As XGraphics = XGraphics.FromPdfPage(outputPdfPage)
            graphics.SmoothingMode = XSmoothingMode.HighQuality

            ' Code omitted

            ' Draw the page
            Call graphics.DrawImage(parentForm, targetRect)
        End Using
    End Using

    Return outputPdfPage

End Function

What this does is take a PDF, read each page and scale it so that it fits the page onto which it is to be printed.
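The scaling calculation itself is omitted from the code above (`scaleFactor` is simply set to 1). As a minimal sketch, assuming the page should be scaled uniformly (aspect ratio preserved) so that it fits the target page, the factor is the smaller of the width ratio and the height ratio. `FitScale` is a hypothetical helper, not part of PDFsharp or the original code, and both sizes are assumed to be in the same unit, e.g. points (`PdfPage.Width.Point`).

```vbnet
Imports System

Module ScaleToFitSketch
    ''' <summary>
    ''' Hypothetical helper: returns the uniform scale factor that makes a
    ''' sourceWidth x sourceHeight page fit inside targetWidth x targetHeight
    ''' while preserving its aspect ratio.
    ''' </summary>
    Public Function FitScale(sourceWidth As Double, sourceHeight As Double,
                             targetWidth As Double, targetHeight As Double) As Double
        If sourceWidth <= 0 OrElse sourceHeight <= 0 Then
            Throw New ArgumentException("Source page dimensions must be positive.")
        End If
        ' Whichever ratio is smaller is the one that guarantees both sides fit.
        Return Math.Min(targetWidth / sourceWidth, targetHeight / sourceHeight)
    End Function
End Module
```

For example, `FitScale(200, 100, 100, 100)` gives 0.5. Since the comments in `ProcessPdfs()` talk about shrinking, wrapping the result in `Math.Min(1.0, ...)` would stop small pages from being enlarged; whether that is the desired behaviour is an assumption.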

PDFSharp has trouble opening documents created in Adobe v6, so I use iTextSharp to rebuild them in a version that PDFSharp can open. These PDFs are rebuilt in memory, and for some reason they need to be written to disk for PDFSharp to process them correctly.

In ProcessPdfs() I check whether the PDF has a physical path and, if not, save it to a temporary location.

The problem I found is that AddPage() always seems to work with the same PDF. I checked the temporary PDF files created on disk and they are correct, i.e. different each time.

But the file loaded in the first using statement by XPdfForm.FromFile(ParentPdf.FullPath) never changes. It's as if the code realises that the file path does not change and so decides not to reload the file.

I thought a Using statement would ensure that the object is disposed of at the end of the block, and therefore that the file would be reloaded afresh every time. Am I misunderstanding something, or what is happening here?

Incidentally, I worked around this by saving each PDF under a different file name, which is why I think the object from the Using block is being reused every time based on the file name...
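The workaround described above can be sketched like this. `WithUniqueTempCopy` is a hypothetical helper, not from the original code or from PDFsharp: it saves the in-memory document under a freshly generated name from `Path.GetRandomFileName`, re-opens it with `PdfReader.Open` so that `FullPath` points at that unique file, runs the page processing, and deletes the file afterwards.

```vbnet
Imports System
Imports System.IO
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.IO

Module UniqueTempFileSketch
    ''' <summary>
    ''' Hypothetical helper: saves an in-memory document under a file name that is
    ''' never reused, re-opens it from disk, runs the supplied work on it and then
    ''' deletes the temporary file, even if the work throws.
    ''' </summary>
    Public Sub WithUniqueTempCopy(pdf As PdfDocument, tempFolder As String,
                                  work As Action(Of PdfDocument))
        ' A fresh random name per document, instead of one shared tempPath.
        Dim uniquePath As String = Path.Combine(tempFolder, Path.GetRandomFileName() & ".pdf")
        Try
            pdf.Save(uniquePath)
            ' Re-open so that FullPath points at the unique file, as in ProcessPdfs().
            Dim reopened As PdfDocument = PdfReader.Open(uniquePath)
            work(reopened)
        Finally
            ' Clean up straight away (the files may contain sensitive data).
            If File.Exists(uniquePath) Then File.Delete(uniquePath)
        End Try
    End Sub
End Module
```

In `ProcessPdfs()`, the inner `For Each page` loop would then run inside the `work` callback, instead of every in-memory document being saved to the single shared `tempPath`.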

No, the variable is not being reused. It is quite possible that a new instance (object) is being created that points to the same file on disk. - John Saunders
But then that should be fine, since by that stage the file has been overwritten with the next PDF, right? I've checked that it has by opening the file manually in between runs. Yet the output has the correct page sizes, while the content of every page is that of the very first PDF processed. - yu_ominae
You could try explicitly deleting the file after PDFSharp has finished with it, but I'd say your solution of using a different file name each time is simpler and probably faster, and it would let you thread the processing. - Tony Hopkinson
Yes, I ended up saving each file under a different random name and cleaning it up right after using it (sensitive information...), and it works fine now. I just don't get why I had to do this. The way I understand the Using statement, the filename shouldn't matter, right? - yu_ominae
The problem here is "why does it not do what I think it should do?". I posted this question after having found the workaround. I want to know why I need a workaround in the first place. - yu_ominae

1 Answer


The XPdfForm caches the documents internally - and the filename is the key. If you re-use the filename for a new document, the old, cached document will be used.

The cache is thread-local.

So it's not a bug, it's a feature.
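To make the answer concrete, here is a toy model of a per-thread cache keyed by file path. It is not PDFsharp's actual implementation: `CachedFileForm` is hypothetical, and reading the file as text stands in for parsing a PDF. It reproduces the behaviour in the question: `Using` disposes the wrapper object, but the cache entry survives, so overwriting the file and loading the same path again returns the first content.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

''' <summary>
''' Toy model, NOT PDFsharp's real code: a form-like object whose parsed content
''' is cached per thread, keyed only by the file path.
''' </summary>
Public NotInheritable Class CachedFileForm
    Implements IDisposable

    ' One cache per thread, mirroring "the cache is thread-local".
    <ThreadStatic> Private Shared _cache As Dictionary(Of String, String)

    Public ReadOnly Property Content As String

    Private Sub New(initialContent As String)
        Me.Content = initialContent
    End Sub

    Public Shared Function FromFile(filePath As String) As CachedFileForm
        If _cache Is Nothing Then _cache = New Dictionary(Of String, String)()
        Dim parsed As String = Nothing
        ' The file is only read on a cache miss; the path is the whole key.
        If Not _cache.TryGetValue(filePath, parsed) Then
            parsed = File.ReadAllText(filePath)
            _cache(filePath) = parsed
        End If
        Return New CachedFileForm(parsed)
    End Function

    ' Disposing the wrapper evicts nothing from the cache, which is why a
    ' Using block around FromFile does not force a fresh read next time.
    Public Sub Dispose() Implements IDisposable.Dispose
    End Sub
End Class
```

On one thread, writing "first" to a path, loading it, overwriting the file with "second" and loading the same path again yields "first" both times. A new path (or a different thread) triggers a fresh read, which is why saving under a different file name each time avoids the problem.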

It should be possible to use streams instead of files.
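A minimal sketch of the stream approach, assuming your PDFsharp version provides `XPdfForm.FromStream` and `PdfDocument.Save(Stream, Boolean)`, and assuming (as the answer suggests) that a form created from a stream is not looked up by file name. `OpenFormFromDocument` is a hypothetical helper that could replace `XPdfForm.FromFile(ParentPdf.FullPath)` in `AddPage()`, removing the temporary file altogether.

```vbnet
Imports System.IO
Imports PdfSharp.Drawing
Imports PdfSharp.Pdf

Module StreamFormSketch
    ''' <summary>
    ''' Hypothetical helper: serializes an in-memory PdfDocument into a MemoryStream
    ''' and opens it as an XPdfForm, so no file name is involved.
    ''' </summary>
    Public Function OpenFormFromDocument(sourcePdf As PdfDocument, pageIndex As Integer) As XPdfForm
        ' The MemoryStream is not disposed here because the form is created from it;
        ' it holds no unmanaged resources.
        Dim buffer As New MemoryStream()
        ' Second argument False: leave the stream open so it can be read back.
        sourcePdf.Save(buffer, False)
        ' Rewind so the document is read from the beginning.
        buffer.Position = 0
        Dim form As XPdfForm = XPdfForm.FromStream(buffer)
        ' Select which page of the source document the form will draw.
        form.PageIndex = pageIndex
        Return form
    End Function
End Module
```

In `AddPage()` this would be used as `Using parentForm As XPdfForm = StreamFormSketch.OpenFormFromDocument(ParentPdf, pageIndex)`, with `pageIndex` computed as in the original code.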