Ghostscript (version 9.21) ignores attachments within a PDF file when converting it to PDF/A

Command used:

cmd /c %GHOST_SCRIPT_EXE% -dPDFA=2 -dBATCH -dNOPAUSE -dSubsetFonts=false -dPDFSETTINGS=/printer -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dOptimize=true -dPDFACompatibilityPolicy=1 -dAutoRotatePages=/None -sOutputFile="out.pdf" "test.pdf"

[Screenshot: test.pdf]

As you can see, test.pdf has an attachment, 1.pdf. However, the converted file, out.pdf, does not contain 1.pdf.

[Screenshot: out.pdf]

Both PDF files, test.pdf and out.pdf, are attached.

What exactly were you expecting Ghostscript to do with the attachment? The answer is yes, this is expected: Ghostscript is intended for printing, so it does not support many of the interactive features of PDF. Since an attachment can be anything, Ghostscript doesn't attempt to do anything with attachments. In any event, your command line won't create a valid PDF/A file unless you are very fortunate, because you haven't followed the guidance in the documentation. – KenS
Thank you @KenS for the feedback. I would like to keep the attached PDF in the converted PDF/A file as well. Is that possible? – JavaGeek
Anything is 'possible'; it's a simple matter of programming... It looks to me like this probably ought to work already (in the current version of Ghostscript, 9.23), but since you haven't supplied an example, I can't test it. I'm uncertain of the implications of doing this, because the PDF/A-2 specification talks about 'PDF/A-compliant file attachments' but doesn't say what that actually means. The specification describes embedded files, but not attached files. Possibly only PDF/A files are legal as attachments. Checking that would be 'difficult'. – KenS
@KenS Same problem with Ghostscript 9.23. As you can see in out.png, the converted file does not contain the attached 1.pdf (the attachment icon is visible, but the actual file is missing). – JavaGeek
Well, in the absence of an example file, there's not a lot I can say. The current implementation of the PDF interpreter in Ghostscript does act on a FileAttachment annotation. If the resulting annotation is incorrect, that could be because the pdfmark generated from it is incorrect, or because the pdfwrite code that rewrites the annotation has a bug. The FileAttachment annotation is processed in ghostpdl/Resource/Init/pdf_draw.ps, so you can debug it to see what pdfmark is generated and compare it against the original and created PDF files. <continued> – KenS

Answer

This isn't merely an attached file; it's an embedded file.

The embedded data is not copied for two reasons. Firstly, we don't support embedded files at all in Ghostscript (we can't do anything useful with them), and secondly, you are creating a PDF/A file.

In a PDF/A-2 file, an embedded file is only valid if it is also a PDF/A file (your embedded file is not a PDF/A file, so it would need to be converted first). There's no way for Ghostscript to easily verify that, so we (again) don't copy embedded files.

You can (of course) enhance Ghostscript yourself to do so. You will need to handle the /EF (Embedded File) key and create the stream using pdfmarks, then insert that into the dictionary for the /FS (FileSpec) key in the FileAttachment annotation.

[Edit]

The current Ghostscript PDF interpreter is written in PostScript. If you look in /ghostpdl/Resource/Init/pdf_draw.ps you will see:

/FileAttachment {mark exch loadannot /ANN pdfmark  false} bdef

That's where FileAttachment annotations are processed. As you can see, it uses a function called loadannot to convert the annotation dictionary into a series of strings, which are stored on the operand stack, and then adds /ANN and calls pdfmark to process the strings.
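To make the shape of that call concrete, here is a minimal Python sketch (not Ghostscript code) that turns an annotation dictionary into the general form `[ /Key value ... /ANN pdfmark`, where `[` is the PostScript mark operator. The `Name` class and the `ps_string`, `ps_value` and `ann_pdfmark` functions are hypothetical helpers for illustration, and the output is not claimed to match what loadannot produces byte for byte. Strings are assumed to be plain ASCII.

```python
from typing import Any


class Name(str):
    """A PostScript name such as /Paperclip, stored without the leading slash."""


def ps_string(s: str) -> str:
    """Quote ASCII text as a PostScript literal string, escaping backslash and parentheses."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def ps_value(v: Any) -> str:
    """Serialise a Python value as PostScript (bool before int, Name before str)."""
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, Name):
        return "/" + v
    if isinstance(v, str):
        return ps_string(v)
    if isinstance(v, int):
        return str(v)
    if isinstance(v, float):
        return repr(v)
    if isinstance(v, (list, tuple)):
        return "[ " + " ".join(ps_value(x) for x in v) + " ]"
    if isinstance(v, dict):
        return "<< " + " ".join(f"/{k} {ps_value(x)}" for k, x in v.items()) + " >>"
    raise TypeError(f"cannot serialise {type(v).__name__}")


def ann_pdfmark(annot: dict) -> str:
    """Emit the key/value pairs after a mark, followed by /ANN pdfmark."""
    pairs = " ".join(f"/{k} {ps_value(v)}" for k, v in annot.items())
    return f"[ {pairs} /ANN pdfmark"
```

For example, `ann_pdfmark({"Subtype": Name("FileAttachment"), "Contents": "1.pdf"})` yields `[ /Subtype /FileAttachment /Contents (1.pdf) /ANN pdfmark`. A real FileAttachment annotation would also need /Rect and an /FS file specification.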

You can find the pdfmark operator documented in the Adobe pdfmark reference (available somewhere on the Adobe web site; I recommend Google, as they keep moving it).

Here's what the original file looks like; you need to create pdfmarks to reproduce this:

23 0 obj
<<
  /AP <<
    /N 26 0 R
  >>
  /C [ 0.25 0.333328009 1 ]
  /Contents (1.pdf)
  /CreationDate (D:20180402114155+05'30')
  /F 28
  /FS 24 0 R                              
  /NM (55b56d89-a71e-484c-bf64-e4608540304b)
  /Name /Paperclip

  /RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.21" xfa:spec="2.0.2" ><p>1.pdf</p></body>)
  /Rect [ 200.852997 727.552979 207.852997 744.552979 ]
  /Subj (File Attachment)
  /Subtype /FileAttachment
  /T (ybn9mk)
  /Type /Annot
>>
endobj

24 0 obj
<<
  /EF <<
    /F 25 0 R
  >>
  /F (1.pdf)
  /Type /Filespec
  /UF (1.pdf)
>>
endobj

25 0 obj
<<
  /DL 82637
  /Subtype /application#2Fpdf
  /Length 82637
  /Params <<
    /CheckSum <EC9AED504CB6442F260E1379E21A0873>
    /CreationDate (D:20180402114059+05'30')
    /ModDate (D:20170907123559+05'30')
    /Size 82637
  >>
>>
stream
%PDF-1.5
.....
.... embedded PDF file here
....
....
endstream
endobj
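If you want to reproduce the optional /Params dictionary of object 25, its values can be computed from the file itself. The PDF reference defines /CheckSum as the MD5 digest of the uncompressed embedded file bytes, and dates use the `D:YYYYMMDDHHmmSS+HH'mm'` syntax seen above. This is a sketch under those assumptions; `pdf_date`, `params_dict` and `IST` are hypothetical names, and I have not checked whether pdfwrite would fill these keys in by itself.

```python
import hashlib
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # the +05'30' offset seen in the sample file


def pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date, e.g. D:20180402114059+05'30'."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("pdf_date needs a timezone-aware datetime")
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{hh:02d}'{mm:02d}'"


def params_dict(data: bytes, created: datetime, modified: datetime) -> str:
    """Build a /Params dictionary like the one in object 25, as PostScript text."""
    checksum = hashlib.md5(data).hexdigest().upper()  # 16-byte MD5 digest as a hex string
    return (
        f"<< /CheckSum <{checksum}> "
        f"/CreationDate ({pdf_date(created)}) "
        f"/ModDate ({pdf_date(modified)}) "
        f"/Size {len(data)} >>"
    )
```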

The current Ghostscript implementation will reproduce object 23, the FileAttachment annotation, and it will correctly expand the /EF dictionary inline into that annotation. However, it doesn't write object 25, the actual embedded PDF file.

So you would need to add code to read the embedded file object, write it as a named stream using pdfmark, and then reference that named stream from the /EF key in the FileSpec dictionary (object 24 in the original file, but expanded and included inline in the pdfwrite output).
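The pdfmark sequence for that could look roughly like the output of the Python sketch below, which writes PostScript text using the named-object operations from the pdfmark reference: /_objdef with /OBJ to create a stream, /PUT to add dictionary keys and data, /CLOSE to finish it, then an /ANN whose /FS dictionary refers to the stream as `{name}`. This is an untested sketch, not a patch to pdf_draw.ps. The function name `attachment_pdfmarks`, the object name, and the 4096-byte chunk size are hypothetical, and I am assuming pdfwrite resolves a named-object reference nested inside the /FS dictionary of an /ANN pdfmark. The data is split into several /PUT calls so no single PostScript string gets large.

```python
CHUNK = 4096  # bytes of file data per /PUT (arbitrary choice); keeps each string short


def ps_string(s: str) -> str:
    """Quote ASCII text as a PostScript literal string, escaping backslash and parentheses."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def attachment_pdfmarks(obj_name: str, filename: str, data: bytes,
                        rect: tuple, mime: str = "application/pdf") -> str:
    """Return pdfmark lines that write `data` as a named stream object and attach it
    to a FileAttachment annotation whose FileSpec points at that stream via /EF."""
    ref = "{" + obj_name + "}"  # named-object reference, e.g. {EmbeddedFile1}
    lines = [
        # 1. Create an empty stream object with a name we can refer to later.
        f"[ /_objdef {ref} /type /stream /OBJ pdfmark",
        # 2. Add keys to the stream dictionary; cvn turns the MIME string into a name.
        f"[ {ref} << /Type /EmbeddedFile /Subtype {ps_string(mime)} cvn >> /PUT pdfmark",
    ]
    # 3. Append the file contents as hex strings, one chunk per /PUT.
    for i in range(0, len(data), CHUNK):
        lines.append(f"[ {ref} <{data[i:i + CHUNK].hex()}> /PUT pdfmark")
    # 4. Finish writing the stream.
    lines.append(f"[ {ref} /CLOSE pdfmark")
    # 5. The annotation itself; /EF in its FileSpec refers to the named stream.
    x0, y0, x1, y1 = rect
    name = ps_string(filename)
    lines.append(
        f"[ /Subtype /FileAttachment /Rect [ {x0} {y0} {x1} {y1} ] "
        f"/Contents {name} /Name /Paperclip "
        f"/FS << /Type /Filespec /F {name} /UF {name} /EF << /F {ref} >> >> "
        f"/ANN pdfmark"
    )
    return "\n".join(lines) + "\n"
```

For 1.pdf you might call `attachment_pdfmarks("EmbeddedFile1", "1.pdf", open("1.pdf", "rb").read(), (200.853, 727.553, 207.853, 744.553))`, using the /Rect from object 23 rounded. Bear in mind that an /ANN pdfmark lands on whichever page is current when it executes, that Ghostscript would still write its own copy of the annotation without the file unless pdf_draw.ps is changed, and that for PDF/A-2 the embedded file itself must be PDF/A.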

Unless you are very familiar with PostScript, this will be quite a challenge.