
I'm using iTextSharp to generate PDF/A documents from images. So far I haven't been successful.

All I'm trying to do is create a PDF/A document (1a or 1b, whichever suits) containing some images. This is the code I've come up with so far, but I keep getting errors when I validate the output with pdf-tools or validatepdfa.

These are the errors I get from pdf-tools (using PDF/A-1b validation). Edit: MarkInfo and the color space still aren't working; the rest is okay now.

Validating file "0.pdf" for conformance level pdfa-1a
The key MarkInfo is required but missing.
A device-specific color space (DeviceRGB) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document doesn't provide appropriate logical structure information.
Done.

Main flow

var output = new MemoryStream();
using (var iccProfileStream = new FileStream("ToPdfConverter/ColorProfiles/sRGB_v4_ICC_preference_displayclass.icc", FileMode.Open))
{
    var document = new Document(new Rectangle(PageSize.A4.Width, PageSize.A4.Height), 0f, 0f, 0f, 0f);
    var pdfWriter = PdfWriter.GetInstance(document, output);
    pdfWriter.PDFXConformance = PdfWriter.PDFA1A;
    document.Open();

    var pdfDictionary = new PdfDictionary(PdfName.OUTPUTINTENT);
    pdfDictionary.Put(PdfName.OUTPUTCONDITION, new PdfString("sRGB IEC61966-2.1"));
    pdfDictionary.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
    pdfDictionary.Put(PdfName.S, PdfName.GTS_PDFA1);

    var iccProfile = ICC_Profile.GetInstance(iccProfileStream);
    var pdfIccBased = new PdfICCBased(iccProfile);
    pdfIccBased.Remove(PdfName.ALTERNATE);
    pdfDictionary.Put(PdfName.DESTOUTPUTPROFILE, pdfWriter.AddToBody(pdfIccBased).IndirectReference);

    pdfWriter.ExtraCatalog.Put(PdfName.OUTPUTINTENT, new PdfArray(pdfDictionary));

    var image = PrepareImage(imageBytes);

    document.Open();
    document.Add(image);

    pdfWriter.CreateXmpMetadata();

    pdfWriter.CloseStream = false;
    document.Close();
}
return output.ToArray(); // ToArray() copies only the bytes written; GetBuffer() also returns unused trailing capacity

This is PrepareImage(). It's used to flatten the image to BMP, so I don't need to worry about alpha channels.

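private Image PrepareImage(Stream stream)
{
    // Decode the source image and re-encode it as BMP before handing it to iTextSharp
    using (var source = System.Drawing.Image.FromStream(stream))
    using (var bmp = new Bitmap(source))
    using (var file = new MemoryStream())
    {
        bmp.Save(file, ImageFormat.Bmp);
        // ToArray() returns only the bytes written; GetBuffer() may include unused capacity
        var image = Image.GetInstance(file.ToArray());

        // Shrink images that don't fit on an A4 page
        if (image.Height > PageSize.A4.Height || image.Width > PageSize.A4.Width)
        {
            image.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
        }
        return image;
    }
}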

Can anyone point me in the right direction for fixing these errors, specifically the device-specific color space one?

Edit: More explanation: what I'm trying to achieve is converting scanned images to PDF/A for long-term data storage.

Edit: I've added some files I'm using for testing:
PDFs and Pictures.rar (3.9 MB)
https://mega.co.nz/#!n8pClYgL!NJOJqSO3EuVrqLVyh3c43yW-u_U35NqeB0svc6giaSQ

It might be worth raising a bug with the iText people. - Rup
Why do you set the conformance level to PDF/A-1a and then check against 1b? It would be good to be consistent. Also, why do you open the document twice? I would also try to resolve the other errors first: the errors about the file structure being corrupt and so on could easily interfere with the (lesser) problem you have with color spaces. - David van Driessche
@David Okay, thanks for your reply. I've now got almost everything working correctly; only the color space is still wrong. I've added some edits to the code. - Highmastdon
What's the color space of the image you are inserting? And could you share an example PDF? That way I could run it through the pdfToolbox PDF/A verification and perhaps get a more descriptive error message. - David van Driessche
What we're trying to do is convert scanned images to PDF/A for long-term data storage. I've uploaded an archive with the files I'm using for testing: PDFs and Pictures.rar (3.9 MB) mega.co.nz/… - Highmastdon

2 Answers

1 vote

OK, I checked one of your files in callas pdfToolbox and it reports: "Device color space used but no PDF/A output intent", which I took as a sign that something goes wrong when you write the output intent to the document. I then converted that document to PDF/A-1b with the same tool, and comparing the two makes the difference obvious.

There may be other errors you need to fix, but the first one is that you put a key named "OutputIntent" in the catalog dictionary of the PDF file. That's wrong: page 75 of the PDF Specification states that the key should be named "OutputIntents" (plural).
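For reference, here is a minimal sketch of the question's hand-built output intent with the catalog key corrected, assuming iTextSharp 5.x. `OutputIntentHelper` and `AddSrgbOutputIntent` are hypothetical names. The substantive change is `PdfName.OUTPUTINTENTS` instead of `PdfName.OUTPUTINTENT` as the catalog key; the intent dictionary itself correctly keeps `/Type /OutputIntent`. I've also replaced the question's `OutputCondition` entry with `OutputConditionIdentifier`, which the PDF specification lists as a required key of an output intent dictionary.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class OutputIntentHelper
{
    // Hypothetical helper: builds the output intent the way the question does,
    // but stores it under the correctly named catalog key.
    // Call it after document.Open().
    public static void AddSrgbOutputIntent(PdfWriter writer, Stream iccProfileStream)
    {
        // The dictionary's own /Type stays /OutputIntent (singular)
        var intent = new PdfDictionary(PdfName.OUTPUTINTENT);
        intent.Put(PdfName.OUTPUTCONDITIONIDENTIFIER, new PdfString("sRGB IEC61966-2.1"));
        intent.Put(PdfName.INFO, new PdfString("sRGB IEC61966-2.1"));
        intent.Put(PdfName.S, PdfName.GTS_PDFA1);

        // Embed the ICC profile as an indirect stream and reference it
        var icc = ICC_Profile.GetInstance(iccProfileStream);
        var iccStream = new PdfICCBased(icc);
        iccStream.Remove(PdfName.ALTERNATE);
        intent.Put(PdfName.DESTOUTPUTPROFILE, writer.AddToBody(iccStream).IndirectReference);

        // The catalog key is /OutputIntents (plural); its value is an array of intents
        writer.ExtraCatalog.Put(PdfName.OUTPUTINTENTS, new PdfArray(intent));
    }
}
```

In the question's main flow this would replace the block that builds `pdfDictionary` and puts it into `ExtraCatalog`. iTextSharp's own `PdfWriter.SetOutputIntents(...)`, used in the other answer, builds a similar dictionary for you.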

Like I said, there may be other problems with your file beyond this, but the wrongly named key means PDF/A validators can't find the output intent you're trying to put in the file.
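To check whether a generated file carries the plural key before running a full validator, a crude byte scan can be enough in simple cases. `PdfKeyScan` is a hypothetical helper, not part of any library. It assumes the catalog is written as an uncompressed object, which is typical of iTextSharp 5 output unless full compression (object streams) is enabled, and it is no substitute for pdfToolbox or another PDF/A validator.

```csharp
using System;
using System.Text;

public static class PdfKeyScan
{
    // PDF whitespace and delimiter characters: a name token ends at any of these
    private const string Terminators = "\0\t\n\f\r ()<>[]{}/%";

    // Crude check: decodes the raw bytes as ASCII and looks for "/" + name as a
    // complete PDF name token. Names inside compressed object streams are not seen.
    public static bool ContainsName(byte[] pdf, string name)
    {
        string text = Encoding.ASCII.GetString(pdf);
        string token = "/" + name;
        int index = 0;
        while ((index = text.IndexOf(token, index, StringComparison.Ordinal)) >= 0)
        {
            int end = index + token.Length;
            // Reject prefix matches, e.g. "/OutputIntent" inside "/OutputIntents"
            if (end == text.Length || Terminators.IndexOf(text[end]) >= 0)
                return true;
            index = end;
        }
        return false;
    }
}
```

For example, `PdfKeyScan.ContainsName(File.ReadAllBytes("0.pdf"), "OutputIntents")` would be expected to return false for a file produced by the question's code. Searching for the singular name tells you less, because every output intent dictionary legitimately contains `/Type /OutputIntent`.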

0 votes
1. First of all, PDF/X IS NOT PDF/A (the question sets the conformance through the PDFXConformance property).
2. Second, you're using the wrong PdfWriter. It should be PdfAWriter.

Unfortunately, I don't have a solution for the image problem, but I do have one for 1 and 2.

Regards

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Text;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;
using iTextSharp.tool.xml;
using System.Drawing;
using System.Drawing.Imaging;

namespace Tests
{
    /*
     * References:  
     * UTF-8 encoding http://stackoverflow.com/questions/4902033/itextsharp-5-polish-character
     * PDFA http://www.codeproject.com/Questions/661704/Create-pdf-A-using-itextsharp
     * Images http://stackoverflow.com/questions/15896581/make-a-pdf-conforming-pdf-a-with-only-images-using-itextsharp
     */

    [TestClass]
    public class UnitTest1
    {
        /*
         * IMPORTANT: Restrictions with html usage of tags and attributes
         * 1. Don't use <head><title>Sklep</title></head>, because the title is rendered on the page
         */

        // Test cases
        static string contents = "<html><body style=\"font-family:arial unicode ms;font-size: 8px;\"><p style=\"text-align: center;\"> Davčna številka dolžnika: 74605968<br /> </p><table> <tr> <td><b>\u0160t. sklepa: 88711501</b></td> <td style=\"text-align: right;\">Davčna številka dolžnika: 74605968</td> </tr> </table> <br/><img src=\"http://img.rtvslo.si/_static/images/rtvslo_mmc_logo.png\" /></body></html>";
        //static string contents = "<html><body style=\"font-family:arial unicode ms;font-size: 8px;\"><p style=\"text-align: center;\"> Davčna številka dolžnika: 74605968<br /> </p><table> <tr> <td><b>\u0160t. sklepa: 88711501</b></td> <td style=\"text-align: right;\">Davčna številka dolžnika: 74605968</td> </tr> </table> <br/></body></html>";

        //[TestMethod]
        public void CreatePdfHtml()
        {
            createPDF(contents, true);        
        }

        private void createPDF(string html, bool isPdfa)
        {
            TextReader reader = new StringReader(html);
            Document document = new Document(PageSize.A4, 30, 30, 30, 30);
            HTMLWorker worker = new HTMLWorker(document);

            PdfWriter writer;
            if (isPdfa)
            {
                //set conformity level
                writer = PdfAWriter.GetInstance(document, new FileStream(@"c:\temp\testA.pdf", FileMode.Create), PdfAConformanceLevel.PDF_A_1B);

                //set pdf version
                writer.SetPdfVersion(PdfAWriter.PDF_VERSION_1_4);

                // Create XMP metadata. It's a PDF/A requirement.
                writer.CreateXmpMetadata();
            }
            else
            {
                writer = PdfWriter.GetInstance(document, new FileStream(@"c:\temp\test.pdf", FileMode.Create));
            }

            document.Open();

            if (isPdfa) // the document must already be open here, or this will fail
            {
                // Set output intent for uncalibrated color space. PDF/A requirement.
                ICC_Profile icc = ICC_Profile.GetInstance(Environment.GetEnvironmentVariable("SystemRoot") +  @"\System32\spool\drivers\color\sRGB Color Space Profile.icm");
                writer.SetOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
            }

            //register font used in html
            FontFactory.Register(Environment.GetEnvironmentVariable("SystemRoot") + "\\Fonts\\ARIALUNI.TTF", "arial unicode ms");

            //adding custom style attributes to html specific tasks. Can be used instead of css
            // this one is required to display UTF-8 language-specific characters (čćžđpš)
            iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet();
            ST.LoadTagStyle("body", "encoding", "Identity-H");
            worker.SetStyleSheet(ST);

            worker.StartDocument();
            worker.Parse(reader);
            worker.EndDocument();
            worker.Close();
            document.Close();
        }

    }


}
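Applying this answer's setup to the question's image-only case might look like the sketch below. It assumes iTextSharp 5.x together with the itextsharp.pdfa assembly, which provides `PdfAWriter`. `ImageToPdfA.Convert` is a hypothetical name, and `iccProfilePath` can point to any RGB ICC profile, such as the Windows sRGB profile used above. It targets PDF/A-1b, because level A additionally requires a tagged document, which is what the MarkInfo and logical structure errors in the question are about.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImageToPdfA
{
    // Hypothetical sketch: one image on one A4 page, written as PDF/A-1b.
    public static byte[] Convert(byte[] imageBytes, string iccProfilePath)
    {
        using (var output = new MemoryStream())
        {
            var document = new Document(PageSize.A4, 0f, 0f, 0f, 0f);
            // Level B only: no tagging (MarkInfo, structure tree) is required
            var writer = PdfAWriter.GetInstance(document, output, PdfAConformanceLevel.PDF_A_1B);
            writer.CloseStream = false;
            writer.CreateXmpMetadata(); // XMP metadata is a PDF/A requirement
            document.Open();

            // Adds an /OutputIntents entry to the catalog, so DeviceRGB image data
            // has an output intent; must be called after Open()
            var icc = ICC_Profile.GetInstance(iccProfilePath);
            writer.SetOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

            var image = Image.GetInstance(imageBytes);
            if (image.Height > PageSize.A4.Height || image.Width > PageSize.A4.Width)
            {
                image.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
            }
            document.Add(image);

            document.Close();
            return output.ToArray();
        }
    }
}
```

PDF/A-1 does not allow soft masks, so an image with an alpha channel may still be rejected by `PdfAWriter`; the question's `PrepareImage()` re-encoding step is intended to deal with that case.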