
I'm using the html2pdf library to create a PDF file from HTML, and I'm trying to prevent users from copying the PDF's content.

// Convert the rendered HTML to a PDF file
ConverterProperties converterProperties = new ConverterProperties();
try (FileOutputStream out = new FileOutputStream(fileName)) {
    HtmlConverter.convertToPdf(html, out, converterProperties);
} catch (IOException e) {
    e.printStackTrace();
}

Here html is a Thymeleaf template. After this runs, I get a PDF file in the root of the project. But I need to disable text selection in this PDF (as if it had been created from an image). How is it possible to disable the text in the PDF and, for example, put invisible text in a second layer of the PDF file?

Is the final goal to prevent users from copying the text, with unselectable text being only a means of achieving that goal? If so, you can configure the document permission flags so that users are able to select the text but not able to copy it to the clipboard. - Alexey Subach
@AlexeySubach I'm planning to pass these PDF files to some parsing libraries, and I want to add an invisible layer of text for better parsing. When a library like Sovren parses files created this way, the result is best. I currently implement this by creating a PDF file from HTML, then an image from the PDF, then a PDF from the image. But that's a bad way. The result should be something like this: [docdro.id/qphnWF8] - Betsko Roman

1 Answer


To prevent users viewing the PDF from copying its contents, you can encrypt the PDF with an owner password (and without a user password) and set permission flags that disallow copying the content. Text selection will still be possible in this case, but no content will end up on the clipboard.

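// Empty user password: the PDF opens without a password prompt.
// The owner password protects the permission settings; only ALLOW_SCREENREADERS
// is granted, so copying is not permitted.
PdfWriter pdfWriter = new PdfWriter("C:/out.pdf", new WriterProperties().setStandardEncryption(
        "".getBytes(), "ownerPass".getBytes(), EncryptionConstants.ALLOW_SCREENREADERS, EncryptionConstants.ENCRYPTION_AES_256));
PdfDocument pdfDocument = new PdfDocument(pdfWriter);
HtmlConverter.convertToPdf(new FileInputStream("C:/in.html"), pdfDocument);
pdfDocument.close();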
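
The permissions argument passed to setStandardEncryption is a bitmask, so several permissions can be combined with a bitwise OR. The stand-alone sketch below has no iText dependency and only illustrates the idea: the class name PdfPermissions, its constant names and the allows helper are hypothetical, and the bit positions are the ones the PDF standard security handler defines. I believe iText's EncryptionConstants.ALLOW_* values follow the same layout, but treat that as an assumption and use the library constants in real code.

```java
// Illustrative only: hypothetical names, not part of iText.
// Bit positions follow the PDF standard security handler's permission flags.
final class PdfPermissions {
    static final int PRINT_LOW_RES      = 1 << 2;   // bit 3
    static final int MODIFY_CONTENTS    = 1 << 3;   // bit 4
    static final int COPY               = 1 << 4;   // bit 5: copy / extract text
    static final int MODIFY_ANNOTATIONS = 1 << 5;   // bit 6
    static final int FILL_IN            = 1 << 8;   // bit 9
    static final int SCREENREADERS      = 1 << 9;   // bit 10: extraction for accessibility
    static final int ASSEMBLY           = 1 << 10;  // bit 11
    static final int PRINT_HIGH_RES     = 1 << 11;  // bit 12

    private PdfPermissions() {
    }

    // True when every bit of 'flag' is set in 'permissions'.
    static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Grant screen-reader access and printing, but leave COPY unset.
        int permissions = SCREENREADERS | PRINT_LOW_RES | PRINT_HIGH_RES;
        System.out.println("copy allowed: " + allows(permissions, COPY));
        System.out.println("print allowed: " + allows(permissions, PRINT_HIGH_RES));
    }
}
```

With a mask built this way, anything you do not explicitly OR in stays forbidden, which is why passing only the screen-reader flag is enough to block copying in viewers that honor the permissions.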

If the tool you use to parse text from the resulting PDF respects the permission configuration set up above, you may have problems extracting text from that PDF in the default mode (user mode). In that case, you can pass the owner password you set earlier to the tool so that it knows you are the owner of the document and are allowed to extract text from it. Alternatively, if the tool does not provide such a capability, you can decrypt the PDF and turn it into a plain PDF without any restrictions right before passing it to the parsing tool. Here is the code that decrypts the PDF:

// Open the encrypted PDF with the owner password and write an unencrypted copy
PdfDocument pdfDocument = new PdfDocument(new PdfReader("C:/out.pdf",
        new ReaderProperties().setPassword("ownerPass".getBytes())),
        new PdfWriter("C:/decrypted.pdf"));
pdfDocument.close();