
I am using puppeteer (headless Chrome) to render an .html file generated by an RMarkdown script to .pdf. However, puppeteer doesn't seem to respect my color and background-color style settings. The problem doesn't occur when rendering pages off the web, which suggests it lies in the interaction between puppeteer and the RMarkdown output.

Consider the following test.Rmd script:

---
title: "Testing colors"
output: html_document
---

<style>
html {
  -webkit-print-color-adjust: exact;
}

h4 {color: blue;}
</style>

#### Blue heading
<div style="color:red">This text is red</div>
<div style="background-color:red">This text has red background</div>

We can render it to test.html by calling rmarkdown::render( "test.Rmd", output_file="test.html" ) in R. Note the -webkit-print-color-adjust setting; it is often recommended as a solution to color-related problems, but I found that it has no effect in my case.

Following puppeteer tutorials, I put together the following render.js:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setViewport({width: 400, height: 200});
    await page.goto('file:///home/sokolov/test/puppeteer/test.html');
    await page.screenshot({path: 'test.png'});
    await page.pdf({path: 'test.pdf', printBackground: true});
    await browser.close();
})();

Running node render.js from the command line produces test.png and test.pdf. The former looks exactly how you would expect:

[screenshot of test.png]

However, the .pdf loses all color specifications:

[screenshot of test.pdf]

If I replace the URL in my render.js with an external page (e.g., https://www.w3schools.com/css/css_background.asp), the page renders correctly in both .png and .pdf formats. Specifying printBackground: true was key to making it work for this external page, but it seems to have no effect on my local test.html.

Any thoughts on how to get the colors working?

P.S. To briefly address the question of why I'm not simply using output: pdf_document in my .Rmd, I wanted to note that the real RMarkdown document I'm working with uses flexdashboard layout, which doesn't play nicely with knitr. Most of the tutorials I've read suggest using a headless browser to render the final .html to .png/.pdf. The solution is working well for me, except for the loss of color styles.


1 Answer


The Simplest Solution

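Why not just try removing all the !important flags? The target is the CSS that rmarkdown::render embeds in test.html. That CSS is URL-encoded, so !important appears in the file as %21important. Its @media print rules force color:#000!important and background:0 0!important on every element, which is what overrides your colors in the PDF. A search and replace is enough for puppeteer to correctly colorize a PDF made from test.html. Run this in a shell with vim installed: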

echo "%s/%21important// | w!" | vim -e test.html

And that's it! Below I just document my first attempt to solve this problem, and I explain why it might not be the best solution. Someone else may find it useful.
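If vim is not available, the same search and replace can be sketched in Node, which is already required for render.js. This is a minimal sketch, not a tested fix for every RMarkdown output: the helper names stripEncodedImportant and stripFile are hypothetical, and the sample string is made up for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: drop every URL-encoded "!important" (%21important).
function stripEncodedImportant(html) {
  return html.replace(/%21important/g, '');
}

// Hypothetical wrapper: rewrite an HTML file in place, e.g. stripFile('test.html').
function stripFile(path) {
  const html = fs.readFileSync(path, 'utf8');
  fs.writeFileSync(path, stripEncodedImportant(html));
}

// Made-up sample resembling the encoded CSS, for illustration only.
const sample = 'color%3A%23000%21important%3Bbackground%3A0%200%21important';
console.log(stripEncodedImportant(sample));
// prints: color%3A%23000%3Bbackground%3A0%200
```

Calling stripFile('test.html') before running render.js would play the role of the vim command. One difference: the /g flag here removes every occurrence, whereas with vim's default settings a :s command without the g flag replaces only the first match on each line.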

My first attempt

Run this in a shell with vim installed:

echo "%s/%40media%20print%7B.\{-}%7D// | w!" | vim -e test.html

The above command overwrites test.html with the @media print{} styles partially removed. The styles are not removed fully or cleanly, but the new test.html still has the desired effect.

Here is what I'm doing

The CSS embedded in test.html is URL-encoded, so the pattern has to be written as %s/%40media%20print%7B.\{-}%7D//, when what I really want to write is this:

%s/@media print{.\{-}}//
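To see how the encoded pattern maps onto the readable one, the fragments can be decoded with JavaScript's built-in decodeURIComponent. This is only an illustration of the encoding; the variable names are placeholders.

```javascript
// Encoded fragments that appear in the vim patterns of this answer.
const fragments = ['%40media%20print%7B', '%7D', '%21important'];

// Decode each fragment back to the CSS text it stands for.
const decoded = fragments.map((f) => decodeURIComponent(f));

decoded.forEach((text, i) => console.log(fragments[i], '->', text));
// %40media%20print%7B -> @media print{
// %7D -> }
// %21important -> !important
```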

The goal is to delete statements like @media print {a: ""} entirely. I don't handle nested braces properly, so for a statement like @media print {.a{a: ""};b{}...} this script deletes only up to the first closing brace and leaves everything after it untouched. That is a bug: the non-greedy match .\{-} removes too little, while the greedy alternative .* may remove too much.
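The trade-off described above can be sketched in JavaScript, where .*? and .* correspond to vim's .\{-} and .* respectively. The sketch also shows one possible way around the bug: counting brace depth instead of using a regex. It works on decoded CSS, the helper removeMediaPrintBlocks and the sample stylesheet are hypothetical, and it assumes the minified form "@media print{" with no braces inside strings or comments.

```javascript
// Hypothetical helper: remove each "@media print{...}" block from decoded CSS,
// tracking brace depth so nested rules are removed with their wrapper.
function removeMediaPrintBlocks(css) {
  const marker = '@media print{';
  let out = '';
  let pos = 0;
  while (true) {
    const start = css.indexOf(marker, pos);
    if (start === -1) {
      return out + css.slice(pos); // no more blocks: keep the rest
    }
    out += css.slice(pos, start); // keep everything before the block
    let depth = 1; // the marker itself opens one brace
    let i = start + marker.length;
    while (i < css.length && depth > 0) {
      if (css[i] === '{') depth++;
      else if (css[i] === '}') depth--;
      i++;
    }
    pos = i; // resume after the block's matching closing brace
  }
}

// Made-up sample stylesheet, for illustration only.
const css =
  'a{color:red}@media print{*{color:#000!important}.x{display:none}}b{color:blue}';

// Non-greedy: stops at the first closing brace, removing too little.
console.log(css.replace(/@media print\{.*?\}/, ''));
// prints: a{color:red}.x{display:none}}b{color:blue}

// Greedy: runs to the last closing brace on the line, removing too much.
console.log(css.replace(/@media print\{.*\}/, ''));
// prints: a{color:red}

// Brace-aware: removes exactly the @media print block.
console.log(removeMediaPrintBlocks(css));
// prints: a{color:red}b{color:blue}
```

Applying this to test.html directly would require matching the encoded braces (%7B and %7D) instead, or decoding the embedded stylesheet first.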

Here's what I'm removing

I've URL-decoded the actual CSS in test.html for readability. Each match ends at the first closing brace, so the deleted text has unbalanced braces. Most likely the offending CSS is not cleanly deleted; rather, removing these fragments breaks the stylesheet enough to disable the offending rules whenever @media print applies. Either way, removing these 6 matches solves the problem.

@media print{*,:after,:before{color:#000!important;text-shadow:none!important;background:0 0!important;-webkit-box-shadow:none!important;box-shadow:none!important}

@media print{.visible-print{display:block!important}

@media print{.visible-print-block{display:block!important}

@media print{.visible-print-inline{display:inline!important}

@media print{.visible-print-inline-block{display:inline-block!important}

@media print{.hidden-print{display:none!important}