4
votes

I am trying to develop an R Markdown report for my data analysis that can be knitted to both word_document and pdf_document. Bookdown works really well for captions and automatic numbering (https://bookdown.org/yihui/bookdown/). The only main issue left is how to insert page breaks that work in both formats.

For PDF, I use xelatex from tinytex and \newpage works great. For Word, I use a level-5 heading (#####) as the page break and customize its style (incl. page break and white font).

I could use Edit > Find... and Replace All, but that is impractical: I am still developing the report and need to test frequently that the output looks good in both formats.

Is there any way I could either:

  • do the replace all in an R function,
  • edit the tex template so that the level-5 heading is not displayed in PDF outputs (\newpage is not shown in MS Word), or
  • apply a magic command to force a page break compatible with all formats?

Thanks!

Here is a reproducible example of an R Markdown file:

---
title: "Untitled"
author: "Me"
date: "November 15, 2018"
output:
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Some text.  

I want a page break after this.

\newpage
##### page break

This should be the first sentence of the new page.

Some more text.
Relevant answer to a similar question: stackoverflow.com/a/52131435/2425163. The mentioned Lua filter can be invoked by writing pandoc_args: ['--lua-filter=<PATH_TO_FILTER>'] below both pdf_document and word_document. - tarleb
Thanks! It works well for docx/pdf (I only tested these two formats), including pdf_document2 and word_document2 from bookdown. Before I make it an accepted answer, I noticed it creates an empty line before and after the page break. Is there any chance I could modify the Lua filter to remove the empty lines, at least the one after the page break? - David
Also, I didn't include the filter for the PDF output since \newpage works natively there. - David
The empty lines are produced by the extra paragraph which is inserted to create the page break. It should be harmless, but I can think of a way to get rid of it. I'm going to publish the re-worked code in a more central location and can ping you once it's available. - tarleb
Agreed. I was afraid that the empty line plus the space before a level-1 header, for example, would generate too much empty space (in the Word output), but it's actually a very minor concern. Thanks a lot for the answer! - David

1 Answer

3
votes

Many thanks to tarleb for the answer. As suggested I used your answer to this post: https://stackoverflow.com/a/52131435/2425163.

step 1: create a txt file with the following code:

--- Return a block element causing a page break in the given format.
local function newpage(format)
  if format == 'docx' then
    local pagebreak = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    return pandoc.RawBlock('openxml', pagebreak)
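  elseif format:match 'html.*' then
    -- CSS page break; an empty style attribute would have no effect
    return pandoc.RawBlock('html', '<div style="page-break-after: always;"></div>')
  elseif format:match 'tex$' then
    -- matches 'tex' and 'latex'; Lua patterns have no optional groups,
    -- so '(la)?tex' would never match 'latex'
    return pandoc.RawBlock('tex', '\\newpage{}')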
  elseif format:match 'epub' then
    local pagebreak = '<p style="page-break-after: always;"> </p>'
    return pandoc.RawBlock('html', pagebreak)
  else
    -- fall back to insert a form feed character
    return pandoc.Para{pandoc.Str '\f'}
  end
end

-- Filter function called on each RawBlock element.
function RawBlock (el)
  -- check that the block is TeX or LaTeX and contains \newpage
  -- (this also matches \newpage{})
  if el.format:match 'tex$' and el.text:match '\\newpage' then
    -- use format-specific pagebreak marker. FORMAT is set by pandoc to
    -- the targeted output format.
    return newpage(FORMAT)
  end
  -- otherwise, leave the block unchanged
  return nil
end

step 2: save the file as page-break.lua in the same directory with my R Markdown file.

step 3: add the filter as a pandoc argument.

This is the corrected reproducible example (R Markdown file):

---
title: "Untitled"
author: "Me"
date: "November 15, 2018"
output:
  pdf_document: default
  word_document:
    pandoc_args: ['--lua-filter=page-break.lua']
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Some text.  

I want a page break after this.

\newpage

This should be the first sentence of the new page.

Some more text.
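
While the report is still changing, both outputs can be re-rendered in one call from the R console instead of knitting each format by hand. This is a minimal sketch, not part of the original answer: render_all_formats() and report.Rmd are hypothetical names, and it assumes the rmarkdown package is installed and that page-break.lua sits next to the Rmd file.

```r
# Sketch only: render_all_formats() and "report.Rmd" are hypothetical names.
render_all_formats <- function(input) {
  # Fail early if the R Markdown file is missing.
  stopifnot(file.exists(input))
  # "all" renders every format declared under `output:` in the YAML header,
  # so the pandoc_args set for word_document are picked up automatically.
  rmarkdown::render(input, output_format = "all")
}

# Usage (assumes report.Rmd and page-break.lua are in the working directory):
# render_all_formats("report.Rmd")
```

Because the filter is declared in the YAML header, nothing format-specific needs to be passed here.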

Please note that this may not work for the table of contents (toc). However, I don't use the Lua filter for the PDF output, and with word_document it's very easy to add the table of contents afterwards directly in Word. There is also a link to a solution for that problem in the answer linked above.
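
For completeness, the first option from the question (doing the replace-all in an R function) could look roughly like the sketch below. It is an illustration in base R only, not something I tested on a real report: swap_page_breaks() is a hypothetical helper, and the marker "##### page break" is assumed to be the level-5 heading used in the question.

```r
# Hypothetical helper (not from the original answer): rewrite page-break
# markers in R Markdown source lines for a given target format.
swap_page_breaks <- function(lines, target = c("docx", "pdf"),
                             heading = "##### page break") {
  target <- match.arg(target)
  # Lines that contain only \newpage (optionally surrounded by whitespace).
  is_newpage <- grepl("^[[:space:]]*\\\\newpage[[:space:]]*$", lines)
  # Lines that contain only the Word page-break heading.
  is_heading <- trimws(lines) == heading
  if (target == "docx") {
    lines[is_newpage] <- heading
  } else {
    lines[is_heading] <- "\\newpage"
  }
  lines
}

# Usage on in-memory lines; for a file, combine with readLines()/writeLines().
src <- c("Some text.", "", "\\newpage", "", "Next page.")
swap_page_breaks(src, "docx")
```

The Lua filter remains the simpler route, since it leaves the source file untouched.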