20
votes

I write a report with R Markdown in RStudio. When it is converted to HTML with knitr, knitr also produces a Markdown file. I convert this file with pandoc as follows:

pandoc -f markdown -t docx input.md -o output.docx

The output.docx file looks fine except for one problem: the figure sizes are altered, so I have to resize the figures manually in Word. Is there a way, perhaps a pandoc option, to get the right figure sizes?

4
Which version of Pandoc are you using? If it is an outdated version, a possible workaround would be to render smaller images inside knitr. - daroczig
This is version 1.9.4.2. I don't want to change the sizes inside knitr because the sizes are fine in the HTML output. - Stéphane Laurent
I have now tried the latest (Windows) Pandoc version. That does not change anything. - Stéphane Laurent
I would love to find an answer to this question... - Tal Galili
@TalGalili Please see my solution using ImageMagick. - Stéphane Laurent

4 Answers

8
votes

An easy way is to include a scale factor k in the individual chunk options:

{r, fig.width=8*k, fig.height=6*k}

and a variable dpi in the global chunk options:

opts_chunk$set(dpi = dpi)

Then you can set the values of dpi and k before knitting the Rmd file in the global environment:

dpi <<- 96    
k <<- 1

or you can set them in a chunk in the Rmd file (set k in the first chunk for example).
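
As a rough illustration of why these two settings matter (a sketch, not part of the original answer): a PNG device writes an image of fig.width * dpi by fig.height * dpi pixels, so k and dpi together determine the pixel size of the file that pandoc embeds. The helper name fig_pixels is hypothetical.

```r
# Hypothetical helper: pixel dimensions of a PNG produced with the
# chunk options fig.width = 8*k, fig.height = 6*k at a given dpi.
fig_pixels <- function(k, dpi, base_width = 8, base_height = 6) {
  c(width = base_width * k * dpi, height = base_height * k * dpi)
}

fig_pixels(k = 1, dpi = 96)    # 768 x 576 pixels
fig_pixels(k = 0.5, dpi = 96)  # 384 x 288 pixels
```

Lowering k shrinks the figure while keeping the same dpi, which is what changes the size Word ends up displaying.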

3
votes

Here is a solution that resizes the figures using ImageMagick from an R script. A ratio of 70% seems to be a good choice.

# the path containing the Rmd file :
wd <- "..."
setwd(wd)

# the folder containing the figures :
fig.path <- paste0(wd, "/figure")
# all png figures (the pattern is a regular expression, so escape the dot) :
figures <- list.files(fig.path, pattern="\\.png$", all.files=TRUE)

# (safety) create copies of the original files
copy.path <- paste0(fig.path, "_copy")
dir.create(copy.path)
for(i in seq_along(figures)){
  fig <- paste0(fig.path, "/", figures[i])
  file.copy(fig, copy.path)
}

# resize all figures (each file is overwritten in place)
for(i in seq_along(figures)){
    fig <- paste0(fig.path, "/", figures[i])
    comm <- paste("convert -resize 70%", fig, fig)
    shell(comm)
}

# then run pandoc from a command line  
# or from the pandoc() function :
library(knitr)
pandoc("MyReport.md", "docx")

More info about the resize function of ImageMagick : www.perturb.org
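
Note that shell() exists only in R for Windows. A minimal sketch of a portable variant, assuming ImageMagick's convert is on the PATH: the helper name resize_args and the example file name are made up for illustration, and the system2() call is left commented out so that the block has no side effects.

```r
# Hypothetical helper: build the argument vector for
#   convert -resize 70% <file> <file>
# shQuote() protects paths that contain spaces.
resize_args <- function(file, ratio = "70%") {
  c("-resize", ratio, shQuote(file), shQuote(file))
}

args <- resize_args("figure/plot 1.png")
print(args)

# To actually resize (overwrites the file in place):
# system2("convert", args)
```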

3
votes

I also want to convert an R Markdown document into both HTML and .docx/.odt, with figures at the right size and resolution. So far, the best way I have found is to define the resolution and size of the graphs explicitly in the document (dpi, fig.width and fig.height options). If you do this, you get good graphs usable for publication and the odt/docx is fine. The problem, if you use a dpi much higher than the default 72 dpi, is that the graphs will look too big in the HTML file. Here are 3 approaches I have used to handle this (NB: I use R scripts with spin() syntax):

1) Use out.extra ='WIDTH="75%"' in the knitr options. This forces all graphs in the HTML to occupy 75% of the window width. It is a quick solution but not optimal if you have plots with very different sizes. (NB: I prefer working with centimetres rather than inches, hence the /2.54 everywhere.)

library(knitr)
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
               fig.width = 8/2.54, fig.height = 8/2.54,
               out.extra ='WIDTH="75%"'
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

2) Use out.width and out.height to specify the size of the graphs in pixels in the HTML file. I use a constant "sc" to scale down the size of the plot in the HTML output. This is the most precise approach, but the problem is that for each graph you have to define both fig.width/fig.height and out.width/out.height, which is really boring! Ideally you should be able to specify in the global options that e.g. out.width = 150*fig.width (where fig.width changes from chunk to chunk). Maybe something like that is possible but I don't know how.

#+ echo = FALSE
library(knitr)
sc <- 150
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
                fig.width = 8/2.54, fig.height = 8/2.54,
                out.width = sc*8/2.54, out.height = sc*8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54, out.width= sc * 14/2.54, out.height= sc * 10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
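
The wish expressed above, deriving out.width from fig.width automatically, can probably be met with knitr's option hooks (opts_hooks), which run a function on the chunk options whenever a given option is set. The following is only a sketch that was not part of the original answer: the hook name scale_out_size is hypothetical, sc = 150 is taken from the script above, and it assumes a knitr version that provides opts_hooks.

```r
sc <- 150  # scaling constant for the html output, as in the script above

# Hypothetical hook: derive out.width/out.height from fig.width/fig.height
# unless the chunk sets them explicitly. [[ ]] is used instead of $ to
# avoid partial matching of option names.
scale_out_size <- function(options) {
  if (is.null(options[["out.width"]]))
    options[["out.width"]] <- sc * options[["fig.width"]]
  if (is.null(options[["out.height"]]))
    options[["out.height"]] <- sc * options[["fig.height"]]
  options
}

# Register the hook only if knitr is installed; it then runs for every
# chunk in which fig.width is not NULL.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_hooks$set(fig.width = scale_out_size)
}

# Effect on a chunk with fig.width=14/2.54, fig.height=10/2.54:
str(scale_out_size(list(fig.width = 14/2.54, fig.height = 10/2.54)))
```

With such a hook in the first chunk, only fig.width and fig.height would need to be given per chunk.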

Note that for these two solutions, I think you can't convert your md file directly into odt with pandoc (the figures are not included). I convert the md into html and then the html into odt (I haven't tried docx). Something like this (if the previous R script is named "figsize1.R"):

library(knitr)
setwd("/home/gilles/")
spin("figsize1.R")

system("pandoc figsize1.md -o figsize1.html")
system("pandoc figsize1.html -o figsize1.odt")

3) Simply compile your document twice, once with a low dpi value (~96) for the HTML output and once with a high resolution (~300) for the odt/docx output. This is my preferred way now. The main disadvantage is that you must compile twice, but this is not really a problem for me since I generally need the odt file only at the very end of the job, to provide to end users. During the work I regularly compile the HTML with the HTML notebook button in RStudio.

#+ echo = FALSE
library(knitr)

opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), 
               fig.width = 8/2.54, fig.height = 8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

Then compile the 2 outputs with the following script (NB here you can directly transform the md file into html):

library(knitr)
setwd("/home/gilles")

opts_chunk$set(dpi=96)
spin("figsize3.R", knit=FALSE)
knit2html("figsize3.Rmd")

opts_chunk$set(dpi=400)
spin("figsize3.R")
system("pandoc figsize3.md -o figsize3.odt")

2
votes

Here is my solution: hack the docx converted by Pandoc, as docx is simply a bundle of xml files and adjusting the figure sizes is pretty straightforward.

The following is what a figure looks like in the word/document.xml extracted from a converted docx:

<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="1524000" cy="1524000" />
        ...
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              ...
              <pic:blipFill>
                <a:blip r:embed="rId23" />
                ...
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0" />
                  <a:ext cx="1524000" cy="1524000" />
                </a:xfrm>
                ...
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

So replacing the cx and cy attributes of the wp:extent and a:ext nodes with the desired values does the resizing. The following R code works for me. The widest figure takes up a whole line's width, specified by the variable out.width, and the rest are resized proportionally.

require(XML)

## default linewidth (inch) for Word 2003
out.width <- 5.77
docx.file <- "report.docx"

## unzip the docx converted by Pandoc
system(paste("unzip", docx.file, "-d temp_dir"))
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
wp.extent <- getNodeSet(xmlRoot(doc), "//wp:extent")
a.blip <- getNodeSet(xmlRoot(doc), "//a:blip")
a.ext <- getNodeSet(xmlRoot(doc), "//a:ext")

figid <- sapply(a.blip, xmlGetAttr, "r:embed")
figname <- dir("temp_dir/word/media/")
stopifnot(length(figid) == length(figname))
pdffig <- paste("temp_dir/word/media/",
                ## in case figure ids in docx are not in dir'ed order
                sort(figname)[match(figid, substr(figname, 1, nchar(figname) - 4))], sep="")

## get dimension info of included pdf figures
pdfsize <- do.call(rbind, lapply(pdffig, function (x) {
    fig.ext <- substr(x, nchar(x) - 2, nchar(x))
    pp <- pipe(paste(ifelse(fig.ext == 'pdf', "pdfinfo", "file"), x, sep=" "))
    pdfinfo <- readLines(pp); close(pp)
    sizestr <- unlist(regmatches(pdfinfo, gregexpr("[[:digit:].]+ X [[:digit:].]+", pdfinfo, ignore.case=TRUE)))
    as.numeric(strsplit(sizestr, split=" x ")[[1]])
}))

## resizing pdf figures in xml DOM, with the widest figure taking up a line's width
wp.cx <- round(out.width*914400*pdfsize[,1]/max(pdfsize[,1]))
wp.cy <- round(wp.cx*pdfsize[, 2]/pdfsize[, 1])
wp.cx <- as.character(wp.cx)
wp.cy <- as.character(wp.cy)
sapply(1:length(wp.extent), function (i)
       xmlAttrs(wp.extent[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));
sapply(1:length(a.ext), function (i)
       xmlAttrs(a.ext[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));

## save hacked xml back to docx
saveXML(doc, document.xml, indent = FALSE)
setwd("temp_dir")
system(paste("zip -r ../", docx.file, " *", sep=""))
setwd("..")
system("rm -fr temp_dir")
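
For reference, the cx and cy attributes are expressed in English Metric Units (EMU), with 914400 EMU per inch, which is where the constant in the script comes from. A small sketch of the same arithmetic on made-up figure sizes (the function name emu_extents and the pixel dimensions are illustrative assumptions, not taken from a real document):

```r
EMU_PER_INCH <- 914400

# sizes: matrix with one row per figure, columns = width, height (any unit).
# The widest figure gets out.width inches; the others scale proportionally,
# and each height follows from the figure's own aspect ratio.
emu_extents <- function(sizes, out.width = 5.77) {
  cx <- round(out.width * EMU_PER_INCH * sizes[, 1] / max(sizes[, 1]))
  cy <- round(cx * sizes[, 2] / sizes[, 1])
  cbind(cx = cx, cy = cy)
}

sizes <- rbind(c(800, 600), c(400, 400))  # made-up pixel dimensions
extents <- emu_extents(sizes)
print(extents)

# The extent shown in the XML excerpt above, converted back to inches
# (about 1.67):
print(1524000 / EMU_PER_INCH)
```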