35
votes

First of all, I must admit that I'm very new to knitr and the concept of reproducible analysis, but I can see its potential in improving my current workflow (which includes much copy-pasting into word docs).

I often have to produce multiple reports by group (Hospital in this example) and within each hospital, there may be many different Wards that I'm reporting an outcome on. Previously I ran all of my plots and analysis in R using loops, then the copy/pasting work commenced; however, after reading this post (Can Sweave produce many pdfs automatically?), and it gave me hope that I may actually be able to skip many steps and go straight from R to report through Rnw/knitr.

However, after giving it a try I see that there is something that isn't quite working out (as the R environment within the Rnw does not appear to recognize the looping variables I'm trying to pass to it??).

   ##  make my data
Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)


##  Here is my current work flow-- produce all plots, but export as png and cut/paste
for(hosp in unique(df$Hospital)){
  subgroup <- df[ df$Hospital == hosp,]
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    savename <- paste(hosp, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
}
# followed by much copy/pasting


##  Here is what I'm trying to go for using knitr 
library(knitr)
for (hosp in unique(df$Hospital)){
  knit("C:file.path\\testing_loops.Rnw", output=paste('report_', Hospital, '.tex', sep=""))
}

## With the following *Rnw file
## start *.Rnw Code
\documentclass[10pt]{article}
\usepackage[margin=1.15 in]{geometry}
<<loaddata, echo=FALSE, message=FALSE>>=
  Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)
subgroup <- df[ df$Hospital == hosp,]
@

\begin{document}
<<setup, echo=FALSE >>=
  opts_chunk$set(fig.path = paste("test", hosp , sep=""))
@

Some infomative text about hospital \Sexpr{hosp}

<<plots, echo=FALSE >>=
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    #     subgroup2 <- subgroup2[ order(subgroup2$Month),]
    savename <- paste(hosp, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
@
\end{document}


##  To be then turned into pdf with this
tools::texi2pdf("C:file.path\\report_A.tex", clean = TRUE, quiet = TRUE)

After trying to run my knit() code chunk I get this error:

Error in file(con, "w") : invalid 'description' argument

And when I look into the directory where the *.tex file was to be created, I can see the 2 pdf plots from hospital A were produced (none for B) and no hospital specific *.tex file to knit into a pdf. Thanks in advance for any help you can offer!

3

3 Answers

16
votes

You don't need to re-define the data in the .Rnw file and I think the warning is coming from the fact that you are putting the output name together with Hospital (the full vector of hospitals) rather than hosp (the loop index).

Following your example, testingloops.Rnw would be

\documentclass[10pt]{article}
\usepackage[margin=1.15 in]{geometry}
<<loaddata, echo=FALSE, message=FALSE>>=
subgroup <- df[ df$Hospital == hosp,]
@

\begin{document}
<<setup, echo=FALSE >>=
  opts_chunk$set(fig.path = paste("test", hosp , sep=""))
@

Some infomative text about hospital \Sexpr{hosp}

<<plots, echo=FALSE >>=
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    #     subgroup2 <- subgroup2[ order(subgroup2$Month),]
    savename <- paste(hosp, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
@
\end{document}

and the driver R file would be just

##  make my data
Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)

## knitr loop
library("knitr")
for (hosp in unique(df$Hospital)){
  knit2pdf("testingloops.Rnw", output=paste0('report_', hosp, '.tex'))
}
11
votes

Great question! This works for me with the other bits you've supplied in your question. Note that I've replaced your hosp with just x. I've called your Rnw file test.rnw

# input data
Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)

# generate the tex files, one for each hospital in df
library(knitr)
lapply(unique(df$Hospital), function(x) 
       knit("C:\\emacs\\test.rnw", 
            output=paste('report_', x, '.tex', sep="")))

# generate PDFs from the tex files, one for each hospital in df
lapply(unique(df$Hospital), function(x)
       tools::texi2pdf(paste0("C:\\emacs\\", paste0('report_', x, '.tex')), 
                       clean = TRUE, quiet = TRUE))

I've replaced your loops withlapply and anonymous functions, which often seem to be considered more R-ish.

Here you can see where I replaced the hosp with x in the rnw file:

\documentclass[10pt]{article}
\usepackage[margin=1.15 in]{geometry}
<<loaddata, echo=FALSE, message=FALSE>>=
  Hospital <- c(rep("A", 20), rep("B", 20))
Ward <- rep(c(rep("ICU", 10), rep("Medicine", 10)), 2)
Month <- rep(seq(1:10), 4)
Outcomes <- rnorm(40, 20, 5)
df <- data.frame(Hospital, Ward, Month, Outcomes)
subgroup <- df[ df$Hospital == x,]
@

\begin{document}
<<setup, echo=FALSE >>=
  opts_chunk$set(fig.path = paste("test", x , sep=""))
@

Some informative text about hospital \Sexpr{x}

<<plots, echo=FALSE >>=
  for(ward in unique(subgroup$Ward)){
    subgroup2 <- subgroup[subgroup$Ward == ward,]
    #     subgroup2 <- subgroup2[ order(subgroup2$Month),]
    savename <- paste(x, ward)
    plot(subgroup2$Month, subgroup2$Outcomes, type="o", main=paste("Trend plot for", savename))
  }
@
\end{document}

The result is two tex files (report_A.tex, report_B.tex), four PDFs for the figures (A1, A2, B1, B2) and two PDFs for the reports (report_A.pdf, report_B.pdf), each with their figures in them. Is that what you were after?

1
votes

In this answer I intend to answer a more general question: "Using loops to produce multiple pdf reports", and not your specific example. This is because this trend was quite hard to follow as a noob. I managed to get it working eventualy (html version), so this is my humble solution. There are probably some better ones published here, I just can't fully understand them yet.

  1. create RMD file with your design and save it in the working\input directory (in Rstudio: file->newfile->R markdown). This file should include all functions you need to make the plots in the report (simply declare them in one of those code chunks). Think of this file as the template for all future reports. Don't worry about passing the data into it's environment after chewing it up earlier- I will cover that in (2). the key issue to understand is that all calculations are done further down the pipeline (at the moment you render the RMD file).

  2. create the loop you need to use in a diffrent control r file. In my case, there is a loop that iterates over all files in the directory, and gets them into data frame. then I want to pass those dataframes into the RMD, along with other data variables, in order to plot them. This is how its done:

    run_on_all<-function(path_in="path:\\where\\your\\input\\and\\RMD\\is", path_out="path:\\where\\your\\output\\will\\be") setwd(path_in) ibrary(rmarkdown) library(knitr) list_of_file_names=list.files(path = getwd, pattern = "*.csv") #this gets a list of the input files names for (file_name in list_of_file_names) { data=read.csv(file_name) #read file into data frame report_name=paste(some_variable_name,".html",sep="") render("your_template.Rmd",output_file =report_name,output_dir =path_out,list(data,all other parameters you want to input into the RMD))} }

  3. The most important command is the render function call. It allows you to throw into the RMD environment any paramenters you wish. It also allows you to change the name of the report and to change the output location. In addition, by calling it you are also generating the report, so you get it all in one line.(note that if the call to RMD is within a function, you may find that the variables you input are missing, yet the report will still be published correctly)

summary

there are two files you need- RMD file, that will be the template for all additional reports and a control file. the control file gets data, chews it up and pass the chewed parameteres into the RMD (via the render function). the RMD gets the data, does some computations, plots it and publishes it in a new file (also by the render function). I hope I helped.