
I'm working with a data frame of approximately 50K rows by 20 columns. It's about 10 MB in size as a tab-delimited text file. I run some moderate analysis on it using R scripts (.R), and everything works fine. I started writing the report in R Markdown to package everything into a nice, neat document.

When I run the rmarkdown::render("my_document.Rmd") command, a nice HTML file is rendered. Occasionally on Windows 10, though, I get Error: cannot allocate vector of size 15.4 Gb (or some other X Gb). If I press the Up arrow key and immediately run the same command again, everything works fine and life goes on. I don't know why the error is intermittent, but at least the same command works on the next try. I also find it hard to believe that a 15.4 Gb object is generated from my original 10 MB file. I only run basic commands on it, such as dplyr::filter() or dplyr::group_by() %>% summarise(n()) %>% etc., and I don't add rows or columns to the original data frame.

If I follow the same procedure on Ubuntu Linux, I never get the error; instead, my session freezes completely. Even Ctrl+Alt+F1 doesn't work, and I'm forced to do a hard reset. That's more of a headache, since it happens about once or twice an hour. I've left the System Monitor open and noticed that the memory used by rsession jumps from 200 MB to 7 GB (basically all of my 8 GB of RAM) when this happens.

Which road do I go down? Is this an R issue? An RStudio issue? The rmarkdown package? The knitr package? Pandoc? I just updated R (3.4.3), RStudio (1.1.423), and all my R packages, and the issue still occurs. I don't really expect an answer, but I'm hoping for guidance on where to even start. I'd take a band-aid fix if determining the root cause is unlikely, which seems to be the case.

[EDIT] I've added a redacted version of my .Rmd file below. It includes basically everything; my actual .Rmd file is not much longer than what's shown. I'll also paste the error message in case it helps. You'll notice I've added white space in several places; that's what the div style arguments are for.

---
title: "February Reporting"
date: "Dept XYZ [February 20, 2018]"
output: html_document
---

<div style="margin-bottom:35px;">
</div>

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

This report meets the requirements of *XYZ123, Body of Standards* at `Revision: 12` `Version: G`. In particular we are concerned with:

* ***2.5 Report Out***
    + *2.5.1* On a periodic basis...
        - *2.5.1.1* The weekly output...
        - *2.5.1.2* The weekly output...
        - *2.5.1.3* The weekly output...
    + *2.5.2* On a periodic basis...
        - *2.5.2.1* The annual output...
        - *2.5.2.2* The annual output...
        - *2.5.2.3* The annual output...
    + *2.5.3* However, etc.
etc.

<div style="margin-bottom:35px;">
</div>

# Data Sources - \[Verified with XYZ]
***
A basic analysis is performed but is not indicative of...
The analysis often utilizes the following variable, represented by:
$$ \text{The Equation is} = \frac{\text{Multi word variable}}{\text{multi word variable}} $$

That's all there is to my R Markdown file. Here is the error:

rmarkdown::render("mark_output.Rmd")

processing file: mark_output.Rmd
|......... | 14% ordinary text without R code
|................... | 29% label: setup (with options)
List of 1
 $ include: logi FALSE
|............................ | 43% ordinary text without R code
|..................................... | 57% label: cars
|.............................................. | 71% ordinary text without R code
|........................................................ | 86% label: pressure (with options)
List of 3
 $ fig.height: num 7
 $ fig.width : language 12 + (4/9)
 $ echo : logi FALSE
|.................................................................| 100% ordinary text without R code

output file: mark_output.knit.md

"C:/Users/stackinator/AppData/Local/Pandoc/pandoc" +RTS -K512m -RTS mark_output.utf8.md --to html --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output mark_output.html --email-obfuscation none --self-contained --standalone --section-divs --template "C:\Users\stackinator\Documents\Rlibs\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:bootstrap" --include-in-header "C:\Users\stackinator\AppData\Local\Temp\RtmpGi2lRl\rmarkdown-str460774e669e.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"

Output created: mark_output.html
Error: cannot allocate vector of size 15.4 Gb

Can you show the minimal Rmd code that reproduces the error? - Christoph
Unless you provide a reproducible example, there isn't much anyone can do. - nicola
Minimal Rmd code is now included, along with the error message. - stackinator
If you run gc() before doing the above, does that solve it? - G. Grothendieck
I noticed I can run any moderate analysis that gets my rsession up to ~200 MB. Then I run rmarkdown::render("my_document.Rmd") maybe 15-20 times consecutively, and the system freezes. I've also tried gc(); rmarkdown::render("my_document.Rmd"), and unfortunately that doesn't help. - stackinator
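To check whether R's memory use really climbs from one render to the next, as described in the last comment, a rough diagnostic sketch could render the document repeatedly in the same session and log memory after each pass. The helper names mem_used_mb() and render_repeatedly() are hypothetical, and "my_document.Rmd" is assumed to sit in the working directory. Note that gc() only reports R's own heap, so memory used by the separate pandoc process is not counted.

```r
# Hypothetical helper: memory currently used by R, in Mb, summed over the
# "(Mb)" column (column 2) of gc() for Ncells and Vcells.
mem_used_mb <- function() {
  sum(gc()[, 2])
}

# Hypothetical helper: render the same document n times in a row and record
# R's memory use after each pass, to see whether it keeps growing.
render_repeatedly <- function(input, n = 20) {
  usage <- numeric(n)
  for (i in seq_len(n)) {
    rmarkdown::render(input, quiet = TRUE)
    usage[i] <- mem_used_mb()
    message(sprintf("render %d: %.1f Mb in use", i, usage[i]))
  }
  usage
}

# Only runs if the (assumed) report file exists in the working directory.
if (file.exists("my_document.Rmd")) {
  render_repeatedly("my_document.Rmd", n = 20)
}
```

A steadily increasing series would point at state accumulating in the R session between renders rather than at the data itself.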

1 Answer


Head over and check out this Stack Overflow answer.

In a nutshell, the post explains that when rendering in a loop, adding knitr::knit_meta(class = NULL, clean = TRUE) before rmarkdown::render(input = file, ...) seems to do the trick.
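A minimal sketch of that suggestion, assuming the reports are rendered repeatedly from one long-running R session. knitr::knit_meta(class = NULL, clean = TRUE) returns the metadata knitr has collected so far (e.g. HTML dependencies) and empties that store; the idea, as the linked post presents it, is that clearing it before each render keeps leftovers from earlier runs from piling up. The wrapper render_clean() and the reports vector are hypothetical names used only for illustration.

```r
# Hypothetical wrapper: clear knitr's accumulated metadata, then render.
render_clean <- function(input, ...) {
  # class = NULL selects all metadata; clean = TRUE empties knitr's store
  # after returning it. The returned list is deliberately discarded.
  knitr::knit_meta(class = NULL, clean = TRUE)
  rmarkdown::render(input = input, ...)
}

# Hypothetical list of reports; only files that actually exist are rendered.
reports <- c("my_document.Rmd")
for (file in reports[file.exists(reports)]) {
  render_clean(file)
}
```

This is a band-aid rather than a root-cause fix: it only addresses state that knitr keeps between renders in the same session.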