
We have a problem using R Markdown across multiple operating systems.

An .Rmd file is created on a Linux system (Ubuntu 12.04 LTS) and pushed to a GitHub repo.

On this system, it compiles ("knits") without problems.

It is then pulled onto a Windows 7 machine with RStudio installed.

There, when trying to compile it, the following error shows up:

Error in yaml::yaml.load(front_matter) : 
  Reader error: invalid leading UTF-8 octet: #FC at 66
Calls: <Anonymous> -> parse_yaml_front_matter -> <Anonymous> -> .Call
Execution halted

Two further observations:

  1. When I create a new .Rmd file on the Windows system, it knits flawlessly.
  2. When I create a new .Rmd file on the Windows system, copy everything except the first few lines of the "problematic" file into it, and compile it, it also knits flawlessly.

I compared both files in hex view (in Sublime Text) on both operating systems: they are byte-for-byte identical.
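The same comparison can be done from within R. This is a minimal sketch: `check_rmd_bytes()` is a hypothetical helper, not part of rmarkdown or knitr, and `validUTF8()` needs R 3.3.0 or later (newer than the R 3.1.2 in the session info). It reads the file as raw bytes, so no encoding conversion can interfere, then looks for a UTF-8 byte-order mark and tests whether the bytes are valid UTF-8:

```r
# Hypothetical helper, not part of rmarkdown or knitr.
# Requires R >= 3.3.0 for validUTF8().
check_rmd_bytes <- function(path) {
  # Read the file as raw bytes so no encoding conversion happens
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  # A UTF-8 byte-order mark is the byte sequence EF BB BF
  has_bom <- length(bytes) >= 3 &&
    identical(bytes[1:3], as.raw(c(0xef, 0xbb, 0xbf)))
  # validUTF8() inspects the bytes without any locale-dependent conversion
  list(has_bom = has_bom, valid_utf8 = validUTF8(rawToChar(bytes)))
}
```

Running it on both machines should give the same answer if the files really are identical; if `valid_utf8` is `TRUE` on both, the problem lies in how the file is read rather than in the file itself.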

Has somebody else seen that error before?

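Update: It seems that a German umlaut ("ü") is causing the problem: its escaped Unicode form is \u00FC (code point U+00FC), according to http://www.endmemo.com/unicode/unicodeconverter.php, which matches the #FC in the error message. Strictly speaking, 0xFC is the byte for "ü" in Latin-1/Windows-1252; in UTF-8, "ü" is the two bytes C3 BC.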
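A quick check in R shows both byte sequences for "ü". Because a byte such as 0xFC never appears in well-formed UTF-8, the error message suggests (an inference, not a confirmed diagnosis) that the YAML header reached the parser as Windows-1252 text rather than UTF-8:

```r
u <- "\u00fc"  # the character ü; R stores \u escapes as UTF-8

# UTF-8 encoding: two bytes
utf8_bytes <- charToRaw(enc2utf8(u))
print(utf8_bytes)    # [1] c3 bc

# Latin-1 / Windows-1252 encoding: one byte, the #FC from the error
latin1_bytes <- iconv(u, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]
print(latin1_bytes)  # [1] fc
```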

In general, it seems that Unicode is not recognized correctly by R, RStudio, or knitr on Windows. When I type some umlauts into a new .Rmd file and knit it, I get output such as "öää". In RStudio > Tools > Global Options, I set the default text encoding to "UTF-8", and I did the same for R in the Rprofile.site file (options(encoding="UTF-8")).

Update 2: library(rmarkdown); sessionInfo() gives

R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252
[4] LC_NUMERIC=C                        LC_TIME=German_Switzerland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_0.4.2

loaded via a namespace (and not attached):
[1] digest_0.6.8    htmltools_0.2.6 tools_3.1.2    

on Windows 7, whereas on Ubuntu it is:

R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_0.3.10

loaded via a namespace (and not attached):
[1] digest_0.6.8    htmltools_0.2.6 tools_3.1.2   

I suspect the differing locales are the problem. How do I fix this?
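Running the same small check on both machines makes the locale difference explicit. This is a sketch using only base R (`l10n_info()`, `Sys.getlocale()`); `locale_report()` is a hypothetical helper name. Judging by the session info above, on the Windows machine `lc_ctype` should be German_Switzerland.1252 and `utf8` should be `FALSE`, whereas the Ubuntu machine runs in en_US.UTF-8:

```r
# Hypothetical helper: summarise how this R session handles characters
locale_report <- function() {
  info <- l10n_info()
  list(
    lc_ctype = Sys.getlocale("LC_CTYPE"),  # e.g. "German_Switzerland.1252"
    utf8     = isTRUE(info[["UTF-8"]]),    # TRUE only in a UTF-8 locale
    codepage = info[["codepage"]]          # Windows only; NULL elsewhere
  )
}

str(locale_report())
```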

I followed the website linked in your profile and it produces an error in German (Austrian?). I'll venture a guess that you're using umlauts and other non-ASCII characters? - Roman Luštrik
True, see my updated question. The website has since gone down ;-) - grssnbchr
I feel your pain; I have a similar problem with our local characters (I get into trouble just by signing the document with my name). Due to my rudimentary programming skills, I have so far been unable to figure out where things break. Even the "raw" R and RStudio consoles produce different results for some documents. I usually just remove the localized characters and cry myself to sleep. - Roman Luštrik
Do not set options(encoding="UTF-8") unless you really understand its consequences (normally it is a bad idea). It would be nice if you could add the output of library(rmarkdown); sessionInfo() to your post, and run update.packages() if possible. A minimal reproducible example is also often the key to diagnosing a problem. - Yihui Xie
@Yihui I updated my question, thanks for the instructions. - grssnbchr
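In line with Yihui's advice not to set options(encoding="UTF-8") globally, one alternative is to declare the encoding per call. This is a minimal sketch; the temporary file and its YAML-like line are made up for illustration. `readLines(..., encoding = "UTF-8")` only marks the strings as UTF-8 (it does not convert them) and leaves the global `encoding` option untouched:

```r
# Made-up example: write "title: Grüezi" as raw UTF-8 bytes (ü = C3 BC)
path <- tempfile(fileext = ".Rmd")
writeBin(c(charToRaw("title: Gr"), as.raw(c(0xc3, 0xbc)), charToRaw("ezi\n")), path)

# Declare the encoding for this call only; getOption("encoding") is unchanged.
# readLines() marks the strings as UTF-8; it does not re-encode them.
lines <- readLines(path, encoding = "UTF-8")

print(Encoding(lines))                        # [1] "UTF-8"
print(identical(lines, "title: Gr\u00fcezi"))  # [1] TRUE
```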

Answer


I am extremely late to this, but I solved the issue by setting the encoding option back to its default, the native encoding:

options(encoding = "native.enc")

I also changed the default Windows encoding to UTF-8. That opened a Pandora's box of quite a few other issues related to the encoding used by other programs, so treat it with caution.