Encoding within chunks results in missing characters despite UTF-8

Question

I have an Rmd file encoded in UTF-8, but when I knit it, some Czech characters are missing from the output of inline R expressions and code chunks. Text typed outside of chunks renders fine. When I read the same text from a file, the inline output is correct, but printing it (with print or cat) from within a chunk is not. I am completely confused by this, especially the cat behaviour.

I am on Windows. Checking the encoding in the console returns UTF-8. The locale is set to English_United Kingdom.1252.

---
title: "test"
output: html_document
---
```{r}
txt <- "Čeština funguje"
print(Encoding(txt))
print(txt)      # prints incorrectly
```

Čeština funguje # prints correctly
`r txt`         # prints incorrectly

```{r}
cat(txt)        # prints incorrectly
```

```{r, results='asis'}
line <- readLines("line", encoding = "UTF-8")
print(Encoding(line))
print(line) # prints incorrectly
cat(line)   # prints incorrectly
```

`r line`    # prints correctly!

P.S. I know a lot has been said about R and encoding on Windows, but despite extensive searching I can't find a solution and don't fully understand this behaviour. I am guessing I need to set some locale, but my efforts so far have been in vain.

Yihui Xie · Accepted Answer · 2020-12-21T14:53:11

Until R supports UTF-8 natively on Windows, you usually have to set the locale to a specific language if you want to use multi-byte characters from that language; e.g., you need to use the Czech locale instead of English if you want to properly print()/cat() Czech characters. The locale needs to be set before knitting happens, e.g., you may set it in your ~/.Rprofile:

Sys.setlocale(, 'Czech')

I have never used Czech before and am not sure if 'Czech' is a proper value, but that's the idea (I have had success with other languages before).
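
As a hedged sketch of the same idea, the snippet below tries a few candidate Czech locale names in turn and stops at the first one the system accepts. The candidate names are assumptions (valid locale names vary by operating system and Windows version), so adjust them to whatever your machine offers.

```r
# Sketch for ~/.Rprofile: switch to a Czech locale before knitting.
# The candidate names are assumptions; valid names differ between systems.
candidates <- c("Czech", "Czech_Czechia.1250", "cs_CZ.UTF-8")

res <- ""
for (loc in candidates) {
  # Sys.setlocale() returns "" (and warns) when the request cannot be honoured
  res <- suppressWarnings(Sys.setlocale("LC_ALL", loc))
  if (nzchar(res)) break
}

if (!nzchar(res)) {
  warning("No Czech locale could be set; print()/cat() may still drop characters")
}

# Inspect what is actually in effect
Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]  # TRUE only when the session runs in a UTF-8 locale
```

Checking the return value matters because a rejected locale name only produces a warning, which is easy to miss when it is emitted at startup from ~/.Rprofile.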
