I'm creating an R package with several files in /data. The usual way to locate a file inside an installed package is system.file():

system.file(..., package = "base", lib.loc = NULL, mustWork = FALSE)

The file in /data that I would like to load is gzip-compressed text, my_file.txt.gz. How do I load it into a data.table via read.table() or fread()?

Within the R script, I tried:

#' @import data.table
#' @export
my_function = function(){

    my_table = read.table(system.file("data", "my_file.txt.gz", package = "FusionVizR"), header=TRUE)    

}

This leads to an error via devtools document():

Error in read.table(system.file("data", "my_file.txt.gz", package = "FusionVizR"), header = TRUE) (from script1.R#7) : 
  no lines available in input
In addition: Warning message:
In file(file, "rt") :
  file("") only supports open = "w+" and open = "w+b": using the former

I get what appears to be the same issue with fread():

#' @import data.table
#' @export
my_function = function(){

    my_table = fread(system.file("data", "my_file.txt.gz", package = "FusionVizR"), header=TRUE)    

}

This outputs the error:

Input is either empty or fully whitespace after the skip or autostart. Run again with verbose=TRUE.

So it appears that system.file() doesn't return a usable path to the file, which I could then load into a data.table. How do I do this?

Two things you can check: 1. Does system.file() find the file? Check via file.exists() before doing the read step. 2. If it does find the file, can read.csv() cope with it? Test by running read.csv() at the command line with the full path to the file. Perhaps you don't have gzip-reading capability in your R version? – Spacedman
To help us help you, put your code, or a tiny example that illustrates the problem, on a public site like GitHub. R packages can be fiddly and there is a range of tools people use on them. I'm very surprised that devtools::document() is running code inside your function unless it's running tests or examples which you've not shown us. – Spacedman
@Spacedman Thank you for the help. I'm running library(devtools) and then document() within the package root directory. As system.file() is documented in Writing R Extensions, this question should be closed as poorly-researched or revised for poor English. – ShanZhengYang
Also, this is wrong: my_function() = function(){ - you don't put () in the function name. It should be: my_function = function(){. When I run document() on a minimal example I see this error: "Error in my_function() = function() { (from fnord.R#1) :". – Spacedman
@Spacedman Thanks. Edited. Typo on my part. – ShanZhengYang
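
The check suggested in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual setup: the file name is hypothetical and the "base" package is used only because it is always installed. It shows that system.file() returns an empty string when nothing matches, and that handing that empty string to read.table() is what triggers the file("") warning and the "no lines available in input" error quoted in the question.

```r
# Hypothetical file name; "base" stands in for the real package here.
path <- system.file("data", "no_such_file.txt.gz", package = "base")

print(path)               # an empty string when no file matched
print(nzchar(path))       # FALSE: nothing was found
print(file.exists(path))  # FALSE: there is nothing to read

# Passing "" on to read.table() fails; capture the message instead of stopping.
msg <- tryCatch(
  suppressWarnings(utils::read.table(path, header = TRUE)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If nzchar(path) is FALSE, the problem lies in the lookup (wrong directory, wrong package name, or package not installed), not in the reader.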

1 Answer

Do yourself a HUGE favour and study fread() closely: it is one of the very best features in data.table. I have examples (at work) of reading from a pipe of other commands, of reading compressed data, and more.

Here is a simple mock example:

R> write.csv(iris, file="/tmp/demo.csv")
R> system("gzip /tmp/demo.csv")  # to be very plain
R> fread("zcat /tmp/demo.csv.gz")
      V1 Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:   1          5.1         3.5          1.4         0.2    setosa
  2:   2          4.9         3.0          1.4         0.2    setosa
  3:   3          4.7         3.2          1.3         0.2    setosa
  4:   4          4.6         3.1          1.5         0.2    setosa
  5:   5          5.0         3.6          1.4         0.2    setosa
 ---                                                                
146: 146          6.7         3.0          5.2         2.3 virginica
147: 147          6.3         2.5          5.0         1.9 virginica
148: 148          6.5         3.0          5.2         2.0 virginica
149: 149          6.2         3.4          5.4         2.3 virginica
150: 150          5.9         3.0          5.1         1.8 virginica
R> 

It seems that in my haste I wrote one column too many (the row names), but you get the idea.
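
For reference, here is a hedged variant of the same idea that avoids the extra row-name column and passes the shell command through fread()'s cmd argument, which recent data.table versions provide for this purpose. It is a sketch under assumptions: it uses a temporary file rather than /tmp/demo.csv, uses "gzip -dc" in place of zcat for portability, and only runs the fread() step if data.table and a gzip binary are available.

```r
# Write iris without row names straight into a gzip-compressed temporary file.
gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz, "wt")
write.csv(iris, con, row.names = FALSE)
close(con)

# Read it back through a shell pipe, guarded so the sketch degrades gracefully.
if (requireNamespace("data.table", quietly = TRUE) && nzchar(Sys.which("gzip"))) {
  dt <- data.table::fread(cmd = paste("gzip -dc", shQuote(gz)))
  print(dim(dt))
}
```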

Now, you don't even need fread (but it is still more powerful than the alternatives):

R> head(read.csv(file="/tmp/demo.csv.gz"))
  X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa
R> 

R figured out by itself that it needed to decompress the file.
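
The same behaviour applies to read.table(), which the question uses. A minimal self-contained sketch, assuming a tab-separated file and using a temporary file in place of my_file.txt.gz:

```r
# Create a small gzip-compressed, tab-separated file.
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "wt")
write.table(head(iris), con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# read.table() opens the path in text mode, and R detects the gzip
# compression on its own, so no explicit gzfile() call is needed here.
df <- utils::read.table(tmp, header = TRUE, sep = "\t")
print(df)
```

So once the path is correct, a .txt.gz file needs no special handling in base R.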

Edit: I was editing this question earlier when it was deleted under me, which is about as de-motivating as it gets. In a nutshell:

  • system.file() works: for example, file <- system.file("rawdata", "population.csv", package="gunsales") returns the complete path because the file exists: "/usr/local/lib/R/site-library/gunsales/rawdata/population.csv". But this is easy to mess up. (Needless to say, I do have the package and the file installed.)
  • Look into the data/ directory and what Writing R Extensions says about it. It is a good mechanism.
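
To make the first point harder to mess up, one option is to let system.file() fail loudly instead of returning an empty string. The helper below is a hypothetical sketch, not part of any package: read_pkg_table is an invented name, and it assumes the raw file is shipped under inst/extdata/ in the source package (the contents of inst/ end up at the top level of the installed package, which is how the rawdata/ directory in the gunsales example gets there).

```r
# Hypothetical helper: resolve a file shipped under inst/extdata/ and read it.
read_pkg_table <- function(file, package, ...) {
  # mustWork = TRUE raises an error at the lookup instead of returning "".
  path <- system.file("extdata", file, package = package, mustWork = TRUE)
  utils::read.table(path, header = TRUE, ...)
}

# With a file that does not exist, the failure now points at the lookup.
# ("base" is used only because it is always installed.)
msg <- tryCatch(
  read_pkg_table("my_file.txt.gz", package = "base"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Files placed in data/ are instead meant to be loaded through data() or lazy-loading, so they need not be reachable by path after installation.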