71 votes

I have used ?unzip in the past to get at the contents of a zipped file in R. This time, I am having a hard time extracting the files from a .gz file, which can be found here.

I have tried ?gzfile and ?gzcon but have not been able to get either of them to work. Any help you can provide will be greatly appreciated.


5 Answers

68 votes

Here is a worked example that may help illustrate what gzfile() and gzcon() are for:

foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
foo
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
write.table(foo, file="/tmp/foo.csv")
system("gzip /tmp/foo.csv")             # being very explicit

Now that the compressed file exists, use gzfile() instead of relying on the implicit file() connection:

read.table(gzfile("/tmp/foo.csv.gz"))   
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
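
The example above only exercises gzfile(). Here is a minimal sketch contrasting it with gzcon(), assuming a temporary file in place of /tmp/foo.csv.gz; it writes the file with gzfile(path, "w") so no external gzip binary is needed. The file contents are made up for the demo.

```r
# Write a small gzip-compressed text file with base R only.
path <- tempfile(fileext = ".txt.gz")
out <- gzfile(path, "w")
writeLines(c("alpha", "beta", "gamma"), out)
close(out)

# gzfile(): opens the .gz file by path and decompresses as it reads.
con1 <- gzfile(path, "rt")
via_gzfile <- readLines(con1)
close(con1)

# gzcon(): wraps an existing binary-mode connection (here a plain file
# opened with "rb") and decompresses the gzip stream passing through it.
con2 <- gzcon(file(path, "rb"))
via_gzcon <- readLines(con2)
close(con2)  # closing the wrapper also closes the underlying file()

print(via_gzfile)
print(via_gzcon)
```

Both calls return the same lines. gzcon() is mainly useful when the compressed bytes arrive through a connection that gzfile() cannot open by path, such as url().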

The file you point to is a compressed tar archive, and as far as I know, R itself has no interface to tar archives. These are commonly used to distribute source code, for example R packages and the R sources.

50 votes

To un-gz a file in R, you can use the R.utils package:

library(R.utils)
gunzip("file.gz", remove=FALSE)

or

gunzip("file.gz")

But then you get the default (remove=TRUE) behavior, in which the input file is deleted after the output file has been fully created and closed.
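
If R.utils is not installed, the remove=FALSE behavior can be approximated with base R. This is a minimal sketch: gunzip_keep() is a hypothetical helper, not part of R.utils, and it assumes the input is a single gzip stream ending in ".gz". It streams decompressed bytes in chunks so large files are not loaded into memory at once.

```r
# Hypothetical helper: decompress `src` into `dest` using base R only,
# leaving the original .gz file in place (like gunzip(..., remove = FALSE)).
gunzip_keep <- function(src, dest = sub("\\.gz$", "", src), chunk = 65536L) {
  inp <- gzfile(src, "rb")          # reads yield decompressed bytes
  on.exit(close(inp), add = TRUE)
  out <- file(dest, "wb")           # overwrites `dest` if it already exists
  on.exit(close(out), add = TRUE)
  repeat {
    bytes <- readBin(inp, what = "raw", n = chunk)
    if (length(bytes) == 0L) break  # end of the compressed stream
    writeBin(bytes, out)
  }
  invisible(dest)
}

# Demo on a throwaway file.
src <- tempfile(fileext = ".gz")
con <- gzfile(src, "w")
writeLines(c("x,y", "1,2"), con)
close(con)

dest <- gunzip_keep(src)
print(readLines(dest))
print(file.exists(src))  # the .gz input is kept
```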

37 votes

If you really want to uncompress the file, just use the untar() function, which does support gzip. For example:

untar('chadwick-0.5.3.tar.gz')
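
A self-contained sketch of the same idea, assuming a small archive built on the fly with utils::tar() as a stand-in for chadwick-0.5.3.tar.gz (the directory and file names are made up). It uses list = TRUE to inspect the archive first and exdir to choose where files land; tar = "internal" selects R's built-in implementation so no system tar is required.

```r
# Build a tiny .tar.gz in a temporary directory.
work <- tempfile("untar-demo-")
dir.create(file.path(work, "payload"), recursive = TRUE)
writeLines("hello", file.path(work, "payload", "a.txt"))

old_wd <- setwd(work)
tar("payload.tar.gz", files = "payload", compression = "gzip", tar = "internal")
setwd(old_wd)
archive <- file.path(work, "payload.tar.gz")

# list = TRUE only lists the members; nothing is extracted.
print(untar(archive, list = TRUE, tar = "internal"))

# exdir sets the extraction directory (created if necessary).
exdir <- file.path(work, "extracted")
untar(archive, exdir = exdir, tar = "internal")
extracted <- readLines(file.path(exdir, "payload", "a.txt"))
print(extracted)
```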

26 votes

http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

R added transparent decompression for certain kinds of compressed files in version 2.10. If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. You should give them the proper filename extensions.

The command

myData <- read.table('myFile.gz')  # gzip-compressed files have a "gz" extension

will work just as if 'myFile.gz' were the raw text file.
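
A minimal sketch of that behavior for all three formats, assuming small throwaway files written with base R's gzfile(), bzfile() and xzfile() and given matching extensions. read.table() then receives only the path, with no decompression wrapper.

```r
df <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

# One compressed copy per format, each with a matching extension.
paths <- c(gz  = tempfile(fileext = ".txt.gz"),
           bz2 = tempfile(fileext = ".txt.bz2"),
           xz  = tempfile(fileext = ".txt.xz"))
writers <- list(gz = gzfile, bz2 = bzfile, xz = xzfile)

for (fmt in names(paths)) {
  con <- writers[[fmt]](paths[[fmt]], "w")
  write.table(df, con, row.names = FALSE)
  close(con)
}

# Plain read.table() on the paths: decompression happens transparently.
back <- lapply(paths, read.table, header = TRUE)

ok <- vapply(back, function(d) {
  identical(d$id, df$id) && identical(d$value, df$value)
}, logical(1))
print(ok)
```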

1 vote

library(vroom)
columns3 <- c('A', 'B',...) ## define column names
Data1 <- vroom(".../XXX.tsv", col_names = columns3)

This works fine with .tsv.gz files as well.
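
If vroom is not available, a rough base-R equivalent is sketched below. It assumes that vroom, like readr, treats a character col_names as the names for a file with no header row. The column names and data here are hypothetical stand-ins for the elided ones above.

```r
columns3 <- c("A", "B", "C")  # hypothetical column names for the demo

# Write a headerless, tab-separated file directly into gzip compression.
path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(path, "w")
write.table(data.frame(1:2, c("x", "y"), c(0.5, 1.5)), con, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
close(con)

# No header row in the file, so header = FALSE and names come from col.names.
# read.delim() opens the .gz file transparently.
Data1 <- read.delim(path, header = FALSE, col.names = columns3)
str(Data1)
```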