This was a little tricky to track down until I read the body of readRDS(). What you seem to need to do is:
- Open a connection to the .zip archive and the file inside it with unz()
- Apply GZIP decompression to this connection using gzcon()
- And finally pass this decompressed connection to readRDS().
Here's an example to illustrate, using the following serialised matrix mat inside a zip archive matrix.zip:
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
zip("matrix.zip", "matrix.rds")
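If you are not sure what the .rds file is called inside the archive, unzip(list = TRUE) lists the entries without extracting anything; the Name column gives the string to pass as unz()'s filename argument. A minimal sketch, assuming a zip program is available on the PATH for utils::zip() to call:

```r
# Recreate the example archive (utils::zip() shells out to a zip program)
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
zip("matrix.zip", "matrix.rds")

# List the archive's entries without extracting them;
# the Name column is what unz() expects as `filename`
contents <- unzip("matrix.zip", list = TRUE)
print(contents)
```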
Open a connection to matrix.rds inside matrix.zip
con <- unz("matrix.zip", filename = "matrix.rds")
Now, using gzcon(), apply GZIP decompression to this connection
con2 <- gzcon(con)
Finally, read from the connection
mat2 <- readRDS(con2)
In full we have
con <- unz("matrix.zip", filename = "matrix.rds")
con2 <- gzcon(con)
mat2 <- readRDS(con2)
close(con2)
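If you do this often, the three steps can be folded into a small function. This is only a sketch: read_rds_from_zip() is a hypothetical name, not part of base R, and closing the connection via on.exit() is my own choice so that it gets closed even if readRDS() fails:

```r
# read_rds_from_zip() is a hypothetical helper, not a base R function
read_rds_from_zip <- function(zipfile, filename) {
  # unz() creates an (unopened) connection to one entry of the archive
  con <- unz(zipfile, filename = filename)
  # gzcon() adds gzip decompression; the result reuses con's connection,
  # so closing it also releases the underlying unz() connection
  gz <- gzcon(con)
  # close the connection even if readRDS() signals an error
  on.exit(close(gz))
  readRDS(gz)
}

# Usage with the matrix.zip example (utils::zip() needs a zip program)
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
zip("matrix.zip", "matrix.rds")
mat2 <- read_rds_from_zip("matrix.zip", "matrix.rds")
all.equal(mat, mat2)
```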
This gives
> con <- unz("matrix.zip", filename = "matrix.rds")
> con2 <- gzcon(con)
> mat2 <- readRDS(con2)
> close(con2)
> mat2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> all.equal(mat, mat2)
[1] TRUE
Why?
The reason you have to go through this convoluted extra step is (I think) described in ?readRDS:
Compression is handled by the connection opened when file is a
file name, so is only possible when file is a connection if
handled by the connection. So e.g. url connections will need to
be wrapped in a call to gzcon.
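The same applies to any connection that just hands over stored bytes. As a sketch, here rawConnection() stands in for a url() connection so that nothing needs downloading, assuming the bytes come from a default (gzip-compressed) saveRDS() file:

```r
# saveRDS() gzip-compresses by default; grab the file's raw bytes
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
bytes <- readBin("matrix.rds", what = "raw", n = file.size("matrix.rds"))

# A raw connection, like a url() connection, does no decompression
# of its own, so wrap it in gzcon() before handing it to readRDS()
con <- gzcon(rawConnection(bytes))
mat2 <- readRDS(con)
close(con)
```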
And if you look at the internals of readRDS() we see:
> readRDS
function (file, refhook = NULL)
{
if (is.character(file)) {
con <- gzfile(file, "rb")
on.exit(close(con))
}
else if (inherits(file, "connection"))
con <- file
else stop("bad 'file' argument")
.Internal(unserializeFromConn(con, refhook))
}
<bytecode: 0x2841998>
<environment: namespace:base>
If file is a character string (a file name), readRDS() creates the connection with gzfile(), which decompresses the .rds we want to read. Notice that if you pass a connection as file, as you want to do, R never adds any decompression: file is simply assigned to con and then passed to the internal function unserializeFromConn. Hence wrapping gzcon() around the connection created by unz() works.
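To see that this really is the cause, compare reading the same archive entry with and without gzcon(). A sketch, again assuming a zip program is available for utils::zip(); it only checks that an error occurs without gzcon(), not the exact message, which may differ between R versions:

```r
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
zip("matrix.zip", "matrix.rds")

# Without gzcon(): readRDS() passes the unz() connection straight through,
# so unserializeFromConn sees gzip-compressed bytes and signals an error
con <- unz("matrix.zip", filename = "matrix.rds")
res <- tryCatch(readRDS(con), error = function(e) e)
close(con)

# With gzcon(): the bytes are decompressed before being unserialized
con2 <- gzcon(unz("matrix.zip", filename = "matrix.rds"))
mat2 <- readRDS(con2)
close(con2)
```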
Basically, when unserializeFromConn reads from a connection it expects the data to be decompressed already, but that decompression only happens automagically when you pass readRDS() a file name, not a connection.
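You can also check the bytes directly. Assuming saveRDS()'s defaults (gzip compression, binary XDR format), the archive entry read through unz() alone should start with the gzip magic bytes 1f 8b, while read through gzcon() it should start with "X\n", the beginning of R's XDR serialization header:

```r
mat <- matrix(1:9, ncol = 3)
saveRDS(mat, "matrix.rds")
zip("matrix.zip", "matrix.rds")

# First two bytes as stored in the archive entry (still gzip-compressed)
con <- unz("matrix.zip", filename = "matrix.rds", open = "rb")
raw_head <- readBin(con, what = "raw", n = 2)
close(con)

# First two bytes after gzcon() has decompressed them
con2 <- gzcon(unz("matrix.zip", filename = "matrix.rds"))
gz_head <- readBin(con2, what = "raw", n = 2)
close(con2)

print(raw_head)
print(gz_head)
```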