In general, R is really not meant for in-place file editing, and I'm aware of no (currently available) tools that support it in any context. Even unix-y tools like sed edit quickly but do not technically do it "in-place": they write a modified copy and swap it in, even if they hide that from you. (There might be some tools that do true in-place edits, but perhaps not with the ease-of-access that you want.)
There is one notable exception: a file format that is designed for in-place editing (well, interaction). It supports in-place addition, filtering, replacement, and deletion, and for the most part it does this without needing to grow the file while doing so. It's SQLite.
For example,
library(DBI)
# library(RSQLite) # don't need to load it, just need to have it available
fname <- "./iris.sqlite3"
con <- dbConnect(RSQLite::SQLite(), fname)
file.info(fname)$size
# [1] 0
dbWriteTable(con, "iris", iris)
# [1] TRUE
file.info(fname)$size
# [1] 16384
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 4.7 3.2 1.6 0.2 setosa
file.info(fname)$size
# [1] 16384
dbExecute(con, "update iris set [Species]='virginica' where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
# [1] 1
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 4.7 3.2 1.6 0.2 virginica
dbDisconnect(con)
file.info(fname)$size
# [1] 16384
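The same in-place behavior extends to adding and deleting rows. A minimal sketch, continuing the session above (DBI already loaded; this assumes the original 150 rows received rowids 1 through 150, and the file size shown is illustrative):
con <- dbConnect(RSQLite::SQLite(), fname)
dbAppendTable(con, "iris", head(iris, 3))             # add three rows in place
dbExecute(con, "delete from iris where rowid > 150")  # remove the rows just added
# [1] 3
dbDisconnect(con)
file.info(fname)$size
# [1] 16384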
Pros
- It is cross-platform, and far more widespread than most people realize: it is a required internal component of the Firefox browser and the Android OS. (Many others, too.)
- Additionally, drivers exist for most programming languages, including R, Python, Ruby, and far too many to list here.
- There are really no practical limits to the amount of data that can be stored in a single SQLite file. It theoretically supports up to 140TB (https://www.sqlite.org/whentouse.html), though if you're getting that large, there are many (justified) arguments for different solutions.
- Pulling data is built on the SQL standard, and though SQLite is not 100% compliant, it is pretty darn close. Query performance depends on the query and the size of the data, but it is generally quite fast (ref: Will SQLite performance degrade if the database size is greater than 2 gigabytes?)
- In fact, it can be faster than individual file operations.
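For instance, a rough (and not rigorous) comparison, assuming the iris data has also been saved locally as iris.csv: pulling one row from the SQLite file reads only what is needed, whereas the CSV must be read in full and then filtered. With data this small both are effectively instant; the gap appears as the data grows.
write.csv(iris, "iris.csv", row.names = FALSE)
system.time({
  # read just the matching row from the database file
  con <- dbConnect(RSQLite::SQLite(), "iris.sqlite3")
  row1 <- dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 limit 1")
  dbDisconnect(con)
})
system.time({
  # read the whole CSV, then filter in R
  alliris <- read.csv("iris.csv", stringsAsFactors = FALSE)
  row1 <- alliris[alliris$Sepal.Length == 4.7, ][1, , drop = FALSE]
})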
Cons
- There will be "overhead" in the file size. Notably, iris takes under 7K of memory (see object.size(iris)) yet the file size starts at 16K. With larger data, the ratio of file size to actual data shrinks. (I did the same thing with ggplot2::diamonds; the object is 3456376 bytes and the file size is 3780608 bytes, less than 10% larger.)
- The file size will increase when SQLite deems necessary. This is based on many factors outside the scope of R and this question/answer.
- If you delete a lot of data, the file size does not immediately shrink to reflect that ... see change sqlite file size after "DELETE FROM table" (hint: vacuum; a sketch follows this list).
- There are many tools that will easily/immediately import data from this file format, but notably absent are Excel and Access. It is feasible with SQLite-ODBC, but it takes a little elbow grease. (I'm fine with it, but not all users will be, and some enterprise networks make this step difficult or disallow it outright.)
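To illustrate the vacuum point, a minimal sketch; the file name (big.sqlite3), table (sometable), and the behavior of the sizes here are hypothetical:
con <- dbConnect(RSQLite::SQLite(), "big.sqlite3")  # hypothetical large file
dbExecute(con, "delete from sometable")             # hypothetical table
file.info("big.sqlite3")$size  # roughly unchanged: pages are only marked free
dbExecute(con, "vacuum")       # rebuild the file, returning free pages to the OS
file.info("big.sqlite3")$size  # now smaller
dbDisconnect(con)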
SQLite-file-as-CSV
You can treat it like a file when importing, if you want to import all of it:
con <- dbConnect(RSQLite::SQLite(), fname)
iris2 <- dbGetQuery(con, "select * from iris")
dbDisconnect(con)
as compared to
iris2 <- read.csv("iris.csv", stringsAsFactors = FALSE)
If you want to get fancy:
import_sqlite <- function(fname, tablename = NA) {
  if (length(tablename) > 1L) {
    warning("'tablename' has length > 1 and only the first element will be used")
    tablename <- tablename[[1L]]
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), fname)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  available_tables <- DBI::dbListTables(con)
  if (length(available_tables) == 0L) {
    stop("no tables found")
  } else if (is.na(tablename)) {
    if (length(available_tables) == 1L) {
      # unambiguous: default to the only table in the file
      tablename <- available_tables
    } else {
      stop("multiple tables found, please choose one of: ",
           paste(sQuote(available_tables), collapse = ", "))
    }
  }
  if (! tablename %in% available_tables) {
    stop(sprintf("table %s not found", sQuote(tablename)))
  }
  # quote the identifier so odd table names (e.g., with dots) are safe
  qry <- sprintf("select * from %s", DBI::dbQuoteIdentifier(con, tablename))
  out <- tryCatch(
    list(data = DBI::dbGetQuery(con, qry), err = NULL),
    error = function(e) list(data = NULL, err = e))
  if (! is.null(out$err)) {
    stop("[sqlite error] ", conditionMessage(out$err))
  }
  out$data
}
head(import_sqlite("iris.sqlite3"))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
(I'm not offering that function as anything other than a proof-of-concept that you can interact with the single file as if it were a CSV. There are some safeguards in there, but it is really just a hack for the sake of this question.)