I would like to create one JSON string per row of a dataframe, built from that row's columns (i.e. a vector of JSON strings). In other words, I would like to replicate the output of the code below, but more efficiently, as it is very slow on my non-toy dataframe:

apply(mtcars, 1, function(x) jsonlite::toJSON(as.list(x), na = "null", auto_unbox = TRUE))

Running the following is fast, but I'm not sure how to manipulate the result to get the same format as the code above.

jsonlite::toJSON(mtcars, dataframe = "rows", pretty = FALSE, na = "null", auto_unbox = TRUE)

Sample desired output:

Mazda RX4
"{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3.9,"wt":2.62,"qsec":16.46,"vs":0,"am":1,"gear":4,"carb":4}"

Mazda RX4 Wag
"{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3.9,"wt":2.875,"qsec":17.02,"vs":0,"am":1,"gear":4,"carb":4}"

Datsun 710
"{"mpg":22.8,"cyl":4,"disp":108,"hp":93,"drat":3.85,"wt":2.32,"qsec":18.61,"vs":1,"am":1,"gear":4,"carb":1}"

1 Answer

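Try this. Passing collapse = FALSE makes toJSON return one JSON string per row instead of a single JSON array. The row names are dropped before conversion (otherwise jsonlite adds a "_row" field to each object) and then restored as the names of the result: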

df_to_json_vec <- function(df) {
  # `collapse = FALSE` is the key.
  jsonlite::toJSON(df, dataframe = "rows", na = "null", auto_unbox = TRUE, collapse = FALSE)
}
`names<-`(df_to_json_vec(`row.names<-`(df, NULL)), row.names(df))
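For readability, the one-liner above can be unpacked into explicit steps. This is only a minimal sketch: the helper name rows_to_json is hypothetical, and it relies, as the answer does, on collapse = FALSE returning one string per row.

```r
library(jsonlite)

# Hypothetical helper: same steps as the one-liner, written out.
rows_to_json <- function(df) {
  rn <- row.names(df)        # remember the row names
  row.names(df) <- NULL      # drop them so no "_row" field is emitted
  out <- toJSON(df, dataframe = "rows", na = "null",
                auto_unbox = TRUE, collapse = FALSE)
  names(out) <- rn           # restore the row names as element names
  out
}

# One JSON object per row, named by the original row names.
res <- rows_to_json(head(mtcars, 3))
```

Each element of res can then be parsed back on its own with jsonlite::fromJSON() if a round-trip check is wanted.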

Performance benchmark

# create a dataframe ten times the length of mtcars
df <- dplyr::bind_rows(replicate(10, mtcars, simplify = FALSE))
# res1 and res2 are assigned inside the benchmark so they can be compared afterwards
microbenchmark::microbenchmark(
  res1 <- `names<-`(df_to_json_vec(`row.names<-`(df, NULL)), row.names(df)),
  res2 <- apply(df, 1, function(x) jsonlite::toJSON(as.list(x), na = "null", auto_unbox = TRUE)),
  unit = "ms"
)

Results

Unit: milliseconds
                                                                                                expr      min        lq     mean   median        uq      max neval cld
                           res1 <- `names<-`(df_to_json_vec(`row.names<-`(df, NULL)), row.names(df))   1.3687   1.47295   1.6430   1.5700   1.75415   2.7115   100  a 
 res2 <- apply(df, 1, function(x) jsonlite::toJSON(as.list(x),      na = "null", auto_unbox = TRUE)) 206.9659 223.43650 234.2259 230.8466 241.79225 343.3378   100   b

The new code also produces the same results as the original:

> all(res1 == res2)
[1] TRUE