2
votes

I have json strings inside a dataframe column. I want to bring all these new json columns into the dataframe.

# Input
JsonID <- as.factor(c(1,2,3))
JsonString1 = "{\"device\":{\"site\":\"Location1\"},\"tags\":{\"Engine Pressure\":\"150\",\"timestamp\":\"2608411982\",\"historic\":false,\"adhoc\":false},\"online\":true,\"time\":\"2608411982\"}"
JsonString2 = "{\"device\":{\"site\":\"Location2\"},\"tags\":{\"Engine Pressure\":\"160\",\"timestamp\":\"3608411983\",\"historic\":false,\"adhoc\":false},\"online\":true,\"time\":\"3608411983\"}"
JsonString3 = "{\"device\":{\"site\":\"Location3\"},\"tags\":{\"Brake Fluid\":\"100\",\"timestamp\":\"4608411984\",\"historic\":false,\"adhoc\":false},\"online\":true,\"time\":\"4608411984\"}"
JsonStrings = c(JsonString1, JsonString2, JsonString3)
Example <- data.frame(JsonID, JsonStrings)

Using the jsonlite library I can make each json string into a 1 row dataframe.

library(jsonlite)

# One row dataframes
DF1 <- data.frame(fromJSON(JsonString1))
DF2 <- data.frame(fromJSON(JsonString2))
DF3 <- data.frame(fromJSON(JsonString3))

Unfortunately the JsonID variable column is lost. All json strings share common column name such as "time". But there are column names they don't share. By pivoting the data longer I could Rbind all the dataframes together.

library(dplyr)
library(tidyr)

# Row bindable one row dataframes
DF1_RowBindable <- DF1 %>%
  rename_all(~gsub("tags.", "", .x)) %>% 
  tidyr::pivot_longer(cols = c(colnames(.)[2]))

Is there a better way to do this?

I have never worked with json strings before. The solution must be computationally scalable.

1

1 Answers

2
votes

We can store the data from fromJSON in list in the dataframe itself so we don't loose any information that we already have in the data. We can use unnest_wider to create new columns from named list.

library(dplyr)
library(tidyr)
library(jsonlite)

Example %>%
  rowwise() %>%
  mutate(data = list(fromJSON(JsonStrings))) %>%
  unnest_wider(data) %>%
  select(-JsonStrings) %>%
  unnest_wider(tags) %>%
  unnest_wider(device)

# JsonID site      `Engine Pressure` timestamp  historic adhoc `Brake Fluid` online time      
#  <fct>  <chr>     <chr>             <chr>      <lgl>    <lgl> <chr>         <lgl>  <chr>     
#1 1      Location1 150               2608411982 FALSE    FALSE NA            TRUE   2608411982
#2 2      Location2 160               3608411983 FALSE    FALSE NA            TRUE   3608411983
#3 3      Location3 NA                4608411984 FALSE    FALSE 100           TRUE   4608411984

Since each column (data, tags, device) are of different lengths we need to use unnest_wider separately on each one of them.