1
votes

I have a CSV with a JSON string with inconsistent ordering of the fields. So it looks like this:

Row 1: '{"name":"John", "age":30, "car":null}'
Row 2: '{"name":"Chuck", "car":black, "age":25}'
Row 3: '{"car":blue, "age":54, "name":"David"}'

I’m hoping to use R to parse this out into columns with the appropriate data. So I’d like to create a ‘name’ column, ‘age’ column, and ‘car’ column and have them populate with the appropriate data. Is there anyway to do this using JSONlite, or would I need to figure out a way to essentially query the JSON string for the property name (car, name, age) and populate that column with the subsequent value?

2

2 Answers

0
votes

you can use jsonlite library, but however in order to parse the data you must make some "adjustments" to your string. Lets say that you have the df as follows

my_df <- data.frame(column_1 = 
  c('{"name":"John", "age":30, "car":null}',
  '{"name":"Chuck", "car":"black", "age":25}',
  '{"car":"blue", "age":54, "name":
  "David"}')
)

You must have a valid json format in order to parse the data properly. In this case is an array json format, so the data must have [ and ]. Also each element must be separated by ,. Be careful with the strings, each one must have "<string>". (You didn't add it in your example with blue and black data)

With that in mind we can now make some code:

# Base R

# Add "commas" to sep each element
new_json_str <- paste(my_df$column_1, collapse = ",")

# Add "brackets" to the string
new_json_str <- paste("[", new_json_str, "]")

# Parse the JSON string with jsonlite
jsonlite::fromJSON(new_json_str)


# With dplyr library

my_df %>% 
    pull(column_1) %>%              # Get column as "vector"
    paste(collapse = ",") %>%       # Add "commas"
    paste("[", . ,"]") %>%          # Add "bracket" (`.` represents the current value, in this case vectors sep by ",")
    jsonlite::fromJSON()            # Parse json to df

# Result

#    name age   car
# 1  John  30  <NA>
# 2 Chuck  25 black
# 3 David  54  blue
0
votes

Alternatively, the RcppSimdJson package can be used. Depending on the format of the data file we can

  • either convert the data row by row using fparse()
  • or read and convert the file in one go using fload()

Converting the data row by row

If the data file json_data_1.csv has the format

{"name":"John", "age":30, "car":null}
{"name":"Chuck", "car":"black", "age":25}
{"car":"blue", "age":54, "name":"David"}

(note that blue and black have been enclosed in double quotes to obey JSON syntax rules)

the JSON data need to be converted row by row, e.g.

library(magrittr)   # piping used to improve readability
readLines("json_data_1.csv") %>% 
  lapply(RcppSimdJson::fparse) %>% 
  data.table::rbindlist(fill = TRUE)
    name age   car
1:  John  30  <NA>
2: Chuck  25 black
3: David  54  blue

Reading and converting the file in one go

If the data file json_data_2.csv has the format

[
{"name":"John", "age":30, "car":null},
{"name":"Chuck", "car":"black", "age":25},
{"car":"blue", "age":54, "name":"David"}
]

(note the square brackets and the commas which indicate an array in JSON syntax)

the file can be read and converted by one line of code:

RcppSimdJson::fload("json_data_2.csv")
   name age   car
1  John  30  <NA>
2 Chuck  25 black
3 David  54  blue