2
votes

I have a data.frame in R which is in long format.

Gender Area Value Year Column X Column Y And other columns...
male urban 31.45 2012 xxx yyy zzz
male rural 41 2012 xxx yyy zzz
male rural 35.6 2012 xxx yyy zzz
female rural 30 2012 xxx yyy zzz
female urban 21 2016 xxx yyy zzz
female rural 19 2016 xxx yyy zzz

The main aim is to add a new column, say seg to this data.frame whose rows are filled in the following format:

Gender Area Value Year Column X Column Y And other columns... seg
male urban 31.45 2012 xxx yyy zzz {"Gender": "male", "Area": "urban", "Year": 2012}
male rural 41 2012 xxx yyy zzz {"Gender": "male", "Area": "rural", "Year": 2012}
male rural 35.6 2012 xxx yyy zzz {"Gender": "male", "Area": "rural", "Year": 2012}
female rural 30 2012 xxx yyy zzz {"Gender": "female", "Area": "rural", "Year": 2012}
female urban 21 2016 xxx yyy zzz {"Gender": "female", "Area": "urban", "Year": 2016}
female rural 19 2016 xxx yyy zzz {"Gender": "female", "Area": "rural", "Year": 2016}

The format of the contents of the seg column is important because it will be fed to a database.

Can anyone advise me how exactly to proceed with that in R? I have spent quite sometime looking around for solutions and couldn't find anything that takes column names, row values in a data.frame and converts them into json strings. The manipulation is not on all the columns but only for specific columns.

Appreciate the help in advance!

2

2 Answers

2
votes

We can use toJSON from jsonlite

library(purrr)
library(dplyr)
library(jsonlite)
out <- df1 %>% 
    mutate(seg = pmap_chr(select(cur_data(), Gender, Area, Year), ~ 
          toJSON(list(...), auto_unbox = TRUE))) 

-output

out
  Gender  Area Value Year ColumnX ColumnY                                            seg
1   male urban 31.45 2012     xxx     yyy   {"Gender":"male","Area":"urban","Year":2012}
2   male rural 41.00 2012     xxx     yyy   {"Gender":"male","Area":"rural","Year":2012}
3   male rural 35.60 2012     xxx     yyy   {"Gender":"male","Area":"rural","Year":2012}
4 female rural 30.00 2012     xxx     yyy {"Gender":"female","Area":"rural","Year":2012}
5 female urban 21.00 2016     xxx     yyy {"Gender":"female","Area":"urban","Year":2016}
6 female rural 19.00 2016     xxx     yyy {"Gender":"female","Area":"rural","Year":2016}

Or may use rowwise

df1 %>% 
   rowwise %>% 
   mutate(seg = toJSON(lst(Gender, Area, Year), auto_unbox = TRUE)) %>% 
   ungroup
# A tibble: 6 x 7
  Gender Area  Value  Year ColumnX ColumnY seg                                           
  <chr>  <chr> <dbl> <int> <chr>   <chr>   <json>                                        
1 male   urban  31.4  2012 xxx     yyy     {"Gender":"male","Area":"urban","Year":2012}  
2 male   rural  41    2012 xxx     yyy     {"Gender":"male","Area":"rural","Year":2012}  
3 male   rural  35.6  2012 xxx     yyy     {"Gender":"male","Area":"rural","Year":2012}  
4 female rural  30    2012 xxx     yyy     {"Gender":"female","Area":"rural","Year":2012}
5 female urban  21    2016 xxx     yyy     {"Gender":"female","Area":"urban","Year":2016}
6 female rural  19    2016 xxx     yyy     {"Gender":"female","Area":"rural","Year":2016}

data

df1 <- structure(list(Gender = c("male", "male", "male", "female", "female", 
"female"), Area = c("urban", "rural", "rural", "rural", "urban", 
"rural"), Value = c(31.45, 41, 35.6, 30, 21, 19), Year = c(2012L, 
2012L, 2012L, 2012L, 2016L, 2016L), ColumnX = c("xxx", "xxx", 
"xxx", "xxx", "xxx", "xxx"), ColumnY = c("yyy", "yyy", "yyy", 
"yyy", "yyy", "yyy")), class = "data.frame", row.names = c(NA, 
-6L))
1
votes

Here is my approach with purrr:

library(purrr)

pmap_dfr(your_data, \(...) {
  tibble::tibble(..., seg = imap(list(...), ~ sprintf("\"%s\": \"%s\"",
                                                      .y, .x)) %>%
                              paste(collapse = ", ") %>%
                              sprintf("{%s}", .))
})

Returns:

# A tibble: 6 x 8
  Gender Area  Value  Year Column.X Column.Y And.other.column… seg              
  <chr>  <chr> <dbl> <int> <chr>    <chr>    <chr>             <chr>            
1 male   urban  31.4  2012 xxx      yyy      zzz               "{\"Gender\": \"…
2 male   rural  41    2012 xxx      yyy      zzz               "{\"Gender\": \"…
3 male   rural  35.6  2012 xxx      yyy      zzz               "{\"Gender\": \"…
4 female rural  30    2012 xxx      yyy      zzz               "{\"Gender\": \"…
5 female urban  21    2016 xxx      yyy      zzz               "{\"Gender\": \"…
6 female rural  19    2016 xxx      yyy      zzz               "{\"Gender\": \"…

(Data used:)

your_data <- structure(list(Gender = c("male", "male", "male", "female", "female", "female"), Area = c("urban", "rural", "rural", "rural", "urban", "rural"), Value = c(31.45, 41, 35.6, 30, 21, 19), Year = c(2012L, 2012L, 2012L, 2012L, 2016L, 2016L), Column.X = c("xxx", "xxx", "xxx", "xxx", "xxx", "xxx"), Column.Y = c("yyy", "yyy", "yyy", "yyy", "yyy", "yyy"), And.other.columns... = c("zzz", "zzz", "zzz", "zzz", "zzz", "zzz")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))