2
votes

I am trying to parse this data type in R in order to get a visualisation in Tableau (tried with Tableau but it is quite complicated and doesn't give a good result).

Below is an example of the data type I am trying to parse:

{"010010208022":215,"010010205002":195,"010010208012":184,"010010208021":165,"010010206001":132,"010010205003":110,"010010211001":100,"010010208024":93,"010010201002":91,"010010205001":84,"010010210002":82,"010510310002":75,"010010203001":74}

Basically this is just one entry. I have three columns of this type. I am trying to parse each entry and sum up the numbers (after each colon) to get a new column for each of the three variables which holds the sums of the numbers for each entry.

This is for using it in Tableau to get a visualization using these totals.

The data I have is an excel sheet which includes three variables stored as Text in Excel which have the above data type.

The imported dataset in R looks as follows:

> dput(head(dataset))
structure(list(census_block_group = c(10010201001, 10010201002, 
10010202001, 10010202002, 10010203001, 10010203002), date_range_start = c(1538352000, 
1538352000, 1538352000, 1538352000, 1538352000, 1538352000), 
    date_range_end = c(1541030400, 1541030400, 1541030400, 1541030400, 
    1541030400, 1541030400), raw_visit_count = c(54544, 19583, 
    24552, 61089, 41139, 91053), raw_visitor_count = c(14460, 
    2351, 3020, 7123, 5132, 20907), visitor_home_cbgs = c("{\"010010208022\":215,\"010010205002\":195,\"010010208012\":184,\"010010208021\":165,\"010010206001\":132,\"010010205003\":110,\"010010211001\":100,\"010010208024\":93,\"010010201002\":91,\"010010205001\":84,\"010010210002\":82,\"010510310002\":75,\"010010203001\":74,\"010010208011\":73,\"010010204002\":62,\"010510309011\":61,\"010510309012\":61,\"010010207001\":58,\"010010209004\":58,\"010010207002\":55,\"010010209002\":54,\"010010208023\":53,\"011250112001\":53}", 
    "{\"010010208022\":143,\"010010208021\":140,\"010010205002\":116,\"010010206001\":87,\"010010208012\":83,\"010010205003\":56}", 
    "{\"010010208022\":193,\"010010205002\":170,\"010010208012\":112,\"010010206001\":96,\"010010203001\":84,\"010010205003\":82,\"010010208021\":74,\"010010208024\":71,\"010010205001\":67,\"010010201002\":65,\"010010204002\":55,\"010010208011\":55}", 
    "{\"010010205002\":326,\"010010208022\":287,\"010010208012\":184,\"010010206001\":170,\"010010208021\":162,\"010010208024\":145,\"010010205003\":141,\"010010205001\":125,\"010010203001\":121,\"010510310002\":113,\"010010204002\":99,\"010010211001\":96,\"010010207001\":95,\"010010201002\":94,\"010510309011\":94,\"010010208011\":88,\"010010209004\":79,\"010510309012\":78,\"010510310001\":72,\"010010207002\":70,\"010010210002\":69,\"010010208023\":67,\"010510301001\":58,\"010510313001\":57,\"010010202001\":56,\"010010209002\":56,\"010510311002\":52}", 
    "{\"010010208022\":247,\"010010205002\":235,\"010010208024\":135,\"010010208012\":121,\"010010205003\":116,\"010010206001\":111,\"010010208021\":99,\"010010209004\":91,\"010010204002\":86,\"010010205001\":83,\"010010201002\":62,\"010010207002\":61,\"010510310002\":60,\"010010209002\":58,\"010010207001\":56,\"010010208023\":56,\"010010208011\":55,\"010510309011\":54,\"010010204001\":52}", 
    "{\"010010205002\":391,\"010010208022\":328,\"010010208012\":181,\"010010208024\":174,\"010010206001\":169,\"010010205003\":161,\"010510310002\":160,\"010010208021\":158,\"010010203001\":152,\"010010205001\":138,\"010010204002\":130,\"010510309011\":129,\"010510309012\":128,\"010010207001\":113,\"010510310001\":105,\"010010209004\":99,\"010010201002\":96,\"010010211001\":93,\"010510313001\":89,\"010010207002\":85,\"010010208011\":85,\"010510301001\":81,\"010510311002\":78,\"010010209002\":76,\"010010210002\":71,\"010010208023\":70,\"010010202002\":63,\"010510307012\":60,\"010010204001\":59,\"010010209001\":56,\"010010202001\":54,\"010010204003\":54,\"010510301004\":53,\"010010206002\":53,\"010510309023\":51,\"010510301003\":50}"
    ), visitor_work_cbgs = c("{\"010010208022\":233,\"011250112001\":106,\"011010001001\":70,\"011010002001\":70,\"010510313001\":63,\"010010209004\":59,\"010010205002\":54}", 
    "{\"010010208022\":223}", "{\"010010208022\":258}", "{\"010010208022\":262,\"011010002001\":90,\"010510313001\":87,\"011010001001\":86,\"010010209004\":72,\"010010205002\":68,\"010510311001\":57,\"010010202001\":50}", 
    "{\"010010208022\":265,\"010010209004\":66,\"010510313001\":59,\"011010001001\":57,\"011010002001\":55,\"010010202001\":54}", 
    "{\"010010208022\":278,\"011010002001\":110,\"011010001001\":106,\"010510313001\":104,\"010510311001\":82,\"010010209004\":80,\"010010205002\":78,\"011010009002\":61,\"011010054072\":59,\"010010202001\":54,\"011010053021\":53,\"010010207002\":50}"
    ), distance_from_home = c(45094, 7346, 6977, 13528, 11597, 
    16162), related_same_day_brand = c("[\"Winn Dixie\",\"Chick-fil-A\",\"walmart\"]", 
    "[\"Winn Dixie\",\"walmart\",\"Chick-fil-A\"]", "[\"America's Thrift Store\",\"Winn Dixie\",\"Chick-fil-A\"]", 
    "[\"Winn Dixie\",\"walmart\",\"America's Thrift Store\",\"Chick-fil-A\",\"Dollar General\"]", 
    "[\"America's Thrift Store\",\"walmart\",\"Chick-fil-A\"]", 
    "[\"Winn Dixie\",\"walmart\",\"America's Thrift Store\"]"
    ), related_same_month_brand = c("[\"walmart\",\"Dollar General\",\"Chick-fil-A\",\"America's Thrift Store\",\"mcdonalds\",\"Zaxby's\",\"Jack's Family Restaurants\",\"Winn Dixie\",\"Waffle House\",\"Chevron\"]", 
    "[\"walmart\",\"America's Thrift Store\",\"Chick-fil-A\",\"Dollar General\",\"mcdonalds\",\"Winn Dixie\",\"Zaxby's\",\"Wendy's\",\"Sonic\",\"Taco Bell\"]", 
    "[\"walmart\",\"America's Thrift Store\",\"Dollar General\",\"Chick-fil-A\",\"mcdonalds\",\"Winn Dixie\",\"Zaxby's\",\"Sonic\",\"Wendy's\",\"Taco Bell\"]", 
    "[\"walmart\",\"Dollar General\",\"Chick-fil-A\",\"mcdonalds\",\"America's Thrift Store\",\"Winn Dixie\",\"Zaxby's\",\"Sonic\",\"Wendy's\",\"Publix Super Markets\"]", 
    "[\"walmart\",\"Dollar General\",\"mcdonalds\",\"America's Thrift Store\",\"Chick-fil-A\",\"Winn Dixie\",\"Zaxby's\",\"Wendy's\",\"Taco Bell\",\"Sonic\"]", 
    "[\"America's Thrift Store\",\"Dollar General\",\"walmart\",\"Winn Dixie\",\"Chick-fil-A\",\"Zaxby's\",\"Waffle House\",\"Jack's Family Restaurants\",\"Publix Super Markets\",\"Sonic\"]"
    ), top_brands = c("[]", "[]", "[]", "[\"Alabama DMV\",\"Fox's Pizza Den\",\"America's Thrift Store\",\"CrossFit\",\"Habitat for Humanity\"]", 
    "[]", "[\"Winn Dixie\",\"Planet Fitness\",\"America's Thrift Store\",\"Dollar General\",\"fred's\",\"Church's Chicken\",\"Advance Auto Parts\",\"Sunoco\",\"Boost Mobile\",\"Factory Connection\"]"
    ), popularity_by_hour = c("[1840,1666,1562,1541,1735,1977,3045,6630,3445,3681,3529,3894,4224,4292,4531,6463,5084,5378,4814,4162,3394,2918,2615,2074]", 
    "[2201,2137,2121,2108,2118,2175,2465,4458,1723,1664,1483,1471,1585,1586,1740,3185,2166,2566,2441,2339,2359,2319,2300,2255]", 
    "[1280,1233,1220,1199,1206,1288,1887,6141,2267,2196,2028,2059,2239,2337,2793,4935,2234,2685,2327,2052,1829,1616,1433,1340]", 
    "[1518,1326,1272,1199,1285,1539,2641,7010,4374,4681,4997,5268,5662,5346,5733,7468,6253,6882,5971,4754,3619,2757,2292,1799]", 
    "[3318,3261,3232,3212,3213,3309,4079,7363,3271,3115,3113,3187,3305,3445,3601,7069,4687,5433,5104,4775,4394,3905,3744,3429]", 
    "[2109,1738,1979,2393,2579,2953,4381,8526,5665,5653,5859,6539,7119,7046,6950,9672,8220,9003,7618,6015,4763,3713,3230,2629]"
    ), popularity_by_day = c("{\"Monday\":7542,\"Tuesday\":8610,\"Wednesday\":9274,\"Thursday\":7473,\"Friday\":8060,\"Saturday\":7222,\"Sunday\":6363}", 
    "{\"Monday\":3160,\"Tuesday\":3167,\"Wednesday\":3353,\"Thursday\":2721,\"Friday\":2807,\"Saturday\":2315,\"Sunday\":2060}", 
    "{\"Monday\":3688,\"Tuesday\":4192,\"Wednesday\":4221,\"Thursday\":3876,\"Friday\":3887,\"Saturday\":2846,\"Sunday\":1842}", 
    "{\"Monday\":9091,\"Tuesday\":9653,\"Wednesday\":10016,\"Thursday\":8397,\"Friday\":9017,\"Saturday\":7919,\"Sunday\":6996}", 
    "{\"Monday\":6126,\"Tuesday\":6681,\"Wednesday\":6811,\"Thursday\":5935,\"Friday\":6251,\"Saturday\":5275,\"Sunday\":4060}", 
    "{\"Monday\":13680,\"Tuesday\":13569,\"Wednesday\":14006,\"Thursday\":11815,\"Friday\":13480,\"Saturday\":13697,\"Sunday\":10806}"
    )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

Upon importing the excel into R, the variables were stored as character.

I then ran the toJSON command to turn them into JSON type. Upon running typeof command after the toJSON command the result is character again.

The error I get now is related to the sum command run on the list (res). The error I get is:

Error in sum(unlist(res_home_visitor)) : 
  invalid 'type' (character) of argument

and the result of unlist command is:

visitor_home_cbgs1                                                                                                                                                                                                                                                                                "{\"010010208022\":215,\"010010205002\":195,\"010010208012\":184,\"010010208021\":165,\"010010206001\":132,\"010010205003\":110,\"010010211001\":100,\"010010208024\":93"   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                visitor_home_cbgs2    
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "{\"010010208022\":143,\"010010208021\":140,\"010010205002\":116,\"010010206001\":87,\"010010208012\":83,\"010010205003\":56}"

Now I am new to coding and I have looked up the above data type but I'm not sure if it's called an associative array or JSON (thus the tags in the question).

What I researched and tried is using the split function in Tableau. However, it creates problems as the entries are not of the same size and a lot of the entries are large so the split function will create a lot of columns and this causes a lag and a lot of trouble in Tableau (due to limitation on the number of columns and so..)

** Here's an example of what I am trying to get:

{"010479561003":52,"010479561002":52, "110478541030":33}

For the above entry the total should be: 52+52+33= 137

This total (137) would be put into a new column called "totals" which is needed for the visualization.

1
"Upon importing the excel into R" - how are you doing this, and what is the object in your R session, is it a data.frame with list columns? If we have an example of this object it will be a lot easier to help.SymbolixAU
I basically used the "Import Dataset" in R which essentially runs the following command: library(readxl) dataset <- read_excel("D:/path/dataset.xlsx", sheet = "cbg"). Upon running typeof(dataset) command the result is listR.A
ok good, next what would be helpful is if you added the output of dput(head(dataset)) to your question. This will give code I can use to copy & paste into my own R session. Then I can work with the exact same data as you.SymbolixAU
Updated the question.R.A
the dput you have include gives an error when I copy & paste it; have you omitted any brackets?SymbolixAU

1 Answers

2
votes

You have a data.frame where some columns are JSON. It's my understanding you want to treat each row independently. Therefore, you can sum the result of fromJSON on a row-by-row basis.

In this example I'm using library(dplyr) to

  1. rowwise() - apply the code to each row
  2. mutate() - create a new column, which is the result of sum( unlist( fromJSON() ) )

library(dplyr)
library(jsonlite)

res <- dataset %>%
  rowwise() %>%
  mutate(visitor_home_cbgs_sum = sum( unlist (jsonlite::fromJSON( visitor_home_cbgs ) ) ) ) %>%
  mutate(visitor_work_cbgs_sum = sum( unlist (jsonlite::fromJSON( visitor_work_cbgs ) ) ) ) 

Now we can look at the result by selecting only the original JSON columns, and our newly crated sum columns

res %>%
  select(visitor_home_cbgs, visitor_home_cbgs_sum, visitor_work_cbgs, visitor_work_cbgs_sum)

# Source: local data frame [6 x 4]
# Groups: <by row>
#   
#   # A tibble: 6 x 4
#   visitor_home_cbgs                     visitor_home_cbg… visitor_work_cbgs                    visitor_work_cbg…
# <chr>                                             <int> <chr>                                            <int>
# 1 "{\"010010208022\":215,\"01001020500…              2188 "{\"010010208022\":233,\"0112501120…               655
# 2 "{\"010010208022\":143,\"01001020802…               625 "{\"010010208022\":223}"                           223
# 3 "{\"010010208022\":193,\"01001020500…              1124 "{\"010010208022\":258}"                           258
# 4 "{\"010010205002\":326,\"01001020802…              3054 "{\"010010208022\":262,\"0110100020…               772
# 5 "{\"010010208022\":247,\"01001020500…              1838 "{\"010010208022\":265,\"0100102090…               556
# 6 "{\"010010205002\":391,\"01001020802…              4093 "{\"010010208022\":278,\"0110100020…              1115

In this example you only have two columns you want to update, so I'm leaving the two mutate() statements in there to be explicit about how this works.

If you have many / an unknown number of columns with JSON values you may want to consider another approach.