Representative sample data (list of lists):
l <- list(structure(list(a = -1.54676469632688, b = "s", c = "T",
d = structure(list(id = 5L, label = "Utah", link = "Asia/Anadyr",
score = -0.21104594634643), .Names = c("id", "label",
"link", "score")), e = 49.1279871269422), .Names = c("a",
"b", "c", "d", "e")), structure(list(a = -0.934821052832427,
b = "k", c = "T", d = list(structure(list(id = 8L, label = "South Carolina",
link = "Pacific/Wallis", score = 0.526540892113734, externalId = -6.74354377676955), .Names = c("id",
"label", "link", "score", "externalId")), structure(list(
id = 9L, label = "Nebraska", link = "America/Scoresbysund",
score = 0.250895465294041, externalId = 16.4257470807879), .Names = c("id",
"label", "link", "score", "externalId"))), e = 52.3161400117052), .Names = c("a",
"b", "c", "d", "e")), structure(list(a = -0.27261485993069, b = "f",
c = "P", d = list(structure(list(id = 8L, label = "Georgia",
link = "America/Nome", score = 0.526494135483816, externalId = 7.91583574935589), .Names = c("id",
"label", "link", "score", "externalId")), structure(list(
id = 2L, label = "Washington", link = "America/Shiprock",
score = -0.555186440792989, externalId = 15.0686663219837), .Names = c("id",
"label", "link", "score", "externalId")), structure(list(
id = 6L, label = "North Dakota", link = "Universal",
score = 1.03168296038975), .Names = c("id", "label",
"link", "score")), structure(list(id = 1L, label = "New Hampshire",
link = "America/Cordoba", score = 1.21582056168681, externalId = 9.7276418869132), .Names = c("id",
"label", "link", "score", "externalId")), structure(list(
id = 1L, label = "Alaska", link = "Asia/Istanbul", score = -0.23183264861979), .Names = c("id",
"label", "link", "score")), structure(list(id = 4L, label = "Pennsylvania",
link = "Africa/Dar_es_Salaam", score = 0.590245339334121), .Names = c("id",
"label", "link", "score"))), e = 132.1153538536), .Names = c("a",
"b", "c", "d", "e")), structure(list(a = 0.202685974077313, b = "x",
c = "O", d = structure(list(id = 3L, label = "Delaware",
link = "Asia/Samarkand", score = 0.695577130634724, externalId = 15.2364820698193), .Names = c("id",
"label", "link", "score", "externalId")), e = 97.9908914452971), .Names = c("a",
"b", "c", "d", "e")), structure(list(a = -0.396243444741009,
b = "z", c = "P", d = list(structure(list(id = 4L, label = "North Dakota",
link = "America/Tortola", score = 1.03060272795705, externalId = -7.21666936522344), .Names = c("id",
"label", "link", "score", "externalId")), structure(list(
id = 9L, label = "Nebraska", link = "America/Ojinaga",
score = -1.11397997280413, externalId = -8.45145052697411), .Names = c("id",
"label", "link", "score", "externalId"))), e = 123.597945533926), .Names = c("a",
"b", "c", "d", "e")))
I have a list of lists, by virtue of a JSON data download.
The list has 176 elements, each with 33 nested elements, some of which are themselves lists of varying length.
I am interested in analyzing the data in one particular nested list. For each of the 176 elements it has a length of ~150, and each of its entries has either 4 or 5 elements. I am trying to extract this nested list and convert it into a data.frame so that I can perform some analysis.
In the representative sample data above, I am interested in the nested list d for each of the 5 elements of l. The desired data.frame would therefore look something like:
id  label           link            score       externalId
5   Utah            Asia/Anadyr     -0.2110459  NA
8   South Carolina  Pacific/Wallis   0.5265409  -6.743544
.
.
I've been attempting to use purrr, which appears to have a sensible and consistent flow for processing data in lists, but I am running into errors whose cause I can't fully understand. It could very well be that I don't properly understand the commands/logic of purrr or of lists (likely both). This is the code I've been attempting, along with the error it throws:
df <- map_df(l, "d", ~as.data.frame(.))
Error: incompatible sizes (5 != 4)
I believe this has to do with the differing lengths of d for each component, or perhaps the differing contained data (sometimes 4 elements, sometimes 5), or perhaps the function I've used here is misspecified. Truthfully, I'm not entirely sure.
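A minimal sketch for checking those hypotheses, using base R only. The names `rec`, `l_small`, `is_single` and `n_fields` are hypothetical and not part of the original code; `l_small` is a cut-down stand-in for `l` in which `d` is sometimes a single record and sometimes a list of records, as in the sample data.

```r
# Hypothetical helper: build one record with 4 or 5 fields.
rec <- function(id, label, link, score, externalId = NULL) {
  out <- list(id = id, label = label, link = link, score = score)
  if (!is.null(externalId)) out$externalId <- externalId
  out
}

# Hypothetical miniature of `l`: element 1 holds a lone record in `d`,
# element 2 holds a list of two records.
l_small <- list(
  list(a = 1, d = rec(5L, "Utah", "Asia/Anadyr", -0.21)),
  list(a = 2, d = list(
    rec(8L, "South Carolina", "Pacific/Wallis", 0.53, -6.74),
    rec(9L, "Nebraska", "America/Scoresbysund", 0.25, 16.43)
  ))
)

# A lone record is a named list; a list of records has no names.
is_single <- vapply(l_small, function(x) !is.null(names(x$d)), logical(1))

# Number of fields per record, for each element of the outer list.
n_fields <- lapply(l_small, function(x) {
  if (!is.null(names(x$d))) length(x$d) else lengths(x$d)
})
```

Inspecting `is_single` and `n_fields` on the real list should show whether the size mismatch comes from the shape of `d`, from the 4-versus-5 field difference, or from both.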
I have worked around this by using a for loop, which I know is inefficient, hence my question here on SO.
This is the for loop I currently employ:
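library(purrr)  # provides map_df() and re-exports %>%; map_df() also needs dplyr installed

# Start from an empty data.frame and append the rows from each element's d
df <- data.frame(id = integer(), label = character(), score = numeric(), externalId = numeric())
for (i in seq_along(l)) {
  # d is the 4th element of each top-level list
  df_temp <- l[[i]][[4]] %>% map_df(~as.data.frame(.))
  df <- rbind(df, df_temp)
}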
Some assistance, preferably with purrr (or alternatively some version of apply, as this is still superior to my for loop), would be greatly appreciated. Also, if there's a resource that covers the above, I'd like to understand the approach rather than just find the right code.
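For illustration, here is one possible apply-family sketch of the extraction described above, not a definitive answer. It assumes that `d` may be either a lone record or a list of records, and that missing fields should become NA. The names `l_small`, `cols`, `records`, `to_row` and `df_small` are hypothetical; `purrr::map()` could stand in for `lapply()` with the same logic.

```r
# Hypothetical miniature of `l`, same shape as the sample data.
l_small <- list(
  list(a = 1, d = list(id = 5L, label = "Utah", link = "Asia/Anadyr",
                       score = -0.21)),
  list(a = 2, d = list(
    list(id = 8L, label = "South Carolina", link = "Pacific/Wallis",
         score = 0.53, externalId = -6.74),
    list(id = 9L, label = "Nebraska", link = "America/Scoresbysund",
         score = 0.25, externalId = 16.43)
  ))
)

# Columns wanted in the result (taken from the desired output above).
cols <- c("id", "label", "link", "score", "externalId")

# Step 1: make every `d` a list of records (wrap a lone record in list()),
# then flatten one level so all records sit in a single list.
records <- unlist(
  lapply(l_small, function(x) if (is.null(names(x$d))) x$d else list(x$d)),
  recursive = FALSE
)

# Step 2: turn one record into a one-row data.frame, padding missing fields with NA.
to_row <- function(r) {
  r[setdiff(cols, names(r))] <- NA
  as.data.frame(r[cols], stringsAsFactors = FALSE)
}

# Step 3: stack the one-row data.frames.
df_small <- do.call(rbind, lapply(records, to_row))
```

Padding every record to the same set of columns before binding is what avoids a size mismatch between 4-field and 5-field records. On the full data, binding many small data.frames with `rbind` may be slow, so this is meant to show the logic rather than the fastest route.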