2
votes

I am trying to subset a large list with 278226 elements. Each element (shown below) is itself a list with between 39 and 50 sub-elements, each a length-1 atomic vector with a different name.

> str(listings_England[9922])
List of 1
 $ listing:List of 40
  ..$ agent_address       : chr "35 John Street, Luton"
  ..$ agent_logo          : chr "https://st.zoocdn.com/zoopla_static_agent_logo_(257607).png"
  ..$ agent_name          : chr "Ashton Carter Homes"
  ..$ agent_phone         : chr "020 8115 4543"
  ..$ category            : chr "Residential"
  ..$ country             : NULL
  ..$ country_code        : chr "gb"
  ..$ county              : NULL
  ..$ displayable_address : chr "Hatters Way Luton, Luton LU1"
  ..$ first_published_date: chr "2017-11-16 17:25:36"
  ..$ last_published_date : chr "2018-01-29 18:40:52"
  ..$ latitude            : chr "51.88188"
  ..$ listing_id          : chr "39336869"
  ..$ listing_status      : chr "sale"
  ..$ longitude           : chr "-0.43237194"

Then I extract sub-elements such as "listing_id" as below:

> id1 <- sapply(listings_England, "[[", "listing_id")
Error in FUN(X[[i]], ...) : subscript out of bounds
> id3 <- sapply(listings_England[1:100000], "[[", "listing_id")
Error in FUN(X[[i]], ...) : subscript out of bounds
> id2 <- sapply(listings_England[1:50000], "[[", "listing_id")
> 

> listings_England$listing_id
NULL
> 

As you can see, it only works for the last call, on the first 50000 elements (the purrr::map family of functions has the same problem). I was wondering whether this is a limitation of these functions. My current solution is:

id <- sapply(listings_England, function(x) x["listing_id"]) %>% as.numeric()

The problem is that neither "[[" nor "$" works on this large list; only "[" does.

4
If it works for elements 1:50000 but not 1:100000, I bet there's an element in the 50000:100000 range that doesn't have a listing_id property, or the whole thing is NULL. - Jesse Tweedle
@JesseTweedle Yes, you are right! It's NULL causing this problem. Thanks! - Yunlong Huang
Lists are annoying, and big ones are worse because they throw weird errors that don't point you in the right direction. I usually try to convert them to tibbles as soon as I can (with enframe or bind_rows, either directly or in some combination with map). - Jesse Tweedle
Oh, two more suggestions: just bind_rows(listings_England) right away, or maybe purrr::discard(listings_England, is.null) to drop NULL elements right away. - Jesse Tweedle
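
The suggestion to drop NULL elements can be sketched in base R. This is a minimal illustration on made-up data: the `listings` object and its values are hypothetical stand-ins for listings_England, and `Filter(Negate(is.null), ...)` is used here as a base R counterpart to `purrr::discard(..., is.null)`.

```r
# Hypothetical toy data: three elements named "listing", one of them NULL
listings <- list(
  listing = list(listing_id = "1", listing_status = "sale"),
  listing = NULL,
  listing = list(listing_id = "3", listing_status = "rent")
)

# Locate the NULL elements before extracting anything
is_missing <- vapply(listings, is.null, logical(1))
missing_pos <- unname(which(is_missing))

# Drop the NULL elements (base R stand-in for purrr::discard)
kept <- Filter(Negate(is.null), listings)

# Extraction is now type-stable: every remaining element has a listing_id
ids <- vapply(kept, function(x) x[["listing_id"]], character(1))
```

Checking `missing_pos` first shows where the gaps are, which may be worth knowing before deciding whether to drop those elements or keep them as NA.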

4 Answers

1
votes

As @JesseTweedle comments, your issue is data-specific. Somewhere in your data object, listing_id does not exist as a named element, so the extraction errors out. Consider wrapping the extraction inside sapply in a tryCatch so that elements without listing_id return NA, using either [[ or $:

id2 <- sapply(listings_England[1:100000], function(x) 
                 tryCatch(x[["listing_id"]],
                          warning = function(w) return(NA),
                          error = function(e) return(NA)
                 )
       ) 

Additionally, per your post it looks like you have a nested structure with a named listing. Try this:

id2 <- sapply(listings_England[1:100000], function(x) 
                 tryCatch(x$listing$listing_id,
                          warning = function(w) return(NA),
                          error = function(e) return(NA)
                 )
       ) 
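
A variation on the same idea, as a sketch rather than a drop-in fix: `tryCatch` only intercepts errors, and `[[` on a list (or on NULL) with a missing name returns NULL instead of erroring, so it may help to map NULL to NA explicitly and use `vapply` for a type-stable result. The `listings` data and the `get_id` helper below are hypothetical.

```r
# Hypothetical toy data covering several ways listing_id can be absent
listings <- list(
  list(listing_id = "39336869", listing_status = "sale"),
  NULL,                            # whole element is NULL
  list(listing_status = "rent"),   # list without listing_id
  c(listing_status = "sale")       # atomic vector without listing_id
)

# Hypothetical helper: return the id as character, or NA if unavailable
get_id <- function(x) {
  id <- tryCatch(x[["listing_id"]], error = function(e) NULL)
  if (is.null(id)) NA_character_ else as.character(id)
}

# vapply guarantees a character vector of the same length as the input
ids <- vapply(listings, get_id, character(1))
ids_num <- as.numeric(ids)
```

Because every element yields exactly one value, `ids` lines up with the original list positions, which `sapply` does not guarantee when some results are NULL.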
0
votes

If you want to convert the listing_id entry to numeric, just use as.numeric directly:

listings_England$listing_id <- as.numeric(listings_England$listing_id)

sapply is what you would use if you wanted to apply a function to each element across a vector. But since as.numeric is already vectorized, you don't need an apply function in this case.

0
votes

You have what I would call a "nested list". You can see from the str output that there is only one item at the top of your "element tree". Try this:

id1 <- sapply(listings_England[[1]], "[[", "listing_id")

It then extracts the first item (which has all of the content) and works on the resulting list. Could also use the equivalent operation:

id1 <- sapply(listings_England$listing, "[[", "listing_id")
0
votes

This is the "missing and out-of-bounds indices" problem: [ and [[ differ slightly in their behaviour when the index is out of bounds (OOB). Details can be found in section 4.3.3 of the "Advanced R" book: https://adv-r.hadley.nz/subsetting.html#subsetting-operators
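
The difference can be seen on a small example. The objects below are made up for illustration (the name "price" is deliberately absent from both), and the error is captured with `tryCatch` so the snippet runs to the end.

```r
# Hypothetical objects: an atomic vector and a list with the same single name
atomic <- c(listing_id = "39336869")
lst    <- list(listing_id = "39336869")

# `[` does not error on a missing name; it returns a length-1 placeholder
a1 <- atomic["price"]   # NA
l1 <- lst["price"]      # list holding a single NULL

# `[[` with a missing name: NULL for a list and for NULL itself ...
l2 <- lst[["price"]]
n2 <- NULL[["price"]]

# ... but an error for an atomic vector
a2 <- tryCatch(atomic[["price"]], error = function(e) conditionMessage(e))
```

In this toy case only the atomic vector triggers the "subscript out of bounds" error, so it can be worth checking the type of the offending elements (for example with `is.list`) as well as whether they are NULL.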