
One of the best ways to make a question reproducible is to use one of the built-in data sets. Using data(), however, is frustrating because it lists only the names and titles of the data sets, not their structure.
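For reference, data() does return its listing as an object whose `results` element is a character matrix with the columns Package, LibPath, Item and Title. The names are easy to get at programmatically even though the structure is not. A small illustration (`listing` is just an illustrative variable name):

```r
# data() with no arguments lists the data sets in the packages on the search path.
# Its `results` element is a character matrix: Package, LibPath, Item, Title.
listing <- data()$results
head(listing[, c("Package", "Item", "Title")])
```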

How can I quickly view the structure of available data sets?


The following function may help:

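dataStr <- function(fun = function(x) TRUE) {
  # Names of the data sets in attached packages, as listed by data()
  items <- data()$results[, "Item"]
  # Look each name up (searching enclosing environments and the search path);
  # names that cannot be found come back as NULL
  objs <- mget(items, inherits = TRUE, ifnotfound = list(NULL))
  # Drop the missing ones, keep those for which `fun` is TRUE, print their structure
  str(Filter(fun, Filter(Negate(is.null), objs)))
}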

It accepts a filtering function, applies it to every data set listed by data() that can be found on the search path, and prints the structure of those for which the function returns TRUE. For example, if we're looking for matrices:

> dataStr(is.matrix)
List of 8
 $ WorldPhones          : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
  .. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
 $ occupationalStatus   : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ origin     : chr [1:8] "1" "2" "3" "4" ...
  .. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
 $ volcano              : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
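When only the names of the matching data sets are needed, say to pick one for a reproducible example, the same lookup can return a character vector instead of printing str() output. A minimal sketch; `dataNamesWhere` is a hypothetical name, not part of base R, and it looks names up from the global environment outward, so a global variable with the same name as a data set would shadow it:

```r
# Hypothetical helper: names of the data sets for which `fun` returns TRUE
dataNamesWhere <- function(fun = function(x) TRUE) {
  items <- data()$results[, "Item"]
  # Unfound names become NULL and are dropped below
  objs <- mget(items, envir = globalenv(), inherits = TRUE,
               ifnotfound = list(NULL))
  names(Filter(fun, Filter(Negate(is.null), objs)))
}

dataNamesWhere(is.matrix)
```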

Or for data frames (also omitting entries):

> dataStr(is.data.frame)
List of 42
 $ BOD             :'data.frame': 6 obs. of  2 variables:
  ..$ Time  : num [1:6] 1 2 3 4 5 7
  ..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
  ..- attr(*, "reference")= chr "A1.4, p. 270"
 $ CO2             :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame':  84 obs. of  5 variables:
  ..$ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
  ..$ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ conc     : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
  ..$ uptake   : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
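With 42 matches in the output above, the str() listing gets long. A sketch that instead tabulates the size of each matching object, assuming a row and column count is enough to choose from (`dataDims` is a hypothetical helper, not an existing function):

```r
# Hypothetical helper: one row per matching data set with its dimensions
dataDims <- function(fun = is.data.frame) {
  items <- data()$results[, "Item"]
  objs <- mget(items, envir = globalenv(), inherits = TRUE,
               ifnotfound = list(NULL))
  objs <- Filter(fun, Filter(Negate(is.null), objs))
  # NROW/NCOL also work for plain vectors (length and 1 respectively)
  data.frame(name = names(objs),
             rows = vapply(objs, NROW, numeric(1)),
             cols = vapply(objs, NCOL, numeric(1)),
             row.names = NULL)
}

head(dataDims())
```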

Or even for simple vectors:

> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
 $ euro   : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
  ..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
 $ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
  ..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
 $ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
  ..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
 $ rivers : num [1:141] 735 320 325 392 524 ...
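One caveat: data() reports some items in the form `object (file)`, for example `beaver1 (beavers)` or `state.abb (state)`. mget() cannot find a name like that, so dataStr() silently skips those objects. A sketch of a variant that strips the parenthesised part first, assuming that naming pattern holds (`dataNames` and `dataStr2` are hypothetical names):

```r
# Hypothetical helper: data set names with the " (file)" suffix removed,
# e.g. "beaver1 (beavers)" becomes "beaver1"
dataNames <- function() {
  items <- data()$results[, "Item"]
  unique(sub(" \\(.*\\)$", "", items))
}

# Same idea as dataStr(), but looking up the cleaned names
dataStr2 <- function(fun = function(x) TRUE) {
  objs <- mget(dataNames(), envir = globalenv(), inherits = TRUE,
               ifnotfound = list(NULL))
  str(Filter(fun, Filter(Negate(is.null), objs)))
}
```

With this change, dataStr2(is.data.frame) should also list objects such as beaver1 and beaver2.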