2
votes

Applying labels is an important part of making survey data comprehensible when it is reported.

The best example I can find uses expss::apply_labels(), e.g. the well-known mtcars example: https://cran.r-project.org/web/packages/expss/vignettes/tables-with-labels.html

As input this requires a data.table and a list of comma-separated assignment pairs, e.g.

apply_labels(dt, col1 = "label1", col2 = "label2", col3 = "label3")

This is fine if you have one data file with a few columns and don't mind typing the labels in each time, but it's not very helpful if you have lots of data files. So how could one load a CSV metadata file in this format:

Col1 Col2 Col3

Label1 Label2 Label3

where the column names match the column names in the data table?

This effectively means translating the metadata CSV file so that it generates

coln = "labeln"

for each column.

So far the biggest problem I have found is that apply_labels expects the column names as bare argument names (objects), not strings, and it is very difficult to translate a string into the object in the right scope.

This is where I've got to

    library(expss)
    library(data.table)
    library(glue)

    readcsvdata <- function(dfile)
     {
        rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE, 
        stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
        return(rdata)
        }

    rawdatafilename <- "testdata.csv"
    rawmetadata <- "metadata.csv"

    mdt <- readcsvdata(rawmetadata)
    rdt <-readcsvdata(rawdatafilename)
    commonnames <- intersect(names(mdt),names(rdt))  # find common 
    qlabels <- as.character(mdt[1, commonnames, with = FALSE])

    comslist <- list()
    for (i in 1:length(commonnames)) # loop through commonnames and qlabels
          {  
          if (i == length(commonnames))
              {x <- glue('{commonnames[i]} = "{qlabels[i]}"')} # no comma for final item
              else 
              {x <- glue('{commonnames[i]} = "{qlabels[i]}",')} # comma for next item

          comslist[[i]] <- x
    }

    comstring <- paste(unlist(comslist), collapse = '')

    rdt = apply_labels(rdt, eval(parse(text = comstring)))

which yields

    Error in parse(text = comstring) : <text>:1:24: unexpected ','
    1: varone = "Label1",
                        ^

Also, print(comstring) produces:

[1] "varone = \"Question one\",vartwo = \"Question two\",varthree = \"Question three\",varfour = \"Question four\",varfive = \"Question five\",varsix = \"Question six\",varseven = \"Question seven\",vareight = \"Question eight\",varnine = \"Question nine\",varten = \"Question ten\""

2
If that's truly a CSV file, and you read that in with read.csv (or fread or whatever), then do.call(apply_labels, c(list(dt), csvdat)) should work. - r2evans
You can use var_lab in a loop: for(each in colnames(metadata)) var_lab(dt[[each]]) = metadata[[each]] - Gregory Demin

2 Answers

1
votes

apply_labels is not very convenient for assigning labels from an external dictionary. You can use var_lab instead:

library(expss)
library(data.table)

readcsvdata <- function(dfile)
{
    rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE, 
                   stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
    return(rdata)
}

rawdatafilename <- "testdata.csv"
rawmetadata <- "metadata.csv"

mdt <- readcsvdata(rawmetadata)
rdt <-readcsvdata(rawdatafilename)
commonnames <- intersect(names(mdt),names(rdt))  # find common 
qlabels <- as.list(mdt[1, commonnames, with = FALSE])


for (each_name in commonnames) # loop through commonnames and qlabels
{  
    var_lab(rdt[[each_name]]) <- qlabels[[each_name]]
}

There is a similar val_lab function for value labels. Additionally, you may be interested in the apply_dictionary and create_dictionary functions. To get help on them, type ?apply_dictionary in the console.
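For a quick check without any files, the same loop can be run on small in-memory tables. This is only a minimal sketch, assuming expss is installed; the column names and labels are made up for illustration, and the metadata deliberately contains one column (varthree) that is absent from the data.

```r
library(expss)

# Illustrative data and metadata (not taken from the original files).
rdt <- data.frame(varone = c(1, 2), vartwo = c(3, 4))
mdt <- data.frame(varone = "Question one",
                  vartwo = "Question two",
                  varthree = "Question three",
                  stringsAsFactors = FALSE)

# Only label the columns that are present in both tables.
commonnames <- intersect(names(mdt), names(rdt))

for (each_name in commonnames) {
    # First row of the metadata holds the label for that column.
    var_lab(rdt[[each_name]]) <- mdt[[each_name]][1]
}

var_lab(rdt$varone)  # inspect the label attached to the first column
```

Calling var_lab() on a column afterwards is a simple way to confirm that the label was attached.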

1
votes

I don't have expss handy, but I think this is generically about how to programmatically assign function arguments in R.

If you start with a CSV file that contains the three pairings you need,

csvdat <- read.csv(stringsAsFactors=FALSE, text="
col1,col2,col3
label1,label2,label3")

I'll write a fake function (since I don't have expss, and it's not critical) that takes a first argument and zero or more follow-on arguments dynamically.

my_fake_labels <- function(x, ...) {
  dots <- list(...)
  message("x labels   : ", paste(sQuote(colnames(x)), collapse = ", "))
  message("other names: ", paste(sQuote(names(dots)), collapse = ", "))
}
library(data.table)
origDT <- data.table(aa=1, bb=2)

my_fake_labels(origDT, col1="label1", col2="label2", col3="label3")
# x labels   : 'aa', 'bb'
# other names: 'col1', 'col2', 'col3'

It's that manual argument-setting that you're trying to avoid. (I know I'm not doing any label-setting here, let's ignore that for now.)

The programmatic way of doing this, using origDT as the first argument, and the elements of csvdat as the second and subsequent arguments:

do.call(my_fake_labels, c(list(origDT), csvdat))
# x labels   : 'aa', 'bb'
# other names: 'col1', 'col2', 'col3'

The second argument to do.call needs to be a list, optionally named. Since a data.frame (and therefore a data.table) is just a glorified named list, this fits the bill. What this does is take each element of the list and apply it as the corresponding arguments of the function (the first argument of do.call).

The list(origDT) is because normally the c(...) function would concatenate the columns/elements of the two lists. If we did just c(origDT, csvdat), then the function would be called with ncol(origDT) + ncol(csvdat) arguments, instead of the desired 1 + ncol(csvdat). For this, c(list(origDT), ...) makes sure that the whole origDT is the function's first argument.
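The difference between the two forms of c() can be checked directly. A minimal sketch in base R, where count_args is a hypothetical helper (not part of expss or data.table) that only reports how many arguments it received and what they are named:

```r
# Hypothetical helper: reports how many arguments arrived and their names.
count_args <- function(...) {
  args <- list(...)
  list(n = length(args), names = names(args))
}

orig   <- data.frame(aa = 1, bb = 2)
labels <- list(col1 = "label1", col2 = "label2", col3 = "label3")

# Wrapping orig in list() keeps it as a single, unnamed first argument.
wrapped <- do.call(count_args, c(list(orig), labels))
wrapped$n      # 4: the data frame plus three labels
wrapped$names  # "" "col1" "col2" "col3"

# Without list(), the columns of orig are spliced in as separate arguments.
spliced <- do.call(count_args, c(orig, labels))
spliced$n      # 5: aa, bb, col1, col2, col3
spliced$names  # "aa" "bb" "col1" "col2" "col3"
```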

(It might also be easy to form the csvdat programmatically instead of requiring an external file, but I'm guessing that you have a reason to do it via CSV.)
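If the names and labels are already available in R as two character vectors, a named list equivalent to csvdat can be built without a file. A minimal sketch in base R; col_names and col_labels are hypothetical inputs used only for illustration:

```r
# Hypothetical inputs: column names and their labels, in matching order.
col_names  <- c("col1", "col2", "col3")
col_labels <- c("label1", "label2", "label3")

# as.list() gives one element per label; setNames() attaches the column names.
label_args <- setNames(as.list(col_labels), col_names)
str(label_args)

# The result can be passed to do.call() in place of csvdat, e.g.
# do.call(my_fake_labels, c(list(origDT), label_args))
```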