1
votes

I have a data frame (imported from an Excel worksheet where I have written lists of strings row by row) and want to convert the rows into a list of vectors, where each vector contains the non-missing cell values for that row.

For example:

#Sample data frame
dfX <- data.frame(C0 = c(1,2,3),
              C1 = c("Apple","Apple","Pear"),
              C2 = c("Banana","Orange", "Lemon"),
              C3 = c("Pear","Melon", ""))

Which would be used to generate the following list:

myList = list(c("Apple","Banana", "Pear"),
          c("Apple","Orange", "Melon"),
          c("Pear","Lemon"))

Note the third vector is truncated to two elements as the cell contains an empty string. Also note that the index (C0) is dropped.

I have seen some examples which convert the data frame to a matrix and use the split function to then paste the results into the global environment, e.g.

list2env(setNames(split(as.matrix(dfX),
                    row(dfX)), paste0("Row",1:3)),
                    envir=.GlobalEnv)

But I was wondering whether there is (a) a newer tidyverse function for handling this and (b) a way to populate a list directly (I later want to lapply a function over that list). I would also like the missing values to be dropped on the way into the list, if possible.

2
data.frame is effectively a list of vectors. So, class(dfX) <- "list" - Khashaa
Or as.list(dfX). Either way, it would probably make your life easier to convert the empty strings into NA_character_ first. If you're reading your data in from csv, note the handy argument na.strings to do this during the import. - DanY
@Khashaa That does not reproduce OP's expected output; OP is after a row-wise (not column-wise) operation. Your solution turns a data.frame into a list of column vectors. OP wants a list of row vectors. - Maurits Evers
@MauritsEvers Didn't notice the rows were transposed. gather(dfX, var, val, -C0) %>% spread(C0, val) %>% map(c) - Khashaa
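
A minimal sketch of the NA conversion suggested in the comments above, assuming the data can be read as CSV text. The names csvText and dfY are made up for illustration, and the inline text stands in for a real file:

```r
# Hypothetical CSV content standing in for a real file; the last row has an
# empty C3 field.
csvText <- "C0,C1,C2,C3
1,Apple,Banana,Pear
2,Apple,Orange,Melon
3,Pear,Lemon,
"

# na.strings = "" turns empty fields into NA while the data is being read.
dfX <- read.csv(text = csvText, na.strings = "", stringsAsFactors = FALSE)
is.na(dfX$C3)

# For a data frame that is already loaded, empty strings can be replaced
# afterwards (dfY is an illustrative copy of the question's data).
dfY <- data.frame(C0 = c(1, 2, 3),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", ""),
                  stringsAsFactors = FALSE)
dfY[dfY == ""] <- NA
```

With the empty cells stored as NA, the later filtering step can test is.na() instead of comparing against "".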

2 Answers

4
votes

Since you are interested in a tidyverse approach, one option would be

library(tidyverse)

dfX %>%
  group_split(C0) %>% #Or use split(.$C0) if `dplyr` is not updated
  map(~discard(flatten_chr(.), . == "")[-1])

#[[1]]
#[1] "Apple"  "Banana" "Pear"  

#[[2]]
#[1] "Apple"  "Orange" "Melon" 

#[[3]]
#[1] "Pear"  "Lemon"

group_split is available from dplyr 0.8.0. This also assumes that C0 is unique in every row; for each row we discard any value that is an empty string ("").
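
If C0 cannot be relied on to be unique, a row-wise iteration avoids grouping altogether. The following is a sketch only, not part of the original answer; it uses purrr::pmap, the name rowList is made up for illustration, and it assumes the fruit columns are character vectors:

```r
library(purrr)

# Same sample data as in the question; stringsAsFactors = FALSE keeps the
# fruit columns as character vectors on R versions before 4.0 as well.
dfX <- data.frame(C0 = c(1, 2, 3),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", ""),
                  stringsAsFactors = FALSE)

# pmap() calls the function once per row, passing the cells of that row as
# named arguments. dfX[-1] drops the index column C0 beforehand.
rowList <- pmap(dfX[-1], function(...) {
  x <- c(...)                      # one row as a named character vector
  unname(x[!is.na(x) & x != ""])   # drop NA and "" cells, then the names
})
```

Because each row is handled independently, duplicated or missing values in C0 do not affect the result.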


In base R, a combination of split and lapply would also work.

lapply(split(dfX[-1], dfX$C0), function(x) x[x != ""])

#$`1`
#[1] "Apple"  "Banana" "Pear"  

#$`2`
#[1] "Apple"  "Orange" "Melon" 

#$`3`
#[1] "Pear"  "Lemon"

Another base R option is apply with MARGIN = 1

apply(dfX[-1], 1, function(x) x[x!= ""])
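
One caveat with apply: when every row keeps the same number of values, apply simplifies the result to a matrix instead of a list. A sketch of a variant that always returns a list is shown below; the helper name rowsToList is hypothetical, and the code assumes character columns:

```r
dfX <- data.frame(C0 = c(1, 2, 3),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", ""),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one unnamed character vector per row, with NA and
# empty-string cells removed. lapply() never simplifies, so the result is
# a list regardless of the row lengths.
rowsToList <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    x <- unlist(df[i, , drop = FALSE], use.names = FALSE)
    x[!is.na(x) & x != ""]
  })
}

rowList <- rowsToList(dfX[-1])

# The first two rows have no empty cells, so apply() returns a matrix here,
# while the helper still returns a list.
simplified <- apply(dfX[1:2, -1], 1, function(x) x[x != ""])
notSimplified <- rowsToList(dfX[1:2, -1])
```

This matters if the result is later passed to lapply, since iterating over a matrix visits individual cells, not rows.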
1
votes

A base R option is by

by(dfX, dfX$C0, function(x) unlist(x[x != ''][-1]))
#dfX$C0: 1
#[1] "Apple"  "Banana" "Pear"
#------------------------------------------------------------
#dfX$C0: 2
#[1] "Apple"  "Orange" "Melon"
#------------------------------------------------------------
#dfX$C0: 3
#[1] "Pear"  "Lemon"

by returns a "dressed" list; ignoring the attributes, this is the same as your expected myList.
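
If a plain list is needed, the extra attributes can be removed. This is a minimal sketch and not part of the original answer; the names res and plain are made up for illustration:

```r
dfX <- data.frame(C0 = c(1, 2, 3),
                  C1 = c("Apple", "Apple", "Pear"),
                  C2 = c("Banana", "Orange", "Lemon"),
                  C3 = c("Pear", "Melon", ""),
                  stringsAsFactors = FALSE)

# Same call as in the answer above; the result has class "by" and carries
# dim, dimnames and call attributes.
res <- by(dfX, dfX$C0, function(x) unlist(x[x != ""][-1]))

# Removing all attributes leaves an ordinary unnamed list.
plain <- res
attributes(plain) <- NULL
str(plain)
```

The plain list can then be passed to lapply like any other list.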