7
votes

I'd like to apply a couple functions to a column but I want to apply some logic as to when I do this, in this case when another column has some NA's. To illustrate I'll add some NA to the iris dataset and turn it into a data.table:

library(data.table)

irisdt <- iris
## Prep some example data
irisdt[irisdt$Sepal.Length < 5,]$Sepal.Length <- NA
irisdt[irisdt$Sepal.Width < 3,]$Sepal.Width <- NA

## Turn this into a data.table
irisdt <- as.data.table(iris)

If I wanted to apply max to multiple columns I'd go like this:

## Apply a function to individual columns
irisdt[, lapply(.SD, max), .SDcols = c("Petal.Length", "Petal.Width")]
#>    Petal.Length Petal.Width
#> 1:          6.9         2.5

In this case however I'd like to take out any row that isn't an NA in Sepal.Length and then return max and min along with the name of the column I subset for NA's. Below is an ugly way of implementing this but hopefully illustrates what I am after:

## Here is what the table would look like
desired_table <- rbind(
  irisdt[!is.na(Sepal.Length), .(max = max(Petal.Length), min = min(Petal.Length), var = "Sepal.Length")],
  irisdt[!is.na(Sepal.Width), .(max = max(Petal.Length), min = min(Petal.Length), var = "Sepal.Width")]
)

desired_table
#>    max min          var
#> 1: 6.9 1.2 Sepal.Length
#> 2: 6.7 1.0  Sepal.Width

Created on 2020-01-14 by the reprex package (v0.3.0)

Any thoughts on how I might accomplish this?

1

1 Answers

7
votes

melt may be better option if we are comparing by multiple columns. Reshape into 'long' format, then use i with the condition !is.na(value), while grouping by 'variable' and get the min and max of the specified variable

library(data.table)
melt(irisdt,  measure = c('Sepal.Length', 'Sepal.Width'))[!is.na(value),
   .(max = max(Petal.Length), min = min(Petal.Length)), .(variable)]

If we are doing this for multiple variables, then use the lapply(.SD, ...