I want to calculate the sd for several columns inside a data frame without leaving my dplyr pipe. In the past, I have done this by defaulting to base r. I haven't been able to find a solution here that works.
It may help to provide some context. This is a process I do to validate survey data. We measure the sd of matrix questions to identify straight-liners. An sd of zero across the columns flags a straight line. In the past, I calculated this in base R as follows:
apply(x, 1, sd)
I know there has to be a way to do this within a dplyr pipe. I've tried several options including pmap and various approaches at mutate_at. Here's my latest attempt:
library(tidyverse)
set.seed(858465)
scale_points <- c(1:5)
q1 <- sample(scale_points, replace = TRUE, size = 100)
q2 <- sample(scale_points, replace = TRUE, size = 100)
q3 <- sample(scale_points, replace = TRUE, size = 100)
digits = 0:9
createRandString<- function() {
v = c(sample(LETTERS, 5, replace = TRUE),
sample(digits, 4, replace = TRUE),
sample(LETTERS, 1, replace = TRUE))
return(paste0(v,collapse = ""))
}
s_data <- tibble::tibble(resp_id = 100)
for(i in c(1:100)) {
s_data[i,1] <- createRandString()
}
s_data <- bind_cols(s_data, q1 = q1, q2 = q2, q3 = q3)
s_data %>% mutate(vars(starts_with("q"), ~sd(.)))
In a perfect world, I would keep the resp_id variable in the output so that I could generate a report using filter to identify the respondent IDs with sd == 0.
Any help is greatly appreciated!