I have a data frame consisting of questionnaire data, with each column representing one item on the questionnaire.
The data looks something like the following:
df <- data.frame(Q1a = c(3, 2, 5, 6, 9), Q1b = c(2, 0, -2, 0, 9), Q2a = c(1, 4, 7, 2, 4),
Q2b = c(0, 0, -1, 0, 0), Q3a = c(5, 7, 2, 0, 9), Q3b = c(-2, -2, 3, 6, 9),
Q4a = c(5, 2, 4, 9, 0), Q4b = c(0, 0, -2, -2, -2))
Each questionnaire item has two versions (a, b). I want to select all the items that possess a "b" suffix. For those columns with a "b" suffix, I want to divide the number of cells with a value of -2 against the total number of non-blank and non-NA in that particular column. I want to repeat the above procedure for all columns. I managed to accomplish it with the following code:
test <- df %>%
select(ends_with("b")) %>%
mutate_all(funs(round(sum(. == -2)/sum(. != "" | . != NA)*100,
digits = 2)))
As there is no "group_by" equivalent that I know of that works on columns instead of rows, the same results are repeated in every row of the above output. I have managed to remove the rows with the repeated information with slice, using the following code:
test <- df %>%
select(ends_with("b")) %>%
mutate_all(funs(round(sum(. == -2)/sum(. != "" | . != NA)*100,
digits = 2))) %>%
slice(1)
With the above output, I want to continue my dplyr pipe by replacing the largest value in that row with the value of 1, and all other values as a percentage of the largest value.
What I have is the following:
The output I want is:
My two questions are:
1). Is there a group_by equivalent that works on columns? If so, I wouldn't have to use slice in such a clumsy manner.
2). Could someone help me complete my dplyr pipe to the output that I want? I am not sure how to proceed from there.
Thanks!


df1, you basically needdf1/apply(df1, 1, max)- Ronak Shah