I'd like to know how to collapse the columns of a data frame so that every row has as many columns as the row with the fewest non-NA columns. Some rows have only 3 filled columns, and the rest are NA. Other rows have every column filled with a number and no NAs. I have provided a sample dataset below, along with what I'd like the ideal outcome to be.

In the ideal data frame below, the middle column always becomes column 2. For instance, a row with 5 columns collapses columns 1 & 2 into 1, and 4 & 5 into 3.

a <- c(1, 1, 1, 1, 1, 1)
b <- c(2, 2, 2, 2, 2, 2)
c <- c(3, 3, 3, 3, 3, 3)
d <- c(NA, 4, NA, 4, 4, NA)
e <- c(NA, 5, NA, 5, 5, NA)
f <- c(NA, NA, NA, 6, 6, NA)
g <- c(NA, NA, NA, 7, 7, NA)

df <- data.frame(a, b, c, d, e, f, g)

Ideal data frame:
ai <- c(1, 3, 1, 1, 5, 1)
bi <- c(2, 3, 2, 2, 4, 2)
ci <- c(3, 9, 3, 3, 18, 3)
di <- c(NA, NA, NA, NA, NA, NA)
ei <- c(NA, NA, NA, NA, NA, NA)
fi <- c(NA, NA, NA, NA, NA, NA)
gi <- c(NA, NA, NA, NA, NA, NA)

dfi <- data.frame(ai, bi, ci, di, ei, fi, gi)

The reason is that I have a dataset on politics and demographics that I need to clean up for analysis. Each row represents poll respondents on a scale from conservative to liberal. Some polls have only 3 liberal/conservative categories, while others have 5 or 7. The only approach that makes sense to me is to collapse every row down to the size of the row with the fewest columns (in my sample data frame, this is 3). The actual dataset has thousands of rows, so I need an automated way to do it.

The data frame is not sorted in any particular order. I've thought about cutting the columns into percentiles, but that would misinterpret a row with 7 columns, since only the middle column (4) should become column 2.

Please let me know if I can make the question any easier to answer, and thank you for the help.

1
Shouldn't ai be c(1, 3, 1, 1, 6, 1)? - zw324
How do you collapse row 5 in your df, and also row 4? In df the two rows are the same, but in dfi they are different. - akrun
Oops, shouldn't ai be c(1, 3, 1, 6, 6, 1), and bi c(2, 3, 2, 4, 4, 2), and ci c(3, 9, 3, 18, 18, 3)? - zw324
@ZiyaoWei op, you're right! My mistake. - tom

1 Answer

This is clunky, but it will probably work:

> apply(df, 1, function(x) {y <- na.omit(x); m <- length(y) %/% 2; c(sum(y[1:m]), y[m + 1], sum(y[(m + 2):length(y)]))})
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    1    6    6    1
[2,]    2    3    2    4    4    2
[3,]    3    9    3   18   18    3
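
Note that apply() returns one column per input row, so the result above is transposed relative to df. As a minimal sketch of the same idea in a more readable form, the one-liner could be wrapped in a small helper and transposed back into a data frame. The helper name collapse_row and the column names low, mid, and high are hypothetical, chosen only for illustration, and the sketch assumes every row has an odd number (at least 3) of non-NA values.

```r
# Rebuild the sample data frame from the question.
df <- data.frame(
  a = rep(1, 6), b = rep(2, 6), c = rep(3, 6),
  d = c(NA, 4, NA, 4, 4, NA), e = c(NA, 5, NA, 5, 5, NA),
  f = c(NA, NA, NA, 6, 6, NA), g = c(NA, NA, NA, 7, 7, NA)
)

# collapse_row() is a hypothetical helper name, not part of the original answer.
# Assumption: each row has an odd number (>= 3) of non-NA values.
collapse_row <- function(x) {
  y <- unname(x[!is.na(x)])        # keep only the filled categories
  n <- length(y)
  stopifnot(n >= 3, n %% 2 == 1)   # a single middle category must exist
  m <- n %/% 2                     # number of categories on each side
  c(low  = sum(y[seq_len(m)]),     # everything left of the middle
    mid  = y[m + 1],               # the middle category, kept as-is
    high = sum(y[(m + 2):n]))      # everything right of the middle
}

# apply() yields a 3 x nrow(df) matrix, so transpose to get one row per poll.
collapsed <- as.data.frame(t(apply(df, 1, collapse_row)))
print(collapsed)
```

The stopifnot() check is a design choice rather than a requirement: it makes the sketch fail loudly on a row with an even number of categories, where "the middle column" is not defined, instead of silently producing a misleading sum.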