0
votes

If I have a dataframe as follows, with a combination of text values and NA cells:

id Col1 Col2 Col3 Col4 Col5 Col...
id1 NA NA sample NA weight etc
id2 NA size NA NA NA etc
id3 volume size sample NA NA etc
id4 NA NA NA qty NA etc
id5 NA NA sample qty weight etc

Is it possible to rename the header with the most frequently used value in that column, as follows?

id volume size sample qty weight
id1 NA NA sample NA weight
id2 NA size NA NA NA
id3 volume size sample NA NA
id4 NA NA NA qty NA
id5 NA NA sample qty weight
2

2 Answers

2
votes

Try the code below

> cbind(df[1], setNames(df[-1], sapply(df[-1], function(x) unique(na.omit(x)))))
   id volume size sample  qty weight
1 id1   <NA> <NA> sample <NA> weight
2 id2   <NA> size   <NA> <NA>   <NA>
3 id3 volume size sample <NA>   <NA>
4 id4   <NA> <NA>   <NA>  qty   <NA>
5 id5   <NA> <NA> sample  qty weight

Data

> dput(df)
structure(list(id = c("id1", "id2", "id3", "id4", "id5"), Col1 = c(NA,
NA, "volume", NA, NA), Col2 = c(NA, "size", "size", NA, NA),
    Col3 = c("sample", NA, "sample", NA, "sample"), Col4 = c(NA,
    NA, NA, "qty", "qty"), Col5 = c("weight", NA, NA, NA, "weight"
    )), class = "data.frame", row.names = c(NA, -5L))
1
votes

You can get the most frequent value in each column using Mode function from here.

Mode <- function(x) {
  ux <- unique(na.omit(x))
  ux[which.max(tabulate(match(x, ux)))]
}

Apply it for each column and change the column name.

names(df)[-1] <- sapply(df[-1], Mode)
df

#   id volume size sample  qty weight
#1 id1   <NA> <NA> sample <NA> weight
#2 id2   <NA> size   <NA> <NA>   <NA>
#3 id3 volume size sample <NA>   <NA>
#4 id4   <NA> <NA>   <NA>  qty   <NA>
#5 id5   <NA> <NA> sample  qty weight