1
votes

Using the iris data as an example, there are three species of iris: setosa, versicolor and virginica. I want to normalize their measurements, such as Sepal.Length, separately within each species. I know a simple but tedious way to do it. Is there a simpler way to achieve this? My process:

    data(iris)
    library(dplyr)
    # Center each column by its mean and scale by its range
    normalize <- function(x){
        return((x - mean(x))/(max(x) - min(x)))
    }
    # Normalize the four measurement columns one species at a time
    data1 <- sapply(filter(iris, Species == 'setosa')[1:4], normalize)
    data2 <- sapply(filter(iris, Species == 'versicolor')[1:4], normalize)
    data3 <- sapply(filter(iris, Species == 'virginica')[1:4], normalize)
    # Rebuild the species labels (50 rows per species) and recombine
    Species <- rep(c('setosa','versicolor','virginica'), each = 50)
    thedata <- rbind(data1, data2, data3)
    theirisdata <- data.frame(thedata, Species)

The final data frame "theirisdata" has the same structure as iris, but Sepal.Length, Sepal.Width, Petal.Length and Petal.Width are normalized within each species group. I need a more concise way to handle this kind of problem. For example, the rows of a data frame might fall into 10 or more groups, and a function has to be applied to each column within each group.


1 Answer

1
votes

You can use group_by in dplyr to apply a function to each group separately, and then modify several columns in place with mutate_each:

    data(iris)
    library(dplyr)
    normalize <- function(x){
        return((x - mean(x))/(max(x) - min(x)))
    }

    my_data <- iris %>% group_by(Species) %>%
        mutate_each(funs(normalize))
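
Newer dplyr releases deprecate mutate_each and funs. Below is a minimal sketch of the same grouped transformation, assuming dplyr 1.0 or later, where across is available. The trailing ungroup is an addition of mine, not part of the code above; it only drops the grouping so that later operations are not applied per species by accident.

```r
library(dplyr)

# Center by the mean and scale by the range, as in the question.
normalize <- function(x) {
    (x - mean(x)) / (max(x) - min(x))
}

# across() applies normalize to every numeric column within each group;
# the grouping column Species is left untouched.
my_data <- iris %>%
    group_by(Species) %>%
    mutate(across(where(is.numeric), normalize)) %>%
    ungroup()
```

The same pattern should carry over to a data frame with 10 or more groups, since group_by handles however many levels the grouping column has.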

Check that it returns the same values as your original result:

    all(my_data == theirisdata)
    [1] TRUE
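
Exact comparison with == can be fragile for floating-point results, so a tolerant check with all.equal may be safer. The sketch below is self-contained and makes a few assumptions: it uses dplyr 1.0 or later for the grouped version, and it builds a base R reference with ave (the names grouped and reference are only illustrative) rather than reusing theirisdata.

```r
library(dplyr)

normalize <- function(x) (x - mean(x)) / (max(x) - min(x))

# Grouped version: normalize every numeric column within each species.
grouped <- iris %>%
    group_by(Species) %>%
    mutate(across(where(is.numeric), normalize)) %>%
    ungroup()

# Base R reference: ave() applies normalize within each Species level
# and returns the values in the original row order.
reference <- iris
reference[1:4] <- lapply(iris[1:4], function(col) {
    ave(col, iris$Species, FUN = normalize)
})

# Compare as plain data frames so the tibble class does not affect the
# comparison; all.equal() tolerates tiny floating-point differences.
same <- isTRUE(all.equal(as.data.frame(grouped), reference))
print(same)
```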