0
votes

I am trying to apply a custom function to each row of a data frame, using the values in that row for a calculation. I have made a trivial example below because my actual problem is very hard to turn into a reproducible example. In this example I want to add the two numeric columns (age and income) together to create a new column holding their sum. Below is an example I found online that gets close to what I want:

celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45))
f=function(x,output){
  name=x[1]
  income=x[3]
  cat(name,income,"\n")
}
apply(celebrities,1,f)

But when I adapt it to apply a mathematical function, it doesn't work:

f2=function(x,output){
  age=x[2]
  income=x[3]
  sum(age,income)
}
apply(celebrities,1,f2)

In essence, I need apply to take a dataset, go through every row using the values in that row as inputs to the function, and add a new column to the dataset holding the results. Please let me know how I can clarify this question if needed. I have looked at the questions below, but they don't seem to work for me.

Apply a function to every row of a matrix or a data frame

How to assign new values from lapply to new column in dataframes in list

Call apply-like function on each row of dataframe with multiple arguments from each row

3
When you use apply on a data.frame, it is converted to a matrix for processing. If any of the columns (of the processed frame) are character, then all columns are converted to character, defeating any math operations. Though I tend to discourage apply with frames, if you must use it then make sure that you pass only the relevant portion of the frame, something like apply(celebrities[c("age","income")], 1, sum). - r2evans
You could try using something from library(plyr) such as adply or aaply (depending on what you want the output format to be like) which don't coerce all columns to character - Sarah
I believe dplyr now has a rowwise function that can help you do what you're looking for. E.g., library(dplyr) ; celebrities %>% rowwise %>% mutate(new_var = f(var1, var2)) - Jake Fisher
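
The coercion r2evans describes can be checked directly. The sketch below is only an illustration on a shortened copy of the question's data; the variable names full_matrix, numeric_matrix and row_totals are made up for this example.

```r
# Shortened copy of the question's data (first three rows only)
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany"),
  age    = c(28, 23, 49),
  income = c(25.2, 10.5, 11)
)

# apply() converts the frame with as.matrix() first;
# one character column turns the whole matrix into character
full_matrix <- as.matrix(celebrities)
typeof(full_matrix)      # "character"

# Keeping only the numeric columns gives a numeric matrix
numeric_matrix <- as.matrix(celebrities[c("age", "income")])
typeof(numeric_matrix)   # "double"

# So the row-wise sum works on the numeric subset
row_totals <- apply(celebrities[c("age", "income")], 1, sum)
```

This is why sum(age, income) fails inside apply on the full frame: by the time the function sees a row, age and income are already strings.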

3 Answers

2
votes

For the particular task requested, this would be enough:

celebrities$newcol <- with(celebrities, age + income)

The + function is inherently vectorized. Using apply with sum is inefficient. Using apply could have been greatly simplified by omitting the first column because that would avoid the coercion to a character matrix caused by the first column.

# MARGIN = 1 applies the function to each row; assumes only name, age and income columns exist
celebrities$newcol <- apply(celebrities[-1], 1, function(x) sum(x))

That way you would avoid coercing the vectors to "character" and then needing to coerce back the formerly-numeric columns to numeric. Using sum inside apply does get around the fact that sum is not vectorized, but it's an example of inefficient R coding.

You get automatic vectorization if the "inner" algorithm can be constructed completely from vectorized functions: the Math and Ops groups being the usual components. See ?Ops. Otherwise, you may need to use mapply or Vectorize.
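
As a rough sketch of the mapply and Vectorize route mentioned above: the function adjusted_income below is hypothetical, invented only to stand in for a calculation that handles a single row's values at a time (its if() expects one TRUE/FALSE, so it cannot take whole columns directly).

```r
# Shortened copy of the question's data (first three rows only)
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany"),
  age    = c(28, 23, 49),
  income = c(25.2, 10.5, 11)
)

# Hypothetical scalar-only function: if() handles one value at a time
adjusted_income <- function(age, income) {
  if (age < 30) income * 1.1 else income
}

# mapply() walks both columns in parallel, one (age, income) pair per call
celebrities$adjusted <- mapply(adjusted_income,
                               celebrities$age, celebrities$income)

# Vectorize() wraps the same function so it accepts whole columns
v_adjusted_income <- Vectorize(adjusted_income)
celebrities$adjusted2 <- v_adjusted_income(celebrities$age, celebrities$income)
```

Both calls pass the columns by position and never build a matrix, so the name column is not involved and nothing is coerced to character.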

1
votes

Taking hints from @r2evans and @user2738526, I have modified your function to convert the values to numeric explicitly. The code snippet below works for your case:

f2=function(x,output){
  age=as.numeric(x[2])
  income=as.numeric(x[3])
  sum(age,income)
}
apply(celebrities,1,f2)

[1] 53.2 33.5 60.0 50.9 82.0 34.5 74.0
1
votes

Give this a try:

library(dplyr)
celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45)) 

celebrities %>% 
  rowwise %>% 
  mutate(age_plus_income = sum(age, income))

(Obviously, for summing two columns, you'd be better off using mutate(celebrities, age_plus_income = age + income), but I assume your real example uses a more complicated function.)
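
For a function that really does need one row at a time, the same rowwise pattern might look like the sketch below. The function income_band and its thresholds are hypothetical, made up only to illustrate a calculation that is not vectorized; it assumes dplyr is installed.

```r
library(dplyr)

# Shortened copy of the question's data (first three rows only)
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany"),
  age    = c(28, 23, 49),
  income = c(25.2, 10.5, 11)
)

# Hypothetical scalar-only function: if() and && work on single values
income_band <- function(age, income) {
  if (age < 30 && income > 20) "young high earner" else "other"
}

result <- celebrities %>%
  rowwise() %>%                              # evaluate mutate() once per row
  mutate(band = income_band(age, income)) %>%
  ungroup()                                  # drop the row-wise grouping afterwards
```

The ungroup() at the end is optional, but it stops later verbs from silently continuing to operate row by row.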