1
votes

I have a data frame that looks like this:

   GID7173723 GID4878677 GID88208 GID346403 GID268825 GID7399578
1           A          A        A         A         G          A
2           T          T        T         T         C          T
3           G          G        G         G         G          G
4           A          A        A         A         A          A
5           G          G        G         G         G          G
6           G          G        G         G         G          G
7           A          A        A         A         A          A
8           G          G        G         G         G          G
9           A          A        A         A         A          A
10          A          A        A         A         A          A

However, when I use apply to compute, for each row, the number of 'A's divided by the number of columns in the data frame, I get the total count of 'A's instead of one value per row.

Here is the function I wrote:

myfun <- function(x){
 out <-  sum(x=='A')/ncol(x)
 return(out)
}
apply(df,MARGIN = 1,FUN=myfun)

I cannot figure out why the apply function gives me the total sum of A and not by row.

2 Answers

1
votes

We can use rowSums:

rowSums(df1=="A")/ncol(df1)

Or use rowMeans:

rowMeans(df1 == "A")
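
As a rough check that the two forms agree, here is a minimal sketch on a made-up three-row data frame. The name df1 follows the answer, but the data below is a hypothetical stand-in with only three of the question's columns, not the asker's full data.

```r
# Hypothetical stand-in for the question's data (three rows, three columns).
df1 <- data.frame(
  GID7173723 = c("A", "T", "G"),
  GID268825  = c("G", "C", "G"),
  GID7399578 = c("A", "T", "G"),
  stringsAsFactors = FALSE
)

# Comparing a data frame to a string gives a logical matrix of the same shape.
is_a <- df1 == "A"

# Count of TRUE per row divided by the number of columns ...
prop_sums <- rowSums(is_a) / ncol(df1)

# ... is the same as the mean of the logical values per row.
prop_means <- rowMeans(is_a)

print(prop_sums)
print(prop_means)
```

Both calls work on the whole matrix at once, so they avoid the per-row function call that apply makes.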

With apply, each row is passed to the function as a plain vector, so ncol(x) returns NULL rather than the number of columns. We need length(x) instead:

myfun <- function(x){
  sum(x == 'A')/length(x)
  # or
  # mean(x == "A")
}
apply(df1, MARGIN = 1, FUN = myfun)
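
To see why ncol fails inside apply, it may help to look at a single row in isolation. This is only a sketch: row_vec is a hypothetical vector standing in for what apply hands to FUN for one row.

```r
# Hypothetical single row, as apply() would pass it to FUN: a plain vector.
row_vec <- c("A", "A", "G")

# A vector has no dim attribute, so ncol() returns NULL.
n_cols <- ncol(row_vec)

# Dividing by NULL yields a zero-length numeric, not a proportion.
bad <- sum(row_vec == "A") / ncol(row_vec)

# length() gives the number of elements in the row, which is what we want.
good <- sum(row_vec == "A") / length(row_vec)

print(n_cols)  # NULL
print(bad)     # numeric(0)
print(good)    # 0.6666667
```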
0
votes

Solution with apply():

apply(df, 1, FUN = function(rowVec) table(rowVec)['A'])

table() counts each base in the row, and ['A'] selects the count for 'A'. Divide the result by ncol(df) to get the proportion the question asks for.
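
One thing to watch with the table approach: a row that contains no 'A' has no 'A' entry in its table, so the lookup gives NA rather than 0. The sketch below shows this on a made-up two-row data frame and one possible workaround. The fixed level set in bases is an assumption about the data (only the four bases occur), not something stated in the question.

```r
# Hypothetical two-row example: the second row contains no "A".
df <- data.frame(
  s1 = c("A", "G"),
  s2 = c("A", "G"),
  s3 = c("G", "G"),
  stringsAsFactors = FALSE
)

# Plain table(): the second row has no "A" entry, so the lookup returns NA.
counts <- apply(df, 1, function(rowVec) table(rowVec)["A"])

# Assumed set of possible values; fixing the factor levels makes table()
# report a zero count for bases that are absent from a row.
bases <- c("A", "C", "G", "T")
counts_fixed <- apply(df, 1, function(rowVec) {
  table(factor(rowVec, levels = bases))["A"]
})

# Convert counts to per-row proportions.
props <- counts_fixed / ncol(df)

print(counts)
print(props)
```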