
The 18 cell lines are divided into two groups, Triple and Pos. The genes are listed as columns and the cell lines as rows. I have already generated a data frame containing Wilcoxon test p-values, median differences and fold changes between Triple and Pos. I need a column that tells me, for each gene, the number of "Triple" cell lines in which that gene is >0. Here is a representative sample of the data. How can I do this in R?

    Subtype A1BG    A1CF    A2LD1   A2M A2ML1   A3GALT2 A4GALT  A4GNT
MCF7    Pos 0   0   0   22.8    0   0   0   0
MDA_231 Triple  0   0   0   0   0   0   0   0
SKBR3   Pos 0   0   0   1.69    1.69    0   0   0
HCC1954 Pos 0   0   0   0   0   0   0   0
HCC1143 Triple  0   0   0   1.45    0   0   0   0
BT474   Pos 0   0   0   1.9 0   0   0   0
HCC1500 Pos 0   0   0   0   0   0   0   0
T47D    Pos 0   0   0   1.32    0   0   0   0
ZR75-1  Pos 0   0   0   0   0   0   0   0
HCC1937 Triple  0   0   0.79    33.76   0   0   0   0
HCC1599 Triple  0   0   0   0   0   0   0   0
HCC202  Pos 0   0   0.9 5.43    0   0   0   0
HCC1806 Triple  0   0   0   0   0   0   0   0
MDA-468 Triple  0   0   1.02    3.41    0   0   0   0
HCC2218 Pos 0   0   2.08    1.39    0   0   0   0
HCC70   Triple  0   0   0   3.67    29.76   0   0   0
HCC1187 Triple  0.7 0   1.75    4.21    0   0   0   0
Hs578T  Triple  0   0   0.84    1.26    0   0   0   0
BT549   Triple  0   0   0.64    0.64    0   0   0   0

1 Answers


The formatting in your original post is a bit odd, but I think something like:

df$gt0 <- apply(df[-1]>0, 1, sum)

will compare every entry, other than those in the first column, to zero. It will then add up the number of times that is true in each row and append the result as column gt0. It will calculate this for all rows regardless of subtype: if you only want to do the calculation for subtype "Triple", then running df <- subset(df, Subtype=="Triple") first will reduce the dataset to the relevant rows.
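As a rough sketch of that row-wise count, here is a toy subset of the data from the question (three genes, four cell lines). It assumes the cell line names are stored as row names, so that Subtype is the first column; `rowSums()` is used as an equivalent of `apply(..., 1, sum)`, and the name `triple` is only illustrative.

```r
# Toy subset of the question's data; cell line names are assumed to be row names
df <- data.frame(
  Subtype = c("Pos", "Triple", "Triple", "Triple"),
  A1BG    = c(0, 0, 0, 0.7),
  A2LD1   = c(0, 0, 0.79, 1.75),
  A2M     = c(22.8, 0, 33.76, 4.21),
  row.names = c("MCF7", "MDA_231", "HCC1937", "HCC1187")
)

# Keep only the Triple cell lines before counting
triple <- subset(df, Subtype == "Triple")

# For each cell line (row), count the genes with expression > 0
triple$gt0 <- rowSums(triple[-1] > 0)
triple
```

Note that this gives one count per cell line, not one count per gene.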

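Though when you say "particular gene", it makes me wonder if you need a column-wise (per-gene) summary instead. Run this before adding gt0, otherwise that column is counted as if it were a gene:

apply(df[df$Subtype=="Triple",-1]>0, 2, sum)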
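A minimal sketch of the per-gene count, which seems to be what the question asks for, again on a toy subset of the posted data. The names `is_triple`, `genes`, `n_triple_gt0` and `counts` are made up for illustration, and `colSums()` stands in for `apply(..., 2, sum)`.

```r
# Toy subset of the question's data; cell line names are assumed to be row names
df <- data.frame(
  Subtype = c("Pos", "Triple", "Triple", "Triple"),
  A1BG    = c(0, 0, 0, 0.7),
  A2LD1   = c(0, 0, 0.79, 1.75),
  A2M     = c(22.8, 0, 33.76, 4.21),
  row.names = c("MCF7", "MDA_231", "HCC1937", "HCC1187")
)

# Logical index of the Triple cell lines (note the capital S in Subtype)
is_triple <- df$Subtype == "Triple"

# Every column except Subtype is treated as a gene
genes <- setdiff(names(df), "Subtype")

# For each gene (column), count the Triple cell lines with expression > 0
n_triple_gt0 <- colSums(df[is_triple, genes] > 0)

# One row per gene, convenient for joining to a per-gene results table
counts <- data.frame(gene = names(n_triple_gt0),
                     n_triple_gt0 = unname(n_triple_gt0))
counts
```

If the existing data frame of p-values and fold changes has one row per gene, the `counts` table could then be joined to it by gene name, for example with `merge()`, assuming that data frame has a column holding the gene names.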