18
votes

Hi, I am using a matrix of gene expression fragment counts to find differentially expressed genes. I would like to know how to remove the rows in which every value is 0. That would make the data set more compact and give fewer spurious results in the downstream analysis I run on this matrix.

Input

gene         ZPT.1  ZPT.0  ZPT.2  ZPT.3  PDGT.1  PDGT.0
XLOC_000001   3516    626   1277    770    4309    9030
XLOC_000002    342     82    185     72     835    1095
XLOC_000003   2000    361    867    438     454     687
XLOC_000004    143     30     67     37      90     236
XLOC_000005      0      0      0      0       0       0
XLOC_000006      0      0      0      0       0       0
XLOC_000007      0      0      0      0       1       3
XLOC_000008      0      0      0      0       0       0
XLOC_000009      0      0      0      0       0       0
XLOC_000010      7      1      5      3       0       1
XLOC_000011     63     10     19     15      92     228

Desired output

gene         ZPT.1  ZPT.0  ZPT.2  ZPT.3  PDGT.1  PDGT.0
XLOC_000001   3516    626   1277    770    4309    9030
XLOC_000002    342     82    185     72     835    1095
XLOC_000003   2000    361    867    438     454     687
XLOC_000004    143     30     67     37      90     236
XLOC_000007      0      0      0      0       1       3
XLOC_000010      7      1      5      3       0       1
XLOC_000011     63     10     19     15      92     228

For now I only want to remove the rows where all the frag count columns are 0. If a row has some zero values and some non-zero values, I would like to keep that row intact, as you can see in my example above.

Please let me know how to do this.

2
df[rowSums(df[, -1])>0, ] - Arun
@Arun a minor nit: the OP didn't specify whether he's got an array of integers or floats, so to be careful, you might want to check that rowSums is greater than 1e-10 or something. - Carl Witthoft
@CarlWitthoft, I guess the bioinformatician reflex kicked in. These are read counts from gene expression data. They are discrete counts and therefore are likely to be integers (>= 0). - Arun

2 Answers

22
votes
df[apply(df[,-1], 1, function(x) !all(x==0)),]
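
For anyone who wants to try this on the question's data, here is a minimal sketch. It assumes the table has been read into a data frame called df with the gene IDs in the first column; the names keep_apply, keep_sums and filtered are only illustrative. It also computes the rowSums() mask from the comments, which should select the same rows as long as the counts are non-negative.

```r
# Rebuild the example table from the question (values copied from "Input")
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
gene ZPT.1 ZPT.0 ZPT.2 ZPT.3 PDGT.1 PDGT.0
XLOC_000001 3516 626 1277 770 4309 9030
XLOC_000002 342 82 185 72 835 1095
XLOC_000003 2000 361 867 438 454 687
XLOC_000004 143 30 67 37 90 236
XLOC_000005 0 0 0 0 0 0
XLOC_000006 0 0 0 0 0 0
XLOC_000007 0 0 0 0 1 3
XLOC_000008 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0
XLOC_000010 7 1 5 3 0 1
XLOC_000011 63 10 19 15 92 228
")

# This answer's test: TRUE for a row unless every count column is 0
keep_apply <- apply(df[, -1], 1, function(x) !all(x == 0))

# Vectorised alternative from the comments; valid only for non-negative counts
keep_sums <- rowSums(df[, -1]) > 0

# Keep the rows flagged TRUE; the gene column comes along untouched
filtered <- df[keep_apply, ]
print(filtered)
```

The df[, -1] part drops the gene column before testing, so only the numeric columns are compared against 0. On large matrices the rowSums() form is usually faster because it avoids calling an R function once per row.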
1
votes

A lot of options to do this within the tidyverse have been posted here: How to remove rows where all columns are zero using dplyr pipe

My preferred option is using rowwise():

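library(tidyverse)

# col1, col2, col3 are placeholders for your count columns (e.g. ZPT.1, ZPT.0, ...)
df <- df %>%
    rowwise() %>%                              # evaluate the filter one row at a time
    filter(sum(c(col1, col2, col3)) != 0) %>%  # drop rows whose counts sum to 0
    ungroup()                                  # return an ordinary (non-rowwise) tibble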
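
If you would rather not list every count column by name, a variant along these lines may be more convenient. This is a sketch that assumes dplyr 1.0.4 or later (for if_any()) and that gene is the only non-count column; the small data frame is a made-up subset of the question's table.

```r
library(dplyr)

# Small stand-in for the question's table (a subset of its rows and columns)
df <- data.frame(
  gene   = c("XLOC_000004", "XLOC_000005", "XLOC_000007"),
  ZPT.1  = c(143, 0, 0),
  PDGT.0 = c(236, 0, 3),
  stringsAsFactors = FALSE
)

# Keep a row if at least one column other than gene is non-zero
filtered <- df %>%
  filter(if_any(-gene, ~ .x != 0))

print(filtered)
```

Because this tests each value against 0 rather than summing, it would also behave sensibly if the columns held values that could cancel out, which is not a concern for read counts.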