2
votes

ORIGINAL QUESTION EDITED

Dear R Users,

I have a question about removing rows from a matrix. All matrix entries are either a 0 or a 1. The rows are sorted according to the row sum.

Here is an example matrix

e1  <- c(0,0,0,1,0,0,0)
e2  <- c(1,0,0,0,0,0,0)
e3  <- c(0,1,0,0,0,0,0)
e4  <- c(0,0,1,0,1,0,0)
e5  <- c(1,1,0,0,0,0,0)
e6  <- c(1,0,0,0,1,0,0)
e7 <- c(0,0,0,0,0,1,1)
e8 <- c(0,0,0,0,1,0,1)
e9  <- c(0,0,1,0,1,1,0)
e10  <- c(0,0,1,0,1,0,1)
e11 <- c(0,0,0,0,1,1,1)
e12  <- c(1,1,0,1,1,0,0)
e13 <- c(0,0,1,1,0,1,1)


(E <- rbind(e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13))

Which prints

> (E <- rbind(e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
e1     0    0    0    1    0    0    0
e2     1    0    0    0    0    0    0
e3     0    1    0    0    0    0    0
e4     0    0    1    0    1    0    0
e5     1    1    0    0    0    0    0
e6     1    0    0    0    1    0    0
e7     0    0    0    0    0    1    1
e8     0    0    0    0    1    0    1
e9     0    0    1    0    1    1    0
e10    0    0    1    0    1    0    1
e11    0    0    0    0    1    1    1
e12    1    1    0    1    1    0    0
e13    0    0    1    1    0    1    1

I want to remove rows in the following fashion. If a row has a single 1 then all following rows below that with a 1 in that column position should be removed. So we observe rows e1 e2 and e3 can successively remove rows e5, e6, e12 and e13. Leaving us with rows e1, e2, e3, e4, e7, e8, e9, e10 and e11.

for (v in 2:dim(E)[1])
{
print(v)
print(E[v, 4])
if (E[v, 4] == 1) E <- E[-v,]   
}

Removing rows inside a for-loop is giving me an error. So I thought I ll first find rows (if any) have a rowsum 1 and identified them. Then I try to remove the following rows with a 1 in that position using a for-loop. Once again an error.

UnitRowsum <- E[which(rowSums(E) == 1),]
UnitRowsum

for (v in 1:dim(UnitRowsum)[1])
{
print(which(UnitRowsum[v, ] == 1))
}

Furthermore I want to continue row removals based on rows with sum greater than one and removes all following rows that have a 1 in all those positions and so on. To example what I mean first I'll have a reduced matrix

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
e1     0    0    0    1    0    0    0
e2     1    0    0    0    0    0    0
e3     0    1    0    0    0    0    0
e4     0    0    1    0    1    0    0
e7     0    0    0    0    0    1    1
e8     0    0    0    0    1    0    1
e9     0    0    1    0    1    1    0
e10    0    0    1    0    1    0    1
e11    0    0    0    0    1    1    1

First we start with rows that have rowsum 1 and we go delete all following rows with a 1 in that position. Once this is done we see if there are any rows with rowsum 2 left. Is yes, we go down searching for all rows that have 1s in both those positions and delete all such rows. After this we go to rows with rowsum 3 (if any) and go on like like until we arrive at a matrix where no row is dominant than other.

Row e4 dominates rows e9 and e10 and therefore have to be removed. Row e8 dominates row e11 and therefore have to be removed as well. This continues till no more rows can be removed. Eventually I want to get the matrix

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
e1     0    0    0    1    0    0    0
e2     1    0    0    0    0    0    0
e3     0    1    0    0    0    0    0
e4     0    0    1    0    1    0    0
e7     0    0    0    0    0    1    1
e8     0    0    0    0    1    0    1

Could you please help me with this?

Sincerely, Ash

2

2 Answers

0
votes

Here is one solution I came up with that can work:

E[!row.names(E) %in% unique(unlist(apply(E, 2, function(x) names(x[x == 1][2:length(x[x == 1])])))), ]

You can test out each of these portions to see what is going on.

This gives the list of rows where 1's occur in each of the columns of the matrix:

apply(E, 2, function(x) names(x[x == 1]))

This gives the list of rows where 1's occur AFTER the first occurrence of the columns of the matrix:

apply(E, 2, function(x) names(x[x == 1][2:length(x[x == 1])]))

This gives the unique row numbers eligible for removal:

unique(unlist(apply(E, 2, function(x) names(x[x == 1][2:length(x[x == 1])]))))

Then the last step is to use the filtering / subsetting to remove these rows.

You can perhaps simplify the code by writing a small custom function to feed to apply and optimize a bit by not invoking E[x == 1] twice. But...hope this helps.

0
votes
#This seems to do the trick (Thanks Cody)

test = E
for (i in (nrow(E)-1):1)
 {
  if (sum(E[i,]) == 0)
  {
    test = test[-c(i),]
    next
   }
 for (j in (nrow(test): (i)))
 {
   if ((j>nrow(test)) | (j==i))
    {break}
   if (sum(xor(E[i,],test[j,])*1) == (sum(test[j,]) - sum(E[i,])))
   {
      print(c(i,j))
      test = test[-c(j),]
    }
  }
 }

 E = test

 E