
I have the following matrix "m" (nrow=2504, ncol=2) with two columns; one called ind (from index) and the other called headerline (IDs of samples):

> head(m)
     ind headerline
[1,] "1" "HG00096" 
[2,] "2" "HG00097" 
[3,] "3" "HG00099" 
[4,] "4" "HG00100" 
[5,] "5" "HG00101" 
[6,] "6" "HG00102" ...

And the following index vector called "index" (nr=385, nc=1):

> head(index)
1  1
2  4
3  9
4 12
5 13
6 16 ...

I want to subset the samples in the row positions marked by index (I want a new matrix with sample in row 1, sample in row 4, sample in row 9 and so forth). I came up with the following code:

 for i in index { dudosos<-subset(headerline,ind==i, select=c(headerline)) }

but it yields the following error:

Error: unexpected symbol in "for i"

I don't know what that error is telling me, it's too vague. Help? Thanks!

Desired output:

> head(m)               #or other name
         ind headerline
         "1" "HG00096"   
         "4" "HG00100" 
         "9" ...

2 Answers


You can do this all in base:

m <- matrix(c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
              "HG00096", "HG00097", "HG00098", "HG00099", "HG00100", "HG00101","HG00102", "HG00103", "HG00103"), ncol=2)
index <- c("1", "4", "9") 

m[m[, 1] %in% index, ]
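
If index really holds row positions (as the question describes) rather than values to match against the ind column, plain positional indexing may be all that is needed. The following is a minimal sketch on toy data, assuming index is a one-column data frame whose column I have arbitrarily called V1; the sample IDs are placeholders, not the real 2504 rows:

```r
# Toy matrix mirroring the question's layout (placeholder IDs).
m <- cbind(ind = as.character(1:9),
           headerline = c("HG00096", "HG00097", "HG00099", "HG00100", "HG00101",
                          "HG00102", "HG00103", "HG00104", "HG00105"))

# Assumption: index is a one-column data frame; V1 is a made-up column name.
index <- data.frame(V1 = c(1, 4, 9))

# Pull the positions out of the data frame as a plain integer vector.
pos <- as.integer(index[[1]])

# Select rows by position; drop = FALSE keeps the result a matrix
# even when only a single row is selected.
dudosos <- m[pos, , drop = FALSE]
```

This relies on the ind column being identical to the row number. If that does not hold for your data, the %in% match above is the safer choice.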

This or @Gin_Salmon's answer are the best way to achieve your goals...

This is an explanation of why your code was not working:

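There are a few problems with your code:
1. The loop header needs parentheses: for (i in index) { ... }. Without them the parser stops at "for i", which is what the "unexpected symbol" error is reporting.
2. subset() must be given the data as its first argument, and the condition ind == i can only refer to a column by name when that data is a data frame, so the call should read: subset(as.data.frame(m), ind == i, select = headerline)
3. Your loop overwrites dudosos at each iteration, so only the last match would survive. Append each result instead:
dudosos <- rbind(dudosos, subset(as.data.frame(m), ind == i, select = headerline))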

m <- matrix(c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
              "HG00096", "HG00097", "HG00098", "HG00099", "HG00100", "HG00101","HG00102", "HG00103", "HG00103"), ncol=2)
index <- data.frame(V1= c("1", "4", "9"))
colnames(m) <- c("ind","headerline")
dudosos <- data.frame()
for (i in index$V1) { 
    # append the matching row(s) instead of overwriting dudosos
    dudosos <- rbind(dudosos, subset(x = as.data.frame(m), 
                         subset = ind == i, select = headerline)) 
}

Again, the other solutions are much better, but sometimes it helps to see why the code you originally wrote was not working.
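
If you do want to keep an explicit per-index step, one common alternative to growing a data frame with rbind() inside a loop is to collect the pieces in a list and bind them once at the end. This is only a sketch on toy data (placeholder IDs; the name pieces is mine, not from the question):

```r
# Toy matrix mirroring the question's layout (placeholder IDs).
m <- cbind(ind = as.character(1:9),
           headerline = c("HG00096", "HG00097", "HG00099", "HG00100", "HG00101",
                          "HG00102", "HG00103", "HG00104", "HG00105"))
index <- c("1", "4", "9")

# Convert once, outside the loop, keeping the columns as character.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# One small data frame per index value...
pieces <- lapply(index, function(i) df[df$ind == i, , drop = FALSE])

# ...bound together in a single call.
dudosos <- do.call(rbind, pieces)
```

The result keeps both columns; add a column selection if you only need headerline.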


Without an example of what you'd like returned, I'm guessing at what you're after. I'd suggest looking at the %in% operator, which does the job without a for loop.

Using your example data:


library(data.table)

m <- data.table(id = c("1", "2", "3", "4", "5", "6", "7", "8", "9"),
                headerline = c("HG00096", "HG00097", "HG00099", "HG00100", "HG00101",
                               "HG00102", "HG00103", "HG00104", "HG00105"))

index <- c("1", "4", "9")

output <- m[id %in% index,]

Where output looks as follows:

> output
   id headerline
1:  1    HG00096
2:  4    HG00100
3:  9    HG00105

So we've returned a new data table, output, containing the rows of m whose id value appears in the index vector.

Is this what you were after?