0
votes

I have a matrix "m" (2504 rows, 2 columns): one column is called ind (the row index) and the other is called headerline (the sample IDs):

> head(m)
     ind headerline
[1,] "1" "HG00096" 
[2,] "2" "HG00097" 
[3,] "3" "HG00099" 
[4,] "4" "HG00100" 
[5,] "5" "HG00101" 
[6,] "6" "HG00102" ...

And the following index, called "index" (385 rows, 1 column):

> head(index)
  V1
1  1
2  4
3  9
4 12
5 13
6 16 ...

I want to subset the samples in the row positions marked by index (I want a new matrix with sample in row 1, sample in row 4, sample in row 9 and so forth). I came up with the following code:

 for i in index { dudosos<-subset(headerline,ind==i, select=c(headerline)) }

but it yields the following error:

Error: unexpected symbol in "for i"

I don't know what that error is telling me, it's too vague. Help? Thanks!

Desired output:

> head(m)               #or other name
         ind headerline
         "1" "HG00096"   
         "4" "HG00100" 
         "9" ...

2 Answers

1
votes

You can do this all in base:

m <- matrix(c("1", "2", "3", "4", "5", "6", "7", "8", "9",
              "HG00096", "HG00097", "HG00099", "HG00100", "HG00101", "HG00102", "HG00103", "HG00104", "HG00105"), ncol = 2)
index <- c("1", "4", "9") 

m[m[, 1] %in% index, ]

Either this or @Gin_Salmon's answer is the best way to achieve your goal.
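
Since ind in the question is just the row number, the rows can probably also be picked by position directly, with no matching at all. A minimal sketch, assuming index is a one-column data frame with a numeric column V1 as printed in the question (the nine rows and three index values here are toy data, not the real 2504-row matrix):

```r
# Toy stand-in for the question's matrix (nine rows only)
m <- matrix(c(as.character(1:9),
              "HG00096", "HG00097", "HG00099", "HG00100", "HG00101",
              "HG00102", "HG00103", "HG00104", "HG00105"),
            ncol = 2, dimnames = list(NULL, c("ind", "headerline")))
index <- data.frame(V1 = c(1, 4, 9))

# Use the numeric column as row positions; drop = FALSE keeps the result
# a matrix even if only one row is selected
picked <- m[index$V1, , drop = FALSE]
picked
```

This only works while ind really equals the row position. If rows are ever reordered or removed, matching on the ind column with %in% is the safer choice.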

This is an explanation of why your code was not working:

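There are a few problems with your code:
1. The for loop iteration needs to be in parentheses: for (i in index$V1) { ... }. Without them R cannot parse the statement, which is what "unexpected symbol" is reporting. Also, since index is a data frame, loop over its column V1 rather than over index itself.
2. Your subset command should take the data, not a column, as its first argument: subset(as.data.frame(m), ind == i, select = headerline)
3. Your loop overwrites dudosos at each iteration, so only the last match would survive. Append to it instead, e.g. dudosos <- rbind(dudosos, ...), as in the full example below.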

m <- matrix(c("1", "2", "3", "4", "5", "6", "7", "8", "9",
              "HG00096", "HG00097", "HG00099", "HG00100", "HG00101", "HG00102", "HG00103", "HG00104", "HG00105"), ncol = 2)
colnames(m) <- c("ind", "headerline")
index <- data.frame(V1 = c("1", "4", "9"))
# Start with an empty data frame and append the matching row on each pass
dudosos <- data.frame()
for (i in index$V1) {
    dudosos <- rbind(dudosos, subset(x = as.data.frame(m),
                                     subset = ind == i, select = headerline))
}

Again, the other solutions are much better, but sometimes it also helps to see why the code you originally wrote was not working.
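
If a loop-like structure is still wanted, one common variation is to collect the pieces in a list and bind them once at the end, rather than growing dudosos inside the loop. This is only a sketch on the same toy data; pieces and df are names made up for the illustration:

```r
m <- matrix(c(as.character(1:9),
              "HG00096", "HG00097", "HG00099", "HG00100", "HG00101",
              "HG00102", "HG00103", "HG00104", "HG00105"),
            ncol = 2, dimnames = list(NULL, c("ind", "headerline")))
index <- data.frame(V1 = c("1", "4", "9"))
df <- as.data.frame(m)

# One small data frame per index value, collected in a list
pieces <- lapply(as.character(index$V1), function(i) {
  df[df$ind == i, "headerline", drop = FALSE]
})

# Bind all pieces in a single call instead of appending row by row
dudosos <- do.call(rbind, pieces)
dudosos
```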

2
votes

Without an example of the output you'd like, I'm guessing at what you're after, but I'd suggest looking at the %in% operator, which removes the need for a for loop.

Using your example data:

library(data.table)

m <- data.table(id = c("1", "2", "3", "4", "5", "6", "7", "8", "9"), headerline = c("HG00096", "HG00097", "HG00099", "HG00100", "HG00101", "HG00102","HG00103", "HG00104", "HG00105"))

index <- c("1", "4", "9")

output <- m[id %in% index,]

Where output looks as follows:

> output
   id headerline
1:  1    HG00096
2:  4    HG00100
3:  9    HG00105

So we've returned a new data table, output, which contains the rows of m whose id value also appears in the index vector.
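
To see what the filter is doing, it may help to look at the result of %in% on its own. A small base R sketch using the same ids (keep is just an illustrative name):

```r
id <- c("1", "2", "3", "4", "5", "6", "7", "8", "9")
index <- c("1", "4", "9")

# One TRUE/FALSE per element of id: TRUE where that id also occurs in index
keep <- id %in% index
keep

# The row positions that the filter retains
which(keep)
```

The data.table call m[id %in% index, ] uses a logical vector like this one to decide which rows to keep.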

Is this what you were after?