4
votes

I have many samples, each one of which has a corresponding abundance matrix. From these abundance matrices, I would like to create a large matrix that contains abundance information for each sample in rows.

For example, a single abundance matrix would look like:

           A  B  C  D 
sample1    1  3  4  2

where A, B, C, and D represent colnames, and the abundances are the row values.

I would like to populate my larger matrix, which has as colnames all possible letters (A:Z) and all possible samples (sample1:sampleN) as rows, by matching the colname values.

For example:

         A  B  C  D  E  F  G ....  Z
sample1  1  3  4  2  NA NA NA ....
sample2  NA NA 2  5  7  NA NA ....
sample3  4  NA 6  9  2  NA 2 .....
....
sampleN

Different samples have a varying mix of abundances, in no guaranteed order.

When iteratively adding to this larger matrix, how can I ensure that each column is populated with the right abundance values (e.g., column "A" is only filled with the abundances of "A" from the different samples)? Thanks!

3
Darth_Vedar, I'm not in a rush (or even presuming that mine is what you should accept), but you have not accepted any answer for your previous questions. If an answer addresses your question, please accept it; doing so not only provides a little perk to the answerer with some points, but also provides some closure for readers with similar questions. Though you can only accept one answer, you have the option to up-vote as many as you think are helpful. (If there are still issues, you will likely need to edit your question with further details.) - r2evans

3 Answers

2
votes

Starting data, changed slightly from the question (sample1's last column is named Z instead of D) to highlight the differences:

m1 <- as.matrix(read.table(header=TRUE, text="
           A  B  C  Z
sample1    1  3  4  2"))
m2 <- as.matrix(read.table(header=TRUE, text="
         A  B  C  D  E  F  G
sample2  NA NA 2  5  7  NA NA
sample3  4  NA 6  9  2  NA 2"))

First, we need to make sure both matrices have the same column names:

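# columns present in m2 but missing from m1: append them to m1, filled with NA
newcols <- setdiff(colnames(m2), colnames(m1))
m1 <- cbind(m1, matrix(NA, nrow=nrow(m1), ncol=length(newcols), dimnames=list(NULL, newcols)))
# and the reverse: columns present in m1 but missing from m2
newcols <- setdiff(colnames(m1), colnames(m2))
m2 <- cbind(m2, matrix(NA, nrow=nrow(m2), ncol=length(newcols), dimnames=list(NULL, newcols)))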

m1
#         A B C Z  D  E  F  G
# sample1 1 3 4 2 NA NA NA NA
m2
#          A  B C D E  F  G  Z
# sample2 NA NA 2 5 7 NA NA NA
# sample3  4 NA 6 9 2 NA  2 NA

And now we combine them. Plain rbind stacks columns by position, not by name, so the columns must be put in the same order as well:

rbind(m2, m1[,colnames(m2),drop=FALSE])
#          A  B C  D  E  F  G  Z
# sample2 NA NA 2  5  7 NA NA NA
# sample3  4 NA 6  9  2 NA  2 NA
# sample1  1  3 4 NA NA NA NA  2
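
The same pad-then-stack idea can be extended to any number of matrices. The following is only a sketch: `bind_by_colname` is a hypothetical helper name, not a base R function, and it assumes every input matrix has column names. It collects the union of all column names, copies each matrix into an NA-filled matrix with those columns, and then stacks the results with rbind.

```r
# Hypothetical helper (not part of base R): stack a list of matrices,
# matching values by column name and filling absent columns with NA.
bind_by_colname <- function(mats) {
  # union of every column name seen in any matrix, in alphabetical order
  all_cols <- sort(unique(unlist(lapply(mats, colnames))))
  padded <- lapply(mats, function(m) {
    # NA-filled matrix with the full set of columns and the same rows as m
    out <- matrix(NA, nrow = nrow(m), ncol = length(all_cols),
                  dimnames = list(rownames(m), all_cols))
    # copy the known values into the columns with matching names
    out[, colnames(m)] <- m
    out
  })
  do.call(rbind, padded)
}

# Same data as above, built directly with matrix()
m1 <- matrix(c(1, 3, 4, 2), nrow = 1,
             dimnames = list("sample1", c("A", "B", "C", "Z")))
m2 <- matrix(c(NA, 4, NA, NA, 2, 6, 5, 9, 7, 2, NA, NA, NA, 2), nrow = 2,
             dimnames = list(c("sample2", "sample3"), LETTERS[1:7]))

combined <- bind_by_colname(list(m1, m2))
```

Because each matrix is padded to the full column set before stacking, the order of the columns in the individual inputs does not matter.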
2
votes

You should be able to take advantage of matrix indexing, using a two-column matrix of (row name, column name) pairs as the index, like so:

big[cbind(rownames(abun),colnames(abun))] <- abun

Using this example abundance matrix, and a big matrix to fill:

abun <- matrix(c(1,3,4,2),nrow=1,dimnames=list("sample1",LETTERS[1:4]))
big <- matrix(NA,nrow=5,ncol=26,dimnames=list(paste0("sample",1:5),LETTERS))
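
For completeness, here is a sketch of filling `big` from several abundance matrices in a loop. It uses plain name-based subscripts, `big[rownames(x), colnames(x)]`, rather than the two-column index, because that form also handles matrices with more than one row. `abun2` and `abun_list` are made-up names for illustration, and the sketch assumes every row and column name already exists in `big`.

```r
abun <- matrix(c(1, 3, 4, 2), nrow = 1,
               dimnames = list("sample1", LETTERS[1:4]))
# assumed second input: two samples with a different set of columns
abun2 <- matrix(c(NA, 4, 2, 6, 5, 9), nrow = 2,
                dimnames = list(c("sample2", "sample3"), c("A", "C", "D")))
big <- matrix(NA, nrow = 5, ncol = 26,
              dimnames = list(paste0("sample", 1:5), LETTERS))

abun_list <- list(abun, abun2)
for (x in abun_list) {
  # rows and columns are matched by name, so the column order in x is irrelevant
  big[rownames(x), colnames(x)] <- x
}

big[1:3, 1:5]
```

Cells that no input matrix touches stay NA, which matches the layout asked for in the question.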
1
votes

Another solution, using reduce() from the purrr package and union_all() from the dplyr package:

library(purrr)
library(dplyr)

sample_names <- c("sample1","sample2","sample3")

Generate three random abundance data frames, each with a random number of columns named by randomly chosen letters:

num1 <- round(runif(runif(1,min = 1, max = 10),min = 1, max = 10))
df1 <- data.frame(t(num1))
colnames(df1) <- sample(LETTERS,length(num1))

num2 <- round(runif(runif(1,min = 1, max = 10),min = 1, max = 10))
df2 <- data.frame(t(num2))
colnames(df2) <- sample(LETTERS,length(num2))

num3 <- round(runif(runif(1,min = 1, max = 10),min = 1, max = 10))
df3 <- data.frame(t(num3))
colnames(df3) <- sample(LETTERS,length(num3))

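This is the code that does the actual work: reduce() applies union_all() pairwise across the list of data frames to stack their rows, after which the columns are sorted alphabetically and the rows are labelled with the sample names: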

A <- reduce(list(df1,df2,df3),union_all)
col_order <- sort(colnames(A),decreasing = FALSE)
A <- A[,col_order,drop = FALSE]
rownames(A) <- sample_names

Output:

> A
         A  C  E  F  O  P  Q  U  W  Y
sample1  9 NA NA NA  9 NA  5  6 NA NA
sample2 NA NA NA NA  5  4 NA NA  5 NA
sample3 NA  6  5  9 NA NA  3 NA  5  7
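
A variant of the approach above swaps union_all() for dplyr's bind_rows(), which matches columns by name and fills missing ones with NA. As far as I know, newer dplyr releases are stricter about union_all() on data frames whose columns differ, so bind_rows() may be the safer choice there; treat that as an assumption and check it against your installed version. The data below is fixed rather than random so the result is reproducible.

```r
library(dplyr)

# fixed example data frames (assumed values, taken from the question's layout)
df1 <- data.frame(A = 1, B = 3, C = 4, D = 2)
df2 <- data.frame(C = 2, D = 5, E = 7)
df3 <- data.frame(A = 4, C = 6, D = 9, E = 2, G = 2)

# stack the rows; columns are matched by name and absent ones become NA
A <- bind_rows(df1, df2, df3)
# sort the columns alphabetically and label the rows
A <- A[, sort(colnames(A)), drop = FALSE]
rownames(A) <- c("sample1", "sample2", "sample3")

# convert to a matrix if a matrix rather than a data frame is wanted
A_mat <- as.matrix(A)
```

bind_rows() also accepts a list of data frames directly, so no explicit reduce() step is needed for this variant.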