Problem generating row names for a read counts matrix in R

Question

I am following this tutorial online for analyzing RNA-seq data between cell types.

https://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html

I have been able to perform most of this using my own data, but I am now trying to perform pathway enrichment analysis. However, I am having issues because I am unable to label the rows of my initial read counts matrix according to the Gene IDs.

I have tried simply creating a new column with the Gene IDs, but this changes the matrix to a data.frame and prevents me from using DGEList.

seqdata is my data.frame with all the information on the genes from the analysis, with column 1 as the gene ID names and columns 15 to 24 as the vectors with the read count information of each gene across 10 samples.

I generated a matrix from this data.frame called readcounts_g that just has the read counts for each of these genes. I am now trying to take column 1 from seqdata and use the gene names in that vector as the row names of readcounts_g.

rownames(readcounts_g) <- seqdata[,1]
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
In addition: Warning message:
Setting row names on a tibble is deprecated.

I have also thought of simply adding the gene names as an additional column in readcounts_g, but if I do that then I cannot use DGEList because it requires a matrix.

Ultimately, I am trying to use goana to do an enrichment pathway analysis with differentially expressed genes. But, I am unable to do this without having gene names assigned to the final matrix of DEGs.

If anyone has insight on how I can remedy this, it would be greatly appreciated. I can try to explain further if need be.

StupidWolf · Accepted Answer · 2020-01-03T04:29:16

If seqdata is a tibble, seqdata[,1] is itself a tibble (a one-column table) rather than a character or numeric vector, so it cannot be assigned as the row names of a matrix. Extract the column as a plain vector instead; see below for the alternative:

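library(dplyr)

# Toy stand-in for seqdata: one gene ID column plus four count columns
seqdata = tibble(geneID=sample(1:1000),
                 s1=rpois(1000,10), s2=rpois(1000,15),
                 s3=rpois(1000,20), s4=rpois(1000,25))

# Keep only the count columns and convert them to a matrix
readcounts_g = as.matrix(seqdata[,2:5])

# Throws an error: seqdata[,1] is still a one-column tibble
rownames(readcounts_g) = seqdata[,1]

# OK: seqdata$geneID is a plain vector
rownames(readcounts_g) = seqdata$geneID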
> head(readcounts_g)
    s1 s2 s3 s4
763 16 13 13 24
776 13 19 24 26
308 12 19 19 34
88  10  8 13 22
23  10 13 16 25
509  9 12 14 28
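
For the layout described in the question (gene IDs in column 1, counts in columns 15 to 24), the same idea might look like the sketch below. It assumes seqdata is a tibble; the toy data and the column names gene_id, info1 to info13 and sample1 to sample10 are made up for illustration. The `.rowNamesDF<-` call in the error message suggests readcounts_g was itself still a tibble, so the sketch also converts the count columns with as.matrix before setting row names.

```r
library(tibble)

set.seed(1)
n_genes <- 50

# Hypothetical seqdata: column 1 = gene IDs, columns 2-14 = placeholder
# annotation, columns 15-24 = read counts for 10 samples.
info   <- setNames(replicate(13, rep(NA_character_, n_genes), simplify = FALSE),
                   paste0("info", 1:13))
counts <- setNames(replicate(10, rpois(n_genes, 20), simplify = FALSE),
                   paste0("sample", 1:10))
seqdata <- as_tibble(c(list(gene_id = paste0("gene", seq_len(n_genes))),
                       info, counts))

# Single-bracket indexing on a tibble returns a tibble;
# double brackets extract the column as a plain vector.
class(seqdata[, 1])
class(seqdata[[1]])

# Convert the count columns to a matrix first, then label the rows
# with the gene IDs taken as a vector.
readcounts_g <- as.matrix(seqdata[, 15:24])
rownames(readcounts_g) <- seqdata[[1]]

head(readcounts_g)
```

Because readcounts_g is a true matrix with gene IDs as row names, it should then be usable as the counts argument of DGEList, and the gene IDs stay attached to the rows through the downstream steps.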
