I got an error when I tried to set the first column as the row names:

dt <- fread('../data/data_logTMP.csv', header = TRUE)
rownames(dt) <- dt$GENE

I used duplicated() to check the values:

> which(duplicated(dt$GENE) == TRUE)
[1] 20209 21919

Therefore, I compared these values:

> dt$GENE[20209] == dt$GENE[21919]
[1] FALSE
> dt$GENE[20209]
[1] "1-Mar"
> dt$GENE[21919]
[1] "2-Mar"

Why were these two values recognized as duplicated? And how can I fix this problem?

You used fread, so your dt is a data.table. data.tables don't use row names; what's your goal in trying to assign row names? - MichaelChirico
About your issue, that's not how duplicated works. duplicated flags only the second and later occurrences of a value, so the indices you get point at the repeats; the first instance of each value is elsewhere in the vector. Look at duplicated(c(1, 1, 2, 3)). - MichaelChirico
Didn't fread create a data frame, which can have row names? I used is.data.frame(dt) and got TRUE. I tried the example duplicated(c(1, 1, 2, 3)) and understood that the first occurrence is not flagged as duplicated. I've found the true duplicated values. Thank you very much! - icy
is.data.frame will be TRUE because data.table is an extension of data.frame (you can look at class(dt)). Row names are one place where the extension is a bit muddied; data.table prefers to use keys instead of row names; see this vignette. - MichaelChirico
Thank you for such a detailed explanation! - icy
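
To make the point about duplicated concrete, here is a minimal sketch on a made-up vector (the gene IDs below are hypothetical, not taken from the original file). It shows that duplicated only marks the repeats, and one possible way to list every occurrence of a repeated value, including the first, by combining it with the fromLast = TRUE variant.

```r
# Hypothetical gene IDs; "1-Mar" and "2-Mar" each occur twice.
genes <- c("1-Mar", "TP53", "2-Mar", "1-Mar", "BRCA1", "2-Mar")

# duplicated() marks only the second and later occurrences of a value
which(duplicated(genes))  # 4 6

# Mark every occurrence of a repeated value, including the first one
all_dups <- duplicated(genes) | duplicated(genes, fromLast = TRUE)
which(all_dups)  # 1 3 4 6

# Inspect the repeated values themselves
genes[all_dups]
```

In the question, this means rows 20209 and 21919 are each a repeat of some earlier row, not duplicates of each other.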

1 Answer

Because you are reading the file with fread, the object dt will be a data.table by default. By design, data.table does not support row names. To get a plain data.frame instead, pass an additional argument, data.table = FALSE, to fread as shown below.

data.table::fread(input = "file name", sep = ",", header = TRUE, data.table = FALSE)
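
Note that a plain data.frame still requires unique row names, so the repeated IDs have to be dealt with before assigning them. A rough sketch of one possible approach follows, assuming the data.table package is installed. The inline CSV text and column names S1 and S2 are invented for illustration, and make.unique (base R) is just one option for disambiguating repeats; whether renaming is appropriate depends on what the repeated gene IDs actually represent.

```r
library(data.table)

# Hypothetical inline data standing in for the CSV file;
# "1-Mar" appears twice on purpose.
csv_text <- "GENE,S1,S2\nTP53,1.2,3.4\n1-Mar,0.5,0.7\n1-Mar,2.1,2.2\n"

# data.table = FALSE makes fread return a plain data.frame
df <- fread(text = csv_text, header = TRUE, data.table = FALSE)
class(df)  # "data.frame"

# Row names must be unique, so disambiguate repeated IDs first
rownames(df) <- make.unique(df$GENE)
rownames(df)  # "TP53" "1-Mar" "1-Mar.1"
```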