I have a big df (CSV format) that looks like:

miRNAs <- c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p')
cca <- c('12854','5489','54485','2563')
ccb <- c('124','589','5465','25893')
taa <- c('12854','589','5645','763')
df <- data.frame(miRNAs,cca,ccb,taa)

I want to use this df in a DESeq2 analysis. I made the df unique with unique(df) and tried to read it with countData <- as.matrix(read.csv(file="df.csv", row.name="miRNAs", sep = ",")), but it gives this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

Since I made the df unique, I don't know why this error keeps popping up. The reason I want to read my df this way is that I want to get the list of my column headers (except the first column) when I type colnames(df). I need this for a TRUE/FALSE test of whether they match the row names of another file called phenotype.csv: all(rownames(phenotype) == colnames(countData))

If you have duplicated entries in the first column of df, you will get this kind of error. Can you check table(duplicated(df$miRNAs))? - StupidWolf
I have 1978 FALSE and 44 TRUE. - Apex
Yeah, so in your df there are actually duplicated miRNAs (based on column miRNAs), and they have different counts, which is why unique(df) does not treat those rows as duplicates. - StupidWolf
I used this to remove the first-column duplicates: new_df <- df[!duplicated(df$miRNAs),,drop=FALSE]. Is this correct? - Apex
Yes, this is OK. Then write.csv(new_df,"new_df.csv",row.names=FALSE) - StupidWolf
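
The difference between unique(df) and duplicated(df$miRNAs) discussed in these comments can be sketched on a toy data frame. The values below are made up for illustration and are not taken from the original file; only the column names follow the question.

```r
# Toy data (made-up values): the same miRNA name appears twice with different counts
df <- data.frame(
  miRNAs = c("mmu_mir-1-3p", "mmu_mir-1-3p", "mmu_mir-1-5p"),
  cca = c(12854, 12000, 5489),
  ccb = c(124, 130, 589)
)

# unique() compares whole rows, so both "mmu_mir-1-3p" rows are kept
n_unique_rows <- nrow(unique(df))

# duplicated() on the name column flags the second occurrence of a name
dup_flags <- duplicated(df$miRNAs)
table(dup_flags)

# Keep only the first row for each miRNA name
new_df <- df[!dup_flags, , drop = FALSE]
```

Note that this keeps the first occurrence and discards the counts of the later rows. Whether that is appropriate, or whether the counts should be combined instead (for example by summing them), depends on the data and is not settled in this thread.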

1 Answer

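read.csv does accept the name of a column for row.names (row.name works through partial argument matching), so the argument itself is not the problem. The error means that the miRNAs column still contains repeated values: unique(df) only drops rows that are identical in every column, so two rows with the same miRNA but different counts are both kept. Import without the row.names argument, remove the duplicated miRNAs, and if you really want that variable as row names instead of a column, do that after the import: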

df <- data.frame(
  miRNAs = c('mmu_mir-1-3p','mmu_mir-1-5p','mmu-mir-6-5p','mmu-mir-6-3p'),
  cca = c('12854','5489','54485','2563'),
  ccb = c('124','589','5465','25893'),
  taa = c('12854','589','5645','763')
  )

rownames(df) <- df$miRNAs
df$miRNAs <- NULL
df
#>                cca   ccb   taa
#> mmu_mir-1-3p 12854   124 12854
#> mmu_mir-1-5p  5489   589   589
#> mmu-mir-6-5p 54485  5465  5645
#> mmu-mir-6-3p  2563 25893   763

Created on 2020-02-19 by the reprex package (v0.3.0)
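
For the full round trip from a CSV file to the check against phenotype, a minimal sketch could look like the following. It assumes a count file with a miRNAs column and one column per sample; the temporary file, the count values, and the phenotype table with its condition column are hypothetical stand-ins for df.csv and phenotype.csv.

```r
# Write a small toy CSV to a temporary file (made-up values; stands in for df.csv)
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  miRNAs = c("mmu_mir-1-3p", "mmu_mir-1-3p", "mmu_mir-1-5p"),
  cca = c(12854, 12000, 5489),
  ccb = c(124, 130, 589),
  taa = c(12854, 12900, 589)
)
write.csv(toy, csv_path, row.names = FALSE)

# Import without row.names so repeated names cannot raise an error at read time
counts <- read.csv(csv_path, stringsAsFactors = FALSE)

# Drop repeated miRNA names, then move the column into the row names
counts <- counts[!duplicated(counts$miRNAs), , drop = FALSE]
rownames(counts) <- counts$miRNAs
counts$miRNAs <- NULL
countData <- as.matrix(counts)

# Hypothetical phenotype table: one row per sample, in the same order as the count columns
phenotype <- data.frame(
  condition = c("control", "control", "treated"),
  row.names = c("cca", "ccb", "taa")
)

# TRUE only if sample names agree in both content and order
samples_match <- all(rownames(phenotype) == colnames(countData))
samples_match
```

Because the miRNAs column is removed before as.matrix(), colnames(countData) contains only the sample columns, which is what the comparison with rownames(phenotype) needs.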