How to load df with 1000 separator in R as numeric class?

Question

I have a UTF-16 Unicode Text (.txt) file downloaded and defaulted as comma-separated values (.csv) when saved on a mac drive. This file contains numeric data that has 1000 separator applied for numbers that are greater than 1,000. When loading in R, this data is in character class. In order to convert to numeric class, I do the following:

tx <- read.table("/Users/username/Desktop/report.csv",sep="\t", dec = ".", fileEncoding = "UTF-16LE", fill = T, skip=1 , quote="", header=T, stringsAsFactors = FALSE)

tx$Cost <- gsub("\\,", replacement = "", x = tx$Cost)

tx$Cost <- as.numeric(tx$Cost)
Warning message:
NAs introduced by coercion

When summarized using the head(subset()) function, the following is the result where I'm still not able to convert into numeric class:

       **Orig after_gsub as.numeric**
1      95.31      95.31      95.31
2     992.77     992.77     992.77
3 "1,719.68"  "1719.68"         NA
4 "3,135.79"  "3135.79"         NA
5     111.91     111.91     111.91
6     305.12     305.12     305.12

Can someone help me convert them into numeric class using as.numeric()?

I do not get any error when I take the same steps after defining: tx=data.frame(Cost=c('100','100.1','1,999.5',NA)) — Naisheel Verdhan
@BondedDust: I'm not exactly sure how to apply colClasses in my situation, could you give me a hint specific to this situation? I can test and provide feedback. Thanks all! — ed0reddie
@beginneR: the combined as.numeric(gsub()) function doesn't seem to fix. :( — ed0reddie
@eN-V: if you load a csv with 1000 separator formatting applied to a column, you might be able to duplicate the scenario; instead of assigning a df with these values on GUI. Hopefully this helps you duplicate! — ed0reddie
Why not put up these data, or a sample of it and let us try it rather than guess! — Brandon Bertelsen

IRTFM IRTFM · Accepted Answer · 2014-08-02T00:46:26

The worked example using setClass, setAs and colClasses:

 library(methods)
  setClass("chr.w.commas", contains=numeric())
  setAs("character", "chr.w.commas", function(from) 
                              as.numeric(gsub("\\,", "",from )) )
 dat <- read.table(text="Orig after_gsub num
 1      '95.31'      '95.31'      '95.31'
 2     992.77     992.77     992.77
 3 '1,719.68'  '1719.68' NA
 4 '3,135.79'  '3135.79' NA
 5     111.91 111.91 111.91
 6     305.12     305.12     305.12", header=TRUE, colClasses="chr.w.commas")
 str(dat)
'data.frame':   6 obs. of  3 variables:
 $ Orig      : num  95.3 992.8 1719.7 3135.8 111.9 ...
 $ after_gsub: num  95.3 992.8 1719.7 3135.8 111.9 ...
 $ num       : num  95.3 992.8 NA NA 111.9 ...

How to load df with 1000 separator in R as numeric class?

3 Answers