I am trying to read a large CSV data set with 7 million rows using the following code:
histca <- data.table::fread("HISTO_CA.csv", header = TRUE)
Some of the columns contain odd characters (see the example in the message below).
I get the following warning:
In fread("HISTO_CA.csv", select = c(1, 237:248), sep = ";", nrows = 1e+06, : Bumped column 239 to type character on data row 198668, field contains '™™?'.
Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' t
How can I import the data and exclude the rows where this problem occurs?

In the fread documentation, under its colClasses argument, we read: "fread will only promote a column to a higher type if colClasses requests it. It won't downgrade a column to a lower type since NAs would result. You have to coerce such columns afterwards yourself, if you really require data loss." So I guess there is no direct solution with fread. If you want to stick to data.table, I would suggest trying to clean the file outside R (are you working on Unix?). - Eric Lecoutre
[...] colClasses), to clean the data in R and to convert the columns to numeric after cleansing. - Uwe
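
The approach from the last comment (read the problem column as character via colClasses, clean in R, then convert to numeric) could look roughly like the sketch below. The inline sample data and the column name amount are made up for illustration, since the real HISTO_CA.csv is not shown; in the real file the column to target would be the one named in the warning (column 239). Treat this as one possible way to drop the offending rows, not as the only one.

```r
library(data.table)

# Hypothetical stand-in for HISTO_CA.csv: row 2 holds a non-numeric value.
csv_text <- "id;amount\n1;10\n2;xx?\n3;30.5\n"

# Read the problem column as character so fread never has to bump its type.
dt <- fread(text = csv_text, sep = ";", header = TRUE,
            colClasses = list(character = "amount"))

# Try the numeric conversion; anything that is not a number becomes NA.
num <- suppressWarnings(as.numeric(dt$amount))

# A row is "bad" if it had some content but that content did not convert.
# Rows that were empty or NA to begin with are kept as missing values.
bad <- is.na(num) & !is.na(dt$amount) & dt$amount != ""

# Replace the character column with the numeric one and drop the bad rows.
dt[, amount := num]
clean <- dt[!bad]
```

With the real file, the same fread call would take "HISTO_CA.csv" instead of text = csv_text, and colClasses would name (or index) each column that triggers the warning. Inspecting dt[bad] before dropping anything is a cheap way to see how many rows are affected and what they contain.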