I am using fread() from data.table to efficiently read large rectangular CSV files into R. Every value is a double (and only double) -- there are no missing elements.
However, if the file contains very small numbers in scientific notation, the affected column gets converted to character, which ruins the whole read. Here is the warning message (one example; there are several, one for each such small number):
16: In fread("SomeCSVFile") :
Bumped column 560 to type character on data row 16799, field contains '-2.1412168512924677E-308'. Coercing previously read values in this column from integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.
I want the function to set these values to zero or truncate them at the smallest possible double value (either is fine).
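One possible post-read workaround, sketched below rather than a definitive fix, is to follow the warning's own colClasses suggestion: read the affected column as character, convert it back to double, and then zero out anything smaller in magnitude than the smallest normalised double. The file name "SomeCSVFile" and column index 560 are taken from the warning above; the column name "V560" assumes the file has no header row, so adjust both to match the real data.

    library(data.table)

    ## Force the problem column to be read as character so fread never bumps it
    dt <- fread("SomeCSVFile", colClasses = list(character = 560))

    ## as.numeric() parses subnormal values such as -2.14e-308 without complaint
    dt[, V560 := as.numeric(V560)]

    ## Treat anything below the smallest normalised double (.Machine$double.xmin,
    ## about 2.2e-308) as zero. To truncate instead of zeroing, assign
    ## sign(V560) * .Machine$double.xmin in place of 0.
    dt[abs(V560) < .Machine$double.xmin, V560 := 0]

This still requires knowing which columns were bumped, so it is more of a stopgap than the in-read handling asked about.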
Comments:
matrix[is.character(matrix)] <- numeric(0) - Carl Witthoft
csv: spawned from the vile pit of Excel. - Carl Witthoft