2
votes

The aim here is to read a CSV table from a direct URL. I want to use fread (from the data.table package) because it is faster than read.csv, but I have run into a small problem.

options(scipen=999)

caracteristiques=read.csv(url("https://www.data.gouv.fr/s/resources/base-de-donnees-accidents-corporels-de-la-circulation/20160909-181230/caracteristiques_2015.csv"))
caracteristiques[1,1]
# 201500000001

I have no problem getting the [1,1] element.

Now I use fread:

library(data.table)

caracteristiques=data.table(fread("https://www.data.gouv.fr/s/resources/base-de-donnees-accidents-corporels-de-la-circulation/20160909-181230/caracteristiques_2015.csv",
                                  sep=","))
caracteristiques[1,1]
# 

Then we see a strange number. I have to set options(scipen=0) to display it: 9.955423e-313. I am wondering whether I need to specify some options in fread, since the first column contains large numbers.

1 Answer

7
votes

fread automatically assumed the first column's class to be integer64. From its help file:

integer64 = "integer64" (default) reads columns detected as containing integers larger than 2^31 as type bit64::integer64. Alternatively, "double"|"numeric" reads as base::read.csv does; i.e., possibly with loss of precision and if so silently. Or, "character".

The values in the first column are 201500000001, 201500000002, etc. Treated as numbers, they are larger than 2^31 (i.e. 2147483648), so fread read them as integer64 values, which is why they look so strange when printed.
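To see where the strange value comes from, here is a minimal base R sketch. It rests on an assumption that is not stated in the help text quoted above: that an integer64 column keeps its 64-bit integer in the storage of a double, so that without bit64's print method R shows those bits as a tiny denormal double, n * 2^-1074. The variable names are illustrative only.

```r
# IDs from the first column, held here as ordinary doubles
ids <- c(201500000001, 201500000002)

# Both exceed the 32-bit integer range, so fread falls back to integer64
ids > 2^31
.Machine$integer.max   # 2147483647

# Assumption: reinterpreting the 64-bit integer n as a double yields the
# denormal value n * 2^-1074
strange <- ids[1] * 2^-1074
print(strange)   # should be close to the 9.955423e-313 shown in the question
```

If the assumption holds, the printed value is not corrupted data, only the right bits shown through the wrong print method.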

data.table will automatically load the bit64 package for you in this situation so that the numbers display properly. However, when you don't have bit64 installed, as you likely don't, it is supposed to warn you and ask you to install it. That missing warning is bug fix 5 in the development version v1.10.5. From NEWS:

When fread() and print() see integer64 columns are present but package bit64 is not installed, the warning is now displayed as intended. Thanks to a question by Santosh on r-help and forwarded by Bill Dunlap.

So, just install.packages("bit64") and you're good. You don't need to reload the data. It just affects how those columns are printed.
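As a rough check, something like the following could be run once bit64 is installed. The one-row sample is hypothetical and only stands in for the real file, and the `text` argument assumes a reasonably recent data.table.

```r
library(data.table)

# Hypothetical one-row sample standing in for caracteristiques_2015.csv
dt <- fread(text = "Num_Acc,an\n201500000001,15\n")

if (requireNamespace("bit64", quietly = TRUE)) {
  # With bit64 available, the column should print as the original integer
  print(class(dt$Num_Acc))
  print(dt)
} else {
  message('bit64 is not installed; run install.packages("bit64") first')
}
```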

Alternatively, if you add the argument integer64 = "numeric" to your fread function, the result will match what you got from read.csv. But if it's an ID column, conceptually it should be a character or factor, rather than integer. You can use the argument colClasses=c("Num_Acc"="character") for that.
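Both alternatives can be sketched as follows. The inline two-row sample and the variable names are hypothetical; with the real file you would pass the URL as the first argument instead. The `text` argument assumes a reasonably recent data.table.

```r
library(data.table)

# Hypothetical two-row sample standing in for caracteristiques_2015.csv
csv_text <- "Num_Acc,an,mois\n201500000001,15,1\n201500000002,15,1\n"

# Option 1: read large integers as doubles, as read.csv does
dt_num <- fread(text = csv_text, integer64 = "numeric")
print(class(dt_num$Num_Acc))

# Option 2: treat the ID column as character
dt_chr <- fread(text = csv_text, colClasses = c("Num_Acc" = "character"))
print(class(dt_chr$Num_Acc))
print(dt_chr$Num_Acc[1])
```

The character version avoids any question of numeric precision, at the cost of no longer being able to do arithmetic on the column, which is rarely needed for an ID.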