0
votes

I am experimenting with different packages to find the one best suited to writing data files such as CSV quickly.

I have found the 'iotools' package and its function 'write.csv.raw', which is quite fast at saving data.

However, the dataset in the saved file has some problems:

  • no column names;
  • double/float numbers use "." as the decimal sign instead of ",".

So I need the saved file to contain the column names and the correct decimal sign.
My script is as follows:

library(iotools)
library(UsingR)

data(galton)
head(galton)
#option1 to save data
write.csv.raw(galton,"test.csv",append=FALSE,sep=";",col.names=TRUE)
#option2 to save data
write.table.raw(galton,"test.csv",append=FALSE,sep=";",col.names=TRUE)
read.csv2("test.csv",nrow=5)

the input dataset (from R):

child parent
61.7   70.5
61.7   68.5
61.7   65.5
61.7   64.5
61.7   64.0
62.2   67.5

the output file:

X1.61.7 X70.5
2\t61.7  68.5
3\t61.7  65.5
4\t61.7  64.5
5\t61.7    64
6\t62.2  67.5  

Update of 18/02/16:
With the help of the answer by procrastinator0, I have managed to use 'write.csv.raw' correctly.

A comparison of the different write methods, based on the data frame from the question, follows:

system.time(write.csv.raw(n,"test.csv",sep=";",append=TRUE))
user system elapsed
15.61 1.17 21.92

system.time(write.table(n,"test.csv",sep=";",row.names=FALSE,dec=","))
user system elapsed
63.25 1.20 64.60

system.time(write.csv2(n,"test.csv",row.names=FALSE))
user system elapsed
63.71 1.28 65.38

system.time(write_csv(n, "test.csv", na = "NA"))
user system elapsed
136.75 3.60 141.24
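
The data frame n is not shown above, so the following is only a minimal sketch of how such a comparison could be set up with base R. The synthetic data frame n_demo and the list name writers are assumptions made for illustration, and the timings will differ from the ones reported above.

```r
# Synthetic stand-in for the real data (assumption: the original 'n' is not available)
set.seed(1)
n_demo <- data.frame(
  id = seq_len(20000),
  x  = rnorm(20000),
  g  = sample(letters, 20000, replace = TRUE)
)

path <- tempfile(fileext = ".csv")

# Each candidate writer takes a data frame and a file path
writers <- list(
  write.table = function(d, f) write.table(d, f, sep = ";", row.names = FALSE, dec = ","),
  write.csv2  = function(d, f) write.csv2(d, f, row.names = FALSE)
)

# Collect the elapsed time (in seconds) of every writer on the same data
timings <- vapply(
  writers,
  function(w) system.time(w(n_demo, path))[["elapsed"]],
  numeric(1)
)
print(timings)
```

Other writers (for example write.csv.raw or write_csv) could be added to the list in the same way, provided their packages are installed.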

Update of 27/04/16: (out of date)
I have run some experiments on writing/reading data with different tools. The experiments are based on a theoretical sample as well as a real one (from my practice). I have tried to make the scripts reproducible. I hope they will be useful for newcomers :-)

Links to IO experiments:

Reading data from files: https://rpubs.com/demydd/166375
Writing data to files: https://rpubs.com/demydd/170957

Update of 19/09/16:
The feather package (read_feather, write_feather) and fwrite from the data.table package have been added.
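
As a side note on fwrite: it accepts sep and dec arguments, so it can be asked for a ";"-separated file with "," as the decimal sign, and it writes the column names by default. A minimal sketch, assuming data.table may or may not be installed (the fallback to write.csv2 and the small data frame are only there to keep the example runnable):

```r
# Small illustrative data frame (values taken from the galton sample above)
df <- data.frame(child = c(61.7, 62.2), parent = c(70.5, 64))
path <- tempfile(fileext = ".csv")

if (requireNamespace("data.table", quietly = TRUE)) {
  # fwrite writes the header row and uses the requested decimal sign
  data.table::fwrite(df, path, sep = ";", dec = ",")
} else {
  # Fallback (assumption: data.table is not available on this machine)
  write.csv2(df, path, row.names = FALSE)
}

# read.csv2 expects ";" as separator and "," as decimal sign
back <- read.csv2(path)
print(back)
```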

links to updated tests:

to read
to write

2
The question is unclear: it says there are no column names, yet it writes out with col.names=TRUE. The data is not controversial, so what is the question? - zx8754
It would also be interesting to know (roughly) the dimension of your real data that you're trying to write. - talat
@ zx8754: I mean the dataset in the saved file. If I open the file I see no column names and the decimal sign "." in place of ",". @ docendo discimus: the initial dataset was 386000 rows and 140 cols (numeric and non-numeric). After applying 'write.csv.raw' I have neither the column names nor the correct decimal sign. After that I started to test small samples such as galton. - Dimon D.
Is write.csv.raw faster than write_csv{readr}? Your question is about which is the fastest method to write a .csv file, right? - rafa.pereira
@Rafael Pereira You are right. I am looking for the fastest way to write a csv correctly. So far the fastest methods are fread() and write.csv.raw() according to my tests. But I have not tested write_csv{readr} yet. If you can offer something better, you are very welcome :-) - Dimon D.

2 Answers

1
votes

For column names, this is a known issue. Suggested workaround:

> cat(noquote(paste0(paste0(names(df),collapse = ","),"\n")),file = "output.csv")
> write.csv.raw(df,"output.csv",append=TRUE)

write.csv.raw does not index with "\t" for me by default, but you could try using NA for the nsep argument.
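
The workaround does not address the decimal sign. One possible way to cover both points is to convert the numeric columns to text with "," before writing, write the header line first, and then append the body. The sketch below uses only base R so that it runs without iotools; the helper name write_with_header and its body_writer parameter are hypothetical, not part of any package.

```r
# Hypothetical helper: header line first, then the body appended below it
write_with_header <- function(df, path, sep = ";", dec = ",", body_writer = NULL) {
  # Turn numeric columns into text that uses the requested decimal sign
  num <- vapply(df, is.numeric, logical(1))
  if (any(num)) {
    df[num] <- lapply(df[num], function(x) sub(".", dec, as.character(x), fixed = TRUE))
  }

  # Write the column names as the first line of the file
  writeLines(paste(names(df), collapse = sep), path)

  # Default body writer: base R write.table in append mode, no header, no row names
  if (is.null(body_writer)) {
    body_writer <- function(d, file, sep) {
      write.table(d, file, append = TRUE, sep = sep, quote = FALSE,
                  row.names = FALSE, col.names = FALSE)
    }
  }
  body_writer(df, path, sep)
  invisible(path)
}

# Usage with a few rows from the galton sample
df <- data.frame(child = c(61.7, 61.7, 62.2), parent = c(70.5, 68.5, 64))
path <- tempfile(fileext = ".csv")
write_with_header(df, path)
back <- read.csv2(path)
print(back)
```

With iotools installed, body_writer could presumably be set to function(d, file, sep) write.csv.raw(d, file, append = TRUE, sep = sep), mirroring the workaround above; that variant is untested here, in particular with respect to how row names are written.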

0
votes

You can save the column names as a factor and then use them as follows:

library(iotools)
library(UsingR)

data(galton)

Cnames=as.factor(colnames(galton))

write.table(galton,"test2.csv",sep=";")

test2=read.delim("test2.csv",sep = ";")
colnames(test2)=Cnames

The output is:

head(test2)
  child parent
1  61.7   70.5
2  61.7   68.5
3  61.7   65.5
4  61.7   64.5
5  61.7   64.0
6  62.2   67.5