I have two .csv files:
- Table A with 32,075,892 rows, which takes 2,023,365 KB
- Table B with 21,383,928 rows, which takes only 1,051,836 KB
Both tables have the same number of columns with about the same content (an ID, an integer, a short fixed-length string, a numeric, and another string). The only difference is that in table A the string values in the last column are longer: 26.83 characters on average, compared to 9 in table B.
I read and wrote both .csv files with fread and fwrite from the data.table package in R.
Table A has 50% more rows than table B, but its file is almost twice as large (about 1.92 times). What is the reason for the large difference in file size?
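
For reference, here is a rough bytes-per-row check using only the figures quoted above. It is a sketch, not a measurement: it assumes 1 KB = 1024 bytes, and the variable names are purely illustrative.

```r
# Row counts and file sizes as quoted in the question.
rows_a <- 32075892
rows_b <- 21383928
size_kb_a <- 2023365
size_kb_b <- 1051836

# Assumption: the reported sizes are in units of 1024 bytes.
bytes_per_kb <- 1024

# Average number of bytes each row occupies on disk
# (field contents plus separators and the line ending).
bytes_per_row_a <- size_kb_a * bytes_per_kb / rows_a  # about 64.6
bytes_per_row_b <- size_kb_b * bytes_per_kb / rows_b  # about 50.4

# Split the file-size ratio into a row-count factor and a row-width factor.
row_ratio   <- rows_a / rows_b                        # 1.5
size_ratio  <- size_kb_a / size_kb_b                  # about 1.92
width_ratio <- bytes_per_row_a / bytes_per_row_b      # about 1.28

print(round(c(bytes_per_row_a = bytes_per_row_a,
              bytes_per_row_b = bytes_per_row_b,
              row_ratio = row_ratio,
              size_ratio = size_ratio,
              width_ratio = width_ratio), 2))
```

The size ratio is the product of the row-count ratio and the per-row width ratio, and the width ratio does not depend on whether a KB is taken as 1000 or 1024 bytes.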