I have a data frame with over 20 million records. I split it into chunks so I can write smaller sets out to multiple CSV files. The code works and creates n CSV files of roughly equal size.
This is the code I am using:
n <- 14 # number of chunks
df <- split(df_t3, factor(sort(rank(row.names(df_t3)) %% n))) # break into a list of 14 pieces
lapply(names(df), function(x) {
  write.csv(df[[x]], paste0(x, ".txt"), row.names = FALSE) # write each chunk to its own file
})
I want to modify this so that each chunk captures the entire set of records for a given ID before the file breaks off.
For example, if ID10 has 300 rows and ID20 has 500 rows, a file should contain all 300 of ID10's records rather than splitting them across two chunks. I have more than 1 million IDs, so I cannot use the ID itself as the criterion for splitting into chunks.
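One possible approach (a sketch, not tested at this scale) is to assign each ID to a chunk based on the cumulative row count, so every row for a given ID lands in the same file while chunks stay roughly equal in size. This assumes the ID column is literally named `ID`:

```r
n <- 14 # number of chunks

# Count rows per ID, preserving the order in which IDs first appear
id_counts <- table(factor(df_t3$ID, levels = unique(df_t3$ID)))

# Map each ID to a chunk via its cumulative row count; pmin caps
# rounding overflow at n so no ID falls outside the last chunk
target_size <- nrow(df_t3) / n
chunk_of_id <- setNames(pmin(ceiling(cumsum(id_counts) / target_size), n),
                        names(id_counts))

# Split by the per-ID chunk assignment instead of by row rank
chunks <- split(df_t3, chunk_of_id[as.character(df_t3$ID)])

lapply(names(chunks), function(x) {
  write.csv(chunks[[x]], paste0(x, ".txt"), row.names = FALSE)
})
```

Chunk sizes will vary slightly, since an ID's whole block of rows is pushed into one chunk or the next, but no ID is ever split across files.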
I'm not sure I've been entirely clear in my request. Happy to provide more detail. Thanks.