0
votes

I need to split one large .csv file, with about 9 columns and more than 9,000 rows, into a separate .txt file for each row, and name each newly generated file by the name in its first column.

e.g. for the .csv file:

01001_r1    32.4327 -86.6190    0.65    0.20    0.15    1.33    5.47    8
01001_r2    32.4327 -86.6190    0.65    0.20    0.15    1.33    5.46    8
01001_r3    32.4327 -86.6190    0.80    0.15    0.05    1.33    5.23    10
01003_r1    30.4887 -87.6918    0.65    0.20    0.15    1.33    5.23    9
01003_r2    30.4887 -87.6918    0.80    0.15    0.05    1.33    5.25    9
01003_r3    30.4887 -87.6918    0.65    0.20    0.15    1.33    4.96    8

I would end up with 6 files, with one row each.

Columns in the output file need to be 'tab separated', and the file must not contain neither row or column names.

e.g the output files should look like this:

01001_r1    32.4327 -86.6190    0.65    0.20    0.15    1.33    5.47    8

This is where I've gotten this far:

#set 'working directory'
setwd('C:/Users/Data/soils_data/sitesoil_in')

#identify data frame from .csv file
sd <- read.csv('site_soil.csv', sep="\t", header=F, fill=F)

lapply(1:nrow(sd), function(i) write.csv(sd[i,],
                                         file = paste0(sd[i,1], ".txt"),
                                         row.names = F, header = F,
                                         quote = F))

And this is what I get for each output file:

file name: 01001_r1

V1,V2,V3,V4,V5,V6,V7,V8,V9
01001_r1,32.4327,-86.619,0.65,0.2,0.15,1.33,5.47,8

I can't get it to eliminate the columns names or to separate the columns with tabs. I have tried with header = F, or col.names = F to eliminate headings, and sep = "\t" to separate the columns but it does not recognize the commands.

I would appreciate any help. Thanks, E.

Following all suggestions, this is the simpler code that would do the trick:

#set 'working directory'
setwd('C:/Users/Elena/Desktop/DayCent_muvp_MODEL/DayCent_SourceData/soils_data/sitesoil_in')

#identify data frame from .csv file
sd <- read.csv('site_soil.csv', sep="\t", header=F, fill=F)

lapply(1:nrow(sd), 
       function(i) write.table(sd[i,],
                               file = paste0(sd[i,1], ".txt",collapse = ""),
                               row.names = FALSE, col.names = FALSE,
                               sep = "\t"
       ))

Thank you all for your help. E.

3
check write.table - Your approach saves it again as .csv which is not desirable in your case. Could you explain the intention behind this merely useful split into thousands of .txt files littering your disk?Christian
Thank you Christian, I've tried that, but it gives me an error if I use write.table instead of write.csv ... These are input files for another code generating files for a model input. The later code requires these files to be in a very specific format to recognize the information.ebb
You will want to investigate the error being given then. write.csv is a wrapper around write.table, as described by the docs (?write.table), and is not intended to be as flexible. In my experience it is almost always better to use write.table and manually specify the correct arguments needed, rather than use write.csv and hope that its args are the ones you need.user5359531
Thank you user5359531,I'll check the arguments as you suggest, see if I can find the source of error.ebb

3 Answers

1
votes

I adjusted your code:

lapply(1:nrow(sd),
    function(i) write.table(sd[i,],
                            file = paste0(sd[i,1],".txt",collapse = ""),
                            row.names = FALSE,
                            sep = "\t"
                            ))
1
votes

Try this

dat <-"01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8
"


df <- read.delim(file = textConnection(dat), sep = ',', header = FALSE)

df
#         V1      V2       V3   V4   V5   V6   V7   V8 V9
# 1 01001_r1 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.47  8
# 2 01001_r2 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.46  8
# 3 01001_r3 32.4327 -86.6190 0.80 0.15 0.05 1.33 5.23 10
# 4 01003_r1 30.4887 -87.6918 0.65 0.20 0.15 1.33 5.23  9
# 5 01003_r2 30.4887 -87.6918 0.80 0.15 0.05 1.33 5.25  9
# 6 01003_r3 30.4887 -87.6918 0.65 0.20 0.15 1.33 4.96  8

output_file_base <- "soil_"
output_file_ext <- ".tsv"

for(i in seq(nrow(df))){
    output_file <- paste0(output_file_base, as.character(i), output_file_ext)
    dfi <- df[i, ]
    write.table(x = dfi, file = output_file, sep = '\t', quote = FALSE, col.names = FALSE, row.names = FALSE)
}

Output:

$ cat soil_6.tsv
01003_r3    30.4887 -87.6918    0.65    0.2 0.15    1.33    4.96    8
1
votes

This may work for what you are trying to accomplish.

df  <-read.csv(text = "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8",
stringsAsFactors = FALSE,
header = FALSE)


apply(df, 1, function(x){write.table(t(x), 
                                     file = paste0(x[1],".txt"), 
                                     sep = "\t", 
                                     quote = FALSE, 
                                     col.names = FALSE, 
                                     row.names = FALSE)})