
I have a file (prf003.tre) generated from some old proprietary software, that I am trying to edit in R. It is structured as such:

0001  116.00 1BF 19.2     0.0             5500        0           
0001  216.00 1BF 19.2     0.0             5500        0           
0001  316.00 1BF 19.2     0.0             5500        0           
0001  416.00 1BF 19.2     0.0             5500        0           
0001  516.00 1BF 19.2     0.0             5500        0           
0001  616.00 1BF 19.2     0.0             5500        0           
0001  716.00 1BF 19.2     0.0             5500        0           

The goal is to be able to import the file, modify the values in column 2 to read

prf003[, 2]<- seq.int(nrow(prf003))

and then re-export the file.

(The number of spaces between cells varies by column. Pasting this into Stack Overflow as plain text did not keep the spacing, so I pasted it as code; I hope that is okay, I am new here. I need to preserve the spacing exactly.)

I tried importing the file into R with both read.table and readLines. read.table does not preserve the spacing, while readLines reads each line as a single string, so I cannot modify column 2 on its own. Any suggestions? Perhaps there is a setting in read.table that I am not aware of, but searching has brought up nothing.

Edit: read.table also drops the leading zeros in my first column; any tips on how to preserve the "0001" would be helpful.

What does the spacing look like in your source file? Are you trying to preserve an exact number of spaces between columns? - Mako212
Also, using read.table(data, colClasses = "character") will prevent the leading zeros from being dropped. - Mako212
@Mako212 The spacing is as shown in the data I pasted: 2 spaces between columns 1 and 2, 1 space between columns 2 and 3 and between columns 3 and 4, 5 spaces between columns 4 and 5, 13 spaces between columns 5 and 6, and 8 spaces between columns 6 and 7. And yes, I am trying to preserve the exact number of spaces between columns. - Paulina
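
To illustrate the colClasses suggestion in the comment above, here is a minimal sketch using two of the sample lines inline. The `text =` argument stands in for the real file path, and `sample_lines`, `as_default` and `as_char` are names made up for this illustration.

```r
# Two sample lines from the question, supplied inline instead of a file path
sample_lines <- c(
  "0001  116.00 1BF 19.2     0.0             5500        0",
  "0001  216.00 1BF 19.2     0.0             5500        0"
)

# Default type conversion reads "0001" as the integer 1
as_default <- read.table(text = sample_lines)

# Reading every column as character keeps each field as written
as_char <- read.table(text = sample_lines, colClasses = "character")

as_default[1, 1]  # leading zeros are lost
as_char[1, 1]     # leading zeros are kept
```

Note that this only protects the field contents; the whitespace between fields is still discarded on import.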

2 Answers

1
votes

Suppose we have the character vector L, read in using readLines as shown in the Note at the end. Column 2 occupies characters 7 to 12 of each line, so we can overwrite those six characters in place. Assuming you want the replacement to keep 2 digits after the decimal point:

substr(L, 7, 12) <- sprintf("%6.2f", seq_along(L))
writeLines(L, stdout()) # replace stdout() with "myfile.dat", say

giving:

0001    1.00 1BF 19.2     0.0             5500        0
0001    2.00 1BF 19.2     0.0             5500        0
0001    3.00 1BF 19.2     0.0             5500        0
0001    4.00 1BF 19.2     0.0             5500        0
0001    5.00 1BF 19.2     0.0             5500        0
0001    6.00 1BF 19.2     0.0             5500        0
0001    7.00 1BF 19.2     0.0             5500        0
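
As a rough check of why this keeps the layout intact, the sketch below applies the same replacement to a single sample line and compares it with the original. The names `line` and `edited` are illustrative only.

```r
# One sample line from the question
line <- "0001  116.00 1BF 19.2     0.0             5500        0"
edited <- line

# Overwrite characters 7 to 12 (the second column) with a 6-character value
substr(edited, 7, 12) <- sprintf("%6.2f", 1)

# The line length is unchanged
nchar(edited) == nchar(line)

# Everything before and after the replaced field is untouched
substr(edited, 1, 6) == substr(line, 1, 6)
substr(edited, 13, nchar(edited)) == substr(line, 13, nchar(line))
```

Because the replacement is exactly as wide as the field it overwrites, none of the other columns move.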

Note

Lines <- "0001  116.00 1BF 19.2     0.0             5500        0           
0001  216.00 1BF 19.2     0.0             5500        0           
0001  316.00 1BF 19.2     0.0             5500        0           
0001  416.00 1BF 19.2     0.0             5500        0           
0001  516.00 1BF 19.2     0.0             5500        0           
0001  616.00 1BF 19.2     0.0             5500        0           
0001  716.00 1BF 19.2     0.0             5500        0"
L <- trimws(readLines(textConnection(Lines)))
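
For a real file, a round trip might look like the sketch below. It makes a few assumptions: a temporary file stands in for prf003.tre, the trailing spaces should be kept (so trimws is not used), and column 2 should hold plain row numbers, right-justified in the same 6-character field. `infile`, `outfile` and `sample_lines` are hypothetical names.

```r
# Stand-in for prf003.tre: a temporary file with three sample lines,
# including trailing spaces like those in the original file
infile <- tempfile(fileext = ".tre")
outfile <- tempfile(fileext = ".tre")
sample_lines <- paste0(
  "0001  ", c("116.00", "216.00", "316.00"),
  " 1BF 19.2     0.0             5500        0           "
)
writeLines(sample_lines, infile)

# Read without trimws() so trailing whitespace survives
L <- readLines(infile)

# Right-justify the row number in the 6-character field at columns 7 to 12
substr(L, 7, 12) <- formatC(seq_along(L), width = 6)

# Write the edited lines and read them back to inspect the result
writeLines(L, outfile)
readLines(outfile)
```

If the row count could exceed six digits, the field would overflow and a wider layout would be needed.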
0
votes

One option is to read the file into a data frame so that ordinary column operations can be used. Because read.table discards the whitespace between columns, we then need to reconstruct the source file's spacing at the end.

First, we'll read with colClasses = 'character' to preserve leading zeros:

prf003 <- read.table("prf003.tre", colClasses = "character")

prf003[, 2] <- seq.int(nrow(prf003))

Now we'll define a vector holding the spacing that follows each column, taken from the spacing described in the question's comments (the last element is empty because nothing follows the last column):

spacing <- c("  ", " ", " ", "     ", "             ", "        ", "")

Then use mapply with paste0 to append those spaces to each column; this applies paste0(prf003[, 1], spacing[[1]]), paste0(prf003[, 2], spacing[[2]]), and so on:

formatted_prf <- mapply(paste0, prf003, spacing)

Then we can write back to your original file format using write.table:

write.table(formatted_prf, "new_prf.tre", sep = "", quote = FALSE,
  col.names = FALSE, row.names = FALSE)

Note that sep must be empty and quote must be FALSE; otherwise write.table adds separators and quotation marks that break the spacing.

This is the output of write.table:

0001  1 1BF 19.2     0.0             5500        0
0001  2 1BF 19.2     0.0             5500        0
0001  3 1BF 19.2     0.0             5500        0
0001  4 1BF 19.2     0.0             5500        0
0001  5 1BF 19.2     0.0             5500        0
0001  6 1BF 19.2     0.0             5500        0
0001  7 1BF 19.2     0.0             5500        0

The number of spaces between columns matches the original, but in this 7-row sample column 2 is now 1 character wide instead of 6, so the later columns start 5 characters further left than in the source file.
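
A possible variation, not part of the answer above: if the character positions of the later columns also have to match the source file, column 2 can be padded to its original width before the columns are pasted back together. The sketch assumes that field is 6 characters wide and right-justified, as in the sample, and uses the spacing described in the question's comments. `sample_lines`, `prf` and `rebuilt` are illustrative names.

```r
# Three sample lines from the question (trailing spaces omitted)
sample_lines <- paste0(
  "0001  ", c("116.00", "216.00", "316.00"),
  " 1BF 19.2     0.0             5500        0"
)

# Read every field as character so "0001" and "0.0" keep their text
prf <- read.table(text = sample_lines, colClasses = "character")

# Row numbers, right-justified to the original 6-character field width
prf[, 2] <- formatC(seq_len(nrow(prf)), width = 6)

# Spaces that follow each of the 7 columns, per the question's comments
spacing <- c("  ", " ", " ", "     ", "             ", "        ", "")

# Append the spacing to each column, then join the columns row by row
rebuilt <- do.call(paste0, unname(Map(paste0, prf, spacing)))

rebuilt
all(nchar(rebuilt) == nchar(sample_lines))  # line widths match the source
```

The resulting character vector can be written out with writeLines, which avoids the sep and quote settings that write.table needs.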