How to read a text file in R that changes format every few rows

Question

Can you suggest how I can read a text file with the contents shown below into a table? Eventually I will plot the table. I tried multiple variations of averages<-read.table("name.txt",header=T,sep="") and get this message:

Error in read.table("name.txt", header = T, sep = "") : more columns than column names

I need to keep the [[1]], [[2]], [[3]], [[4]] results and be able to tell them apart in the table or data frame for plotting later. Any help is greatly appreciated.

> dput(ll)
c("[[1]]", "Time Queue_Size", "1  0.00000000          0", "2  0.01463509          1", 
"3  0.18473331          0", "4  0.70555473          1", "5  1.10322362          0", 
"6  2.24972346          1", "7  2.32344665          0", "8  3.30621739          1", 
"9  3.37728921          2", "10 3.47074298          1", "11 3.75560929          0", 
"12 4.56816309          1", "", "[[2]]", "        Time Queue_Size", 
"1  0.0000000          0", "2  0.1080389          1", "3  0.5729134          0", 
"4  1.0917759          1", "5  1.1280721          0", "6  1.3647759          1", 
"7  1.9137004          0", "8  3.1164888          1", "9  3.1500754          0", 
"10 3.2951701          1", "11 3.9362245          0", "12 4.7629641          1", 
"", "[[3]]", "        Time Queue_Size", "1  0.0000000          0", 
"2  0.2151396          1", "3  0.5810463          0", "4  1.2669130          1", 
"5  1.2694239          0", "6  1.2890854          1", "7  1.7050347          0", 
"8  2.3904563          1", "9  2.6800687          2", "10 2.7654936          3", 
"11 2.9624973          4", "12 2.9652142          3", "13 3.0096070          4", 
"14 3.1811061          3", "15 3.5783809          2", "16 3.6793138          1", 
"17 3.9339087          0", "18 4.5799301          1", "", "[[4]]", 
"        Time Queue_Size", "1  0.0000000          0", "2  0.1200693          1", 
"3  0.3663455          2", "4  0.5931517          1", "5  0.8235883          2", 
"6  0.8590099          1", "7  0.9474114          0", "8  1.1327633          1", 
"9  1.2933192          0", "10 1.8779916          1", "11 2.2328193          0", 
"12 2.7430489          1", "13 2.8380578          2", "14 2.8465716          3", 
"15 3.0760839          4", "16 3.4489915          5", "17 3.8352777          4", 
"18 4.2612698          5")

Printed line by line, ll looks like this:

[[1]]
Time Queue_Size
1  0.00000000          0
2  0.01463509          1
3  0.18473331          0
4  0.70555473          1
5  1.10322362          0
6  2.24972346          1
7  2.32344665          0
8  3.30621739          1
9  3.37728921          2
10 3.47074298          1
11 3.75560929          0
12 4.56816309          1

[[2]]
        Time Queue_Size
1  0.0000000          0
2  0.1080389          1
3  0.5729134          0
4  1.0917759          1
5  1.1280721          0
6  1.3647759          1
7  1.9137004          0
8  3.1164888          1
9  3.1500754          0
10 3.2951701          1
11 3.9362245          0
12 4.7629641          1

[[3]]
        Time Queue_Size
1  0.0000000          0
2  0.2151396          1
3  0.5810463          0
4  1.2669130          1
5  1.2694239          0
6  1.2890854          1
7  1.7050347          0
8  2.3904563          1
9  2.6800687          2
10 2.7654936          3
11 2.9624973          4
12 2.9652142          3
13 3.0096070          4
14 3.1811061          3
15 3.5783809          2
16 3.6793138          1
17 3.9339087          0
18 4.5799301          1

[[4]]
        Time Queue_Size
1  0.0000000          0
2  0.1200693          1
3  0.3663455          2
4  0.5931517          1
5  0.8235883          2
6  0.8590099          1
7  0.9474114          0
8  1.1327633          1
9  1.2933192          0
10 1.8779916          1
11 2.2328193          0
12 2.7430489          1
13 2.8380578          2
14 2.8465716          3
15 3.0760839          4
16 3.4489915          5
17 3.8352777          4
18 4.2612698          5

Is that literally what your input file looks like with the [[#]] values and the repeated headers in there? — MrFlick

agstudy · Accepted Answer · 2014-05-30T22:42:39

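You can use readLines to read your file. Then you can split the lines into groups, starting a new group at every line that contains the [[ marker (matched as a fixed string with grepl, so no regular expression escaping is needed). Finally, you read each group using read.table(text=…)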

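ll <- readLines('filename.txt')
# start a new group at every line containing "[[", then parse each group,
# dropping its "[[n]]" marker line; read.table skips the blank lines
lapply(split(ll, cumsum(grepl('[[', ll, fixed = TRUE))),
       function(x) read.table(text = x[-1], header = TRUE))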
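
Since the question asks to keep the [[1]] to [[4]] results distinguishable in a single table, the list returned above can be stacked into one data frame with an id column. The following is only a sketch under assumptions: the column name run and the small inline sample (standing in for readLines('filename.txt')) are made up for illustration and are not part of the original answer.

```r
# Hypothetical sample standing in for readLines('filename.txt')
ll <- c("[[1]]", "Time Queue_Size",
        "1  0.00000000          0",
        "2  0.01463509          1",
        "",
        "[[2]]", "        Time Queue_Size",
        "1  0.0000000          0",
        "2  0.1080389          1",
        "3  0.5729134          0")

# Split at each "[[n]]" marker and parse every block, as in the answer
parts <- lapply(split(ll, cumsum(grepl("[[", ll, fixed = TRUE))),
                function(x) read.table(text = x[-1], header = TRUE))

# Add the block number as a column ("run" is an assumed name),
# then stack all blocks into one data frame
tagged <- Map(function(d, id) { d$run <- as.integer(id); d },
              parts, names(parts))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL  # drop the "1.1", "1.2", ... row names from rbind
```

The run column can then serve as a grouping variable when plotting, for example by drawing one line per run.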

EDIT: explanation of the idea

Applying cumsum to a logical vector is a classic way to turn it into an integer vector of group ids: the running total goes up by one at every TRUE. Here grepl returns TRUE for each line containing [[, so cumsum gives a marker line and every line up to the next marker the same id, and split uses those ids to break the vector into groups. An example:

cumsum(c(T,F,F,T,F))
[1] 1 1 1 2 2

Then you can use the result to split a vector:

split(1:5,cumsum(c(T,F,F,T,F)))

$`1`
[1] 1 2 3

$`2`
[1] 4 5
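
The same steps can be traced on marker lines like the ones in the file. This is a minimal sketch; the vector x and the variable names is_marker and group are made up for illustration.

```r
# Hypothetical miniature of the file: two marker lines with data in between
x <- c("[[1]]", "a", "b", "[[2]]", "c")

# fixed = TRUE matches "[[" literally, so the brackets need no escaping
is_marker <- grepl("[[", x, fixed = TRUE)   # TRUE FALSE FALSE TRUE FALSE

# the running count of markers gives each line its group id
group <- cumsum(is_marker)                  # 1 1 1 2 2

# one list element per group; the first entry of each is the marker line
pieces <- split(x, group)
```

Dropping the first element of each piece, as x[-1] does in the answer, leaves only the header and data lines for read.table.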
