reading .txt file into R, unknown delimiter, no columns

Question

I have a large dataset, contained in a .txt file, that is broken into rows, without columns. Unfortunately, the rows are clustered by case. It looks a bit like this:

v1(case1): a   
v2(case1): b
v3(case1): c

v1(case2): d
v2(case2): e
v3(case2): f

…and so on. I tried using read.table to separate the variable names from the data, using this command:

data1 <- read.table("Data.txt", header = FALSE, sep = ":", fill=TRUE)

…but it was only partly effective: for some rows it placed the variable names in the "V1" column, and for others it did not, leading to this situation:

V1            V2
1   v1case1   a
2   v2case1   b 
3   v3case1   c
4   v1case2   d
5   v2case2   e
6   v3case2   f
7            v1case3
8            v2case3
9            v3case3

Any suggestions on a better way of either a) extracting all of the variable names into a separate column (so that I can use them to create new variables that will pull the relevant data for each variable into a column using "if/else") or b) a different way of putting this dataset into row/column format?

All advice much appreciated.

hrbrmstr · Accepted Answer · 2014-04-27T02:52:50

stringr and plyr can help here if you start with readLines():

library(stringr)
library(plyr)

dat <- readLines("rows.txt")
print(dat)
## [1] "v1(case1): a" "v2(case1): b" "v3(case1): c" "v1(case2): d" "v2(case2): e" "v3(case2): f"

x <- ldply(str_match_all(dat, "^([[:alnum:]]+)\\(([[:alnum:]]+)\\): +([[:alnum:]]+)"))[,2:4]
print(x)
##    2     3 4
## 1 v1 case1 a
## 2 v2 case1 b
## 3 v3 case1 c
## 4 v1 case2 d
## 5 v2 case2 e
## 6 v3 case2 f
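The sample in the question also shows blank lines between cases and trailing whitespace after some values. As a rough sketch of the same parsing idea in base R only, the snippet below trims and drops such lines before matching, and counts any lines that do not fit the pattern. The inline sample data and the column names `variable`, `case`, and `value` are assumptions made for illustration, not part of the original answer.

```r
# Sample lines mimicking the question's layout, including a blank
# separator line and trailing whitespace (illustrative data only)
lines <- c("v1(case1): a   ", "v2(case1): b", "v3(case1): c", "",
           "v1(case2): d", "v2(case2): e", "v3(case2): f")

# Strip surrounding whitespace and drop empty separator lines
lines <- trimws(lines)
lines <- lines[nzchar(lines)]

# Capture: variable name, case label, and everything after the colon
pattern <- "^([[:alnum:]]+)\\(([[:alnum:]]+)\\):[[:space:]]*(.*)$"
m <- regmatches(lines, regexec(pattern, lines))

# Non-matching lines yield character(0); keep only complete matches
ok <- lengths(m) == 4
n_unmatched <- sum(!ok)  # worth inspecting if greater than zero

# Bind the matches into a matrix and keep the three captured groups
parsed <- do.call(rbind, m[ok])
parsed <- as.data.frame(parsed[, 2:4, drop = FALSE], stringsAsFactors = FALSE)
names(parsed) <- c("variable", "case", "value")
print(parsed)
```

With a real file, `lines` would come from `readLines()` instead of the inline vector. Checking `lines[!ok]` is one way to find rows that do not follow the expected `name(case): value` layout.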

I'm not entirely sure what you need the resulting data frame to look like, but reshape or reshape2 can get you the rest of the way there.
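As one possible reading of "the rest of the way", here is a minimal sketch that pivots the long result into one row per case and one column per variable, using base R's `reshape()`. The data frame is rebuilt inline so the snippet runs on its own; the column names `variable`, `case`, and `value` are assumed for illustration (the output above shows them as `2`, `3`, `4`).

```r
# Long-format data equivalent to the parsed output above
# (column names are illustrative assumptions)
x <- data.frame(
  variable = c("v1", "v2", "v3", "v1", "v2", "v3"),
  case     = c("case1", "case1", "case1", "case2", "case2", "case2"),
  value    = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# One row per case, one column per variable
wide <- reshape(x, idvar = "case", timevar = "variable", direction = "wide")

# reshape() names the new columns "value.v1", "value.v2", ...; drop the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
##    case v1 v2 v3
## 1 case1  a  b  c
## 2 case2  d  e  f
```

If reshape2 is preferred, `dcast(x, case ~ variable, value.var = "value")` should produce a similar layout.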
