Convert character matrix from readLines into equivalent data.frame using read.table

Question

I have a problem using the solution to this question:

Why the field separator character must be only one byte?

I have a file with columns delimited with ~~~, and of course read.table fails with the error invalid 'sep' value: must be one byte. I googled and found the above question, which successfully reads the file into a character matrix.

However, I would like to now convert this character matrix into a data frame, assigning the type to each column automatically, with rules determined as if read.table had worked on the original file, e.g. dates and strings get converted to factors, etc.

You could use readLines() and then split each line using strsplit() on the ~~~ delimiter. But this would not necessarily format the data as you want it. — Tim Biegeleisen
this is exactly how the other solution works, but it creates a character matrix which I am now struggling to convert. — Alex
Just as.data.frame it, and then cast the columns as you want. — Tim Biegeleisen
I would like to cast the columns automatically, per the rules used in read.table — Alex
why not write out the matrix as a "txt" document with single byte separator, and then read in again with read.table? — Adam Quek

Alex · Accepted Answer · 2016-05-10T06:04:55

read.table has a helper function, utils::type.convert, whose help file states:

This is principally a helper function for read.table. Given a character vector, it attempts to convert it to logical, integer, numeric or complex, and failing that converts it to factor unless as.is = TRUE. The first type that can accept all the non-missing values is chosen.
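As a rough illustration of the rule quoted above, the sketch below calls type.convert on a few made-up character vectors. The values and variable names are assumptions for demonstration only and do not come from the original file. as.is is passed explicitly because R 4.0.0 and later warn when it is omitted.

```r
# Hypothetical sample vectors, chosen to hit each branch of the conversion rule
ints  <- c("5", "6", NA)
nums  <- c("153.234", "2.5")
lgls  <- c("TRUE", "FALSE")
words <- c("hello", "world")
dates <- c("2015-03-22", "2015-03-23")

# The first type that accepts all non-missing values wins
int_class  <- class(type.convert(ints,  as.is = FALSE))  # "integer"
num_class  <- class(type.convert(nums,  as.is = FALSE))  # "numeric"
lgl_class  <- class(type.convert(lgls,  as.is = FALSE))  # "logical"

# Anything that fits no other type becomes a factor, or stays character when as.is = TRUE
word_factor <- class(type.convert(words, as.is = FALSE)) # "factor"
word_char   <- class(type.convert(words, as.is = TRUE))  # "character"

# Date-like strings are not parsed as dates; they are treated like any other string
date_class <- class(type.convert(dates, as.is = FALSE))  # "factor"
```

Note that no Date column is ever produced this way: a date-like column would need a separate as.Date call afterwards.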

The bit in read.table that calls this function is:

  for (i in (1L:cols)[do]) {
    data[[i]] <- if (is.na(colClasses[i])) 
      type.convert(data[[i]], as.is = as.is[i], dec = dec, 
                   numerals = numerals, na.strings = character(0L))
  ...
  }

where the ellipsis deals with column types configured in the call to read.table.

For my purposes the following is sufficient:

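# Split each line on the literal "~~~" delimiter and bind the pieces into a character matrix
df2 <- do.call(rbind, strsplit(readLines('test.txt'), '~~~', fixed = TRUE))

# Convert each column with type.convert, as read.table does.
# as.is = FALSE turns strings into factors; R >= 4.0.0 warns if as.is is omitted.
df2_processed <-
  setNames(
    as.data.frame(lapply(seq_len(ncol(df2)), function(i) {
      type.convert(df2[, i], as.is = FALSE)}), stringsAsFactors = FALSE),
  paste0('v', seq_len(ncol(df2))))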

where test.txt is the following text file:

2015-03-22~~~153.234~~~hello~~~5~~~6
2015-03-22~~~153.234~~~hello~~~5~~~6
2015-03-22~~~153.234~~~hello~~~5~~~6
2015-03-22~~~153.234~~~hello~~~5~~~6
2015-03-22~~~153.234~~~hello~~~5~~~6
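For comparison, here is a minimal sketch of the route Adam Quek suggests in the comments: swap the multi-byte delimiter for a single-byte one and hand the result to read.table. It uses the text argument instead of writing an intermediate file. The tab separator, the temporary file, and the names path, single_sep and df_alt are assumptions made for this example, and it only works if the data itself never contains a tab.

```r
# Recreate the sample file from the post in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(rep("2015-03-22~~~153.234~~~hello~~~5~~~6", 5), path)

lines <- readLines(path)

# Assumption: no field contains a tab, so a tab is safe as a stand-in separator
stopifnot(!any(grepl("\t", lines, fixed = TRUE)))
single_sep <- gsub("~~~", "\t", lines, fixed = TRUE)

# Let read.table apply its own type rules; quoting and comment handling are
# switched off so that field contents are taken literally
df_alt <- read.table(text = single_sep, sep = "\t", quote = "",
                     comment.char = "", stringsAsFactors = TRUE)

col_classes <- unname(sapply(df_alt, class))
str(df_alt)
```

With stringsAsFactors = TRUE the date and string columns come back as factors, matching the behaviour asked for in the question. Setting it to FALSE leaves them as character.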
