NVAR    6957423
RATE    1
MAC 963.605
MAF 0.228126
SING    0
MONO    0
TITV    1.99326
TITV_S  NA
DP  NA
QUAL    NA
PASS    1
FILTER|PASS 1
PASS_S  0

I have several files (N = 414) in the format above. In R, I would like to read all the files, transpose each one, and rbind (concatenate) the values. The files are named age_wg2_ind[i].vstats, with i ranging from 1 to 414 (for example, age_wg2_ind1.vstats). So far, I've tried this:

txtfiles = list.files(pattern="*.vstats")

for (i in 1:length(txtfiles)){
     tmp = read.table(txtfiles[i],sep="\t")
  ttmp<-t(tmp[i])
 colnames(ttmp)<-ttmp[1,];ttmp2<-ttmp[2:nrow(ttmp),]
}

Error in ttmp[2:nrow(ttmp), ] : subscript out of bounds

1) Will list.files() really begin with individual #1 and end with #414? 2) I'm not sure where to put [i] to retain the second row of each file.

Thanks!

Can you update your question with exactly what you are asking? Thanks. - Shawn
What is the structure of the files? Just text files? How about readLines()? I don't understand what is going on. - Elad663
@Elad663 Column names are in column 1 and values in column 2 in several text files. After transposing, the values are in row 2. I would like to extract the values from each file and append them into one file. Retaining the order of the files is also important: I need to be able to tell which values correspond to which file. - user3403622

1 Answer


If your files all have the variable names in the first column and the values in the second, the following should work:

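l <- lapply(txtfiles, function(i) {
  r <- read.table(i, sep = "\t")   # column 1: names, column 2: values
  mat <- t(r[, 2])                 # values as a single-row matrix
  colnames(mat) <- r[, 1]          # label the columns with the variable names
  mat
})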

(out <- do.call(rbind , l))

If some of your files have different variable names, have a look at rbind.fill in the plyr package (or rbind.fill.matrix, since the code above builds matrices).

Check the order of the files in txtfiles: list.files() sorts names alphabetically, so the numerical order of your files will not necessarily be preserved (for example, ind10 can come before ind2). You can reorder txtfiles before the lapply call or, as your files are named consistently, define the file list directly as

txtfiles2 <- paste0("age_wg2_ind",1:414,".vstats")
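If the numbering has gaps, or you would rather start from what list.files() actually found, one option is to sort the names by the number embedded in them. This is only a sketch: it assumes every name matches age_wg2_ind<number>.vstats, and the example names and the variables file_index and txtfiles_sorted are made up for illustration.

```r
# Hypothetical example names, in an order list.files() might return them
txtfiles <- c("age_wg2_ind1.vstats", "age_wg2_ind10.vstats",
              "age_wg2_ind100.vstats", "age_wg2_ind2.vstats")

# Pull out the digits between "_ind" and ".vstats" and convert them to integers
file_index <- as.integer(sub("^.*_ind([0-9]+)\\.vstats$", "\\1", txtfiles))

# Reorder the file names by that number
txtfiles_sorted <- txtfiles[order(file_index)]
txtfiles_sorted
```

Passing txtfiles_sorted to lapply keeps the rows of the combined matrix in individual order, and rownames(out) <- txtfiles_sorted afterwards labels each row with its file.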

EDIT

I'm guessing that your error comes from a problem in one of your input files, in which case you can try this. I have made up some data to show it working. If you run the previous code on the example data below, you get the error message 'Error in read.table(i, sep = "\t") : no lines available in input'. Wrapping the read in tryCatch makes it work.

# Some example data

df <- data.frame(letters[1:3], 1:3)
write.table(df, "temp1.out", sep = "\t", row.names = FALSE, col.names = FALSE)
write.table(df, "temp2.out", sep = "\t", row.names = FALSE, col.names = FALSE)
df[,1] <- df[,2] <- NULL
write.table(df, "temp3.out", sep = "\t", row.names = FALSE, col.names = FALSE) # zero columns

# Read in data
txtfiles <- list.files(pattern = "\\.out$")  # pattern is a regular expression, not a glob

l <- lapply(txtfiles, function(i) {
  # Return NULL instead of stopping when a file cannot be read
  r <- tryCatch(read.table(i, sep = "\t"), error = function(e) NULL)
  if (!is.null(r)) {
    mat <- t(r[, 2])
    colnames(mat) <- r[, 1]
    mat
  }
})

l <- l[!sapply(l, is.null)]

(out <- do.call(rbind , l))
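Once unreadable files are dropped, the rows of out no longer line up one-to-one with txtfiles, which matters if you need to know which values came from which file. A possible way around that, sketched below, is to name the list by file before filtering and reuse those names as row names. The temporary directory, the empty third file, and the variable failed are assumptions made for the demonstration.

```r
# Write two small example files to a temporary directory, plus one empty file
tmpdir <- tempfile("vstats_demo")
dir.create(tmpdir)
df <- data.frame(letters[1:3], 1:3)
for (f in c("temp1.out", "temp2.out")) {
  write.table(df, file.path(tmpdir, f), sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
file.create(file.path(tmpdir, "temp3.out"))  # empty, so read.table() will fail

txtfiles <- list.files(tmpdir, pattern = "\\.out$", full.names = TRUE)

# Read each file; a file that cannot be read yields NULL
l <- lapply(txtfiles, function(i) {
  r <- tryCatch(read.table(i, sep = "\t"), error = function(e) NULL)
  if (is.null(r)) return(NULL)
  mat <- t(r[, 2])
  colnames(mat) <- as.character(r[, 1])
  mat
})
names(l) <- basename(txtfiles)

# Record which files failed, then drop them from the list
is_failed <- vapply(l, is.null, logical(1))
failed <- names(l)[is_failed]
l <- l[!is_failed]

# Combine the rows and label each one with the file it came from
out <- do.call(rbind, l)
rownames(out) <- names(l)
out
failed
```

Here failed keeps the names of the skipped files, so they can be inspected separately instead of disappearing silently.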