2
votes

I want to combine data from several CSV files with the same format so I can analyse them, but I cannot get rid of the headers/column names of the individual files.

I used the lapply function to read the contents of all these files into a list, and it looks something like this:

ID X1 X2 ---> header of 1st csv file
1  5  6
2  6  9
.......
10 7  8

.

ID X1 X2 --> headers 2nd csv file
1  5  6
2  6  9
.......
10 7  8
etc.

How can I remove the header characters in order to apply mathematical operations to these data?

My code:

data<-lapply(files, read.csv)
mean <-(mean(data$column2, na.rm=TRUE))

I also tried read.csv(headers=FALSE), but R does not accept this when the function is inside lapply.

I expect the mean of the data frame of the combined files but I get the error:

In mean.default(data$column2, na.rm = TRUE) : argument is not numeric or logical: returning NA

2
In your example, data is a list of data frames. Eventually you want something like sapply(data, function(d) mean(d$X2)) or sapply(data, function(d) mean(d[[3]])) - jogo
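
A minimal sketch of the per-file approach suggested in the comment above. The two CSV files, their names, and their values are made up for illustration and are written to a temporary directory:

```r
# Hypothetical sample data: two small CSV files with the same header
dir <- tempfile("csvdemo")
dir.create(dir)
write.csv(data.frame(ID = 1:3, X1 = c(5, 6, 7), X2 = c(6, 9, 8)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(ID = 1:3, X1 = c(1, 2, 3), X2 = c(2, 4, 6)),
          file.path(dir, "b.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
data <- lapply(files, read.csv)   # a list of data frames, one per file

# data$X2 is NULL: the list itself has no element named X2, which is
# why mean(data$...) returns NA. Index each data frame instead.
per_file_means <- sapply(data, function(d) mean(d$X2, na.rm = TRUE))
print(per_file_means)
```

This gives one mean per file. The headers are not the problem here; read.csv already turns them into column names.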

2 Answers

0
votes

You can import your data directly without column names:

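read_matrix <- function(csvfile) {
    # header = FALSE alone would read the header line as a data row and turn
    # every column into text; skip = 1 drops that line instead
    a <- read.csv(csvfile, header = FALSE, skip = 1)
    # convert to a plain matrix without row or column names
    matrix(as.matrix(a), ncol = ncol(a), dimnames = NULL)
}
df <- read_matrix('even_iops_Jan15.csv')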

Another option is to set the names to NULL:

names(df) <- NULL
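
To see what the header argument does with a file that contains a header line, here is a small sketch. The temporary file and its contents are hypothetical:

```r
# Hypothetical CSV file with a header line
f <- tempfile(fileext = ".csv")
writeLines(c("ID,X1,X2", "1,5,6", "2,6,9"), f)

with_header <- read.csv(f)                            # header = TRUE is the default
no_header   <- read.csv(f, header = FALSE)            # header line becomes a data row
skipped     <- read.csv(f, header = FALSE, skip = 1)  # header line is dropped

str(with_header)  # numeric columns named ID, X1, X2
str(no_header)    # columns V1, V2, V3 are no longer numeric
str(skipped)      # numeric columns named V1, V2, V3

# Plain numeric matrix without dimnames
m <- unname(as.matrix(skipped))
third_col_mean <- mean(m[, 3])
print(third_col_mean)
```

Note that the argument is spelled header, not headers.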
0
votes

If you have read the data correctly with headers, I think you first need to extract the column from each data frame and then take the mean.

You can extract the column

1) By name

mean(sapply(data, `[[`, 'column2'), na.rm = TRUE)

2) By position

mean(sapply(data, `[[`, 2), na.rm = TRUE)

With lapply you need to unlist the data first:

mean(unlist(lapply(data, `[[`, 'column2')), na.rm = TRUE)
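
Since the files share the same columns, another option is to stack them into a single data frame with do.call(rbind, ...) and compute on that. This is a common base R idiom rather than something taken from the answer above. A sketch, with made-up data frames standing in for the result of lapply(files, read.csv):

```r
# Stand-in for lapply(files, read.csv): hypothetical data frames with
# identical column names but different numbers of rows
data <- list(
  data.frame(ID = 1:3, X1 = c(5, 6, 7), X2 = c(6, 9, NA)),
  data.frame(ID = 1:2, X1 = c(1, 2),    X2 = c(2, 4))
)

# Stack the rows; the header appears only once, as column names
combined <- do.call(rbind, data)

# Mean over all rows of all files, ignoring missing values
overall_mean <- mean(combined$X2, na.rm = TRUE)

# Same result by extracting the column from each list element first
same_mean <- mean(unlist(lapply(data, `[[`, "X2")), na.rm = TRUE)

print(overall_mean)
```

The unlist(lapply(...)) form also works when the files have different numbers of rows. In that case sapply returns a list instead of a matrix, and mean cannot be applied to it directly.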