I need an explanation of data frames and the sum and mean commands. When I run the code below, I get a list of data files, Preplist. However, the commands I want to use on it fail with errors.

My guess is that the data I load is not a data frame, but just a list of numbers for each data file. To sum or take the mean of each data file, I think I need a data frame. (When I check the number of rows, I get nothing but NULL.)

I think I need to make a data frame from the loaded data (a 200x200 matrix), and the first row and column of the original data have to be ignored.

This is my code:

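# pattern is a regular expression: match only file names ending in .csv
Prepfiles <- list.files(pattern = "\\.csv$")
# read each tab-separated file; the result is a list with one data frame per file
Preplist <- lapply(Prepfiles, read.table, sep = '\t',
                   na.strings = '', header = TRUE, skip = 1)
bigPreplist <- do.call(rbind, Preplist)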

I need to load all the data at once and plot it (sum every three data files together, then plot). However, I have trouble running sum() and mean() on my Preplist[1:24].

The data I am using are all numeric, in 200 by 200 matrices. There are 24 data files. I also want to keep the 200 by 200 matrix form when I load the data, but with my code there is no number of rows when I check with nrow(Preplist[1]). Is it possible to keep the same data frame when you load the data in? Or do I have to make a new data frame?

Here are the errors that I get with the sum and mean commands:

> nrow(Preplist)
NULL
> sum(Preplist[1])
Error in sum(Preplist[1]) : invalid 'type' (list) of argument
> mean(Preplist[1])
[1] NA
Warning message:
In mean.default(Preplist[1]) :
  argument is not numeric or logical: returning NA

Have you tried subsetting with [[ instead of [? - sebastian-c
What do you expect to be contained in Preplist[1]? A column of data? A 200x200 data frame? - joran
If I can have the data as a 200 by 200 matrix, that is easier, but it can be a list. I need to load the numbers from the data files, take the sum and mean, and plot. - user87205

1 Answer

It is useful here to look at the help for [ and [[ (they share the same help page, ?Extract).

To quote the relevant section (as prepList is a list):

Recursive (list-like) objects

Indexing by [ is similar to atomic vectors and selects a list of the specified element(s).

Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.

So,

prepList[1]

selects a list of length 1, whose first (and only) element is your data.frame, whereas

prepList[[1]]

selects the first element of prepList itself, which is the data.frame you want.
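
To make the difference concrete, here is a minimal sketch using a toy two-element list (an assumption standing in for the list that lapply and read.table would return). It shows why nrow() gives NULL on the single-bracket result:

```r
# Toy stand-in for the list returned by lapply(..., read.table)
prepList <- list(data.frame(a = 1:5, b = 2:6), data.frame(a = 2:6, b = 1:5))

single_bracket <- prepList[1]    # still a list, of length 1
double_bracket <- prepList[[1]]  # the data.frame itself

class(single_bracket)  # "list"
class(double_bracket)  # "data.frame"

nrow(single_bracket)   # NULL, because a plain list has no dim attribute
nrow(double_bracket)   # 5
```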

That being said, I'm not sure you really want to be taking the mean or sum of a whole data.frame; you would want to make sure these return what you expect.

e.g.

prepList <- list(data.frame(a=1:5,b=2:6), data.frame(a=2:6,b=1:5))

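# this will give a warning (the output below is from an older version of R;
# recent versions return NA with a warning instead of the column means)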
mean(prepList[[1]])

## a b 
## 3 4 
## Warning message:
## mean(<data.frame>) is deprecated.
## Use colMeans() or sapply(*, mean) instead. 


## this will give a single number

sum(prepList[[1]])

## 35
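
For reference, a minimal base R sketch of the two things you might mean by "the mean of a data.frame": per-column summaries, or one number for the whole table. The toy data frame df is an assumption for illustration, and converting with as.matrix presumes every column is numeric:

```r
df <- data.frame(a = 1:5, b = 2:6)

colMeans(df)   # per-column means: a = 3, b = 4
colSums(df)    # per-column sums:  a = 15, b = 20

# One number for the whole table: convert to a numeric matrix first
m <- as.matrix(df)
mean(m)        # 3.5
sum(m)         # 35
```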

If you want the column means (or column sums) for each data.frame in prepList, use colMeans, colSums, or a nested lapply with mean,

e.g.
library(data.table)
rbindlist(lapply(prepList, function(x) lapply(x, mean)))

##    a b
## 1: 3 4
## 2: 4 3

or using plyr and ldply

library(plyr)
ldply(prepList, function(x) {sapply(x, mean)})

or to limit yourself to numeric columns

using plyr

ldply(prepList,  numcolwise(mean))

using Filter

rbindlist(lapply(prepList, function(x) lapply(Filter(is.numeric,x), mean)))
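
As for summing every three files together while keeping the matrix shape, one possible approach is sketched below in base R. It assumes all the data frames have the same dimensions and contain only numeric columns. The six 3x3 toy tables stand in for your 24 files of 200x200, and the names mats, group and groupSums are made up for this illustration:

```r
# Toy stand-in: six 3x3 numeric data frames instead of 24 of size 200x200
prepList <- lapply(1:6, function(i) as.data.frame(matrix(i, nrow = 3, ncol = 3)))

# Convert each data frame to a numeric matrix so that + works elementwise
mats <- lapply(prepList, as.matrix)

# Group index 1,1,1,2,2,2,... so every three consecutive files share a group
group <- ceiling(seq_along(mats) / 3)

# Add the matrices within each group elementwise; each result keeps its shape
groupSums <- lapply(split(mats, group), function(g) Reduce(`+`, g))

dim(groupSums[[1]])      # 3 3
groupSums[[1]][1, 1]     # 1 + 2 + 3 = 6
sapply(groupSums, mean)  # one overall mean per group of three files
```

Each element of groupSums is an ordinary matrix, so it can be passed to a plotting function such as image(). If the first column of each file has to be dropped, that would need to happen before the as.matrix step.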