1
votes

I'm relatively new to R and am having problems reshaping my data from wide format into long format. I suspect the problem comes from how I built the data.frame: I created it in R from a larger data.frame, putting the mean values of the large data.frame into a new one.

What I have done is create an empty data.frame (ndf):

ndf <- data.frame(matrix(ncol = 0, nrow = 3))

Then I used lapply to put the means from the large data.frame (ldf) into separate columns of the new data.frame, taking the years from the names of ldf:

ndf$Year <- names(ldf)
ndf$col1 <- lapply(ldf, function(i) {mean(i$col1)})
ndf$col2 <- lapply(ldf, function(i) {mean(i$col2)})
etc.

The melt function in reshape2 does not work, apparently because there are non-atomic 'measure' columns.

For the base reshape function I used this code:

reshape.ndf <- reshape(ndf, 
                    varying = list(names(ndf)[2:7]), 
                    v.names = "cover",
                    timevar = "species",
                    times = names(ndf[2:7]),
                    new.row.names = 1:1000,
                    direction = "long")

The output then essentially just repeats the first row's values for every year. My wide data.frame looks like this (sorry for the strange names):

Year Cladonia.portentosa Erica.tetralix Eriophorum.vaginatum  
1 2014               11.75             35                   55     
2 2015               15.75          25.75                   70      
3 2016               22.75              5                 37.5

And the long data.frame looks like this:

Year             species cover id
1 2014 Cladonia.portentosa 11.75  1
2 2015 Cladonia.portentosa 11.75  2
3 2016 Cladonia.portentosa 11.75  3
4 2014      Erica.tetralix 35.00  1
5 2015      Erica.tetralix 35.00  2
6 2016      Erica.tetralix 35.00  3

The "cover" column should instead hold, in each row, the value for the corresponding year.

Please could someone tell me where I've gone wrong!?

3
How can names(ndf[2:7]) be used when there are only 4 columns in your wide data? - IRTFM
have you tried tidyr::gather()? if not, check it out. it is basically the successor to reshape2. - roman
42 - I have only shown a portion of the data set, I was trying to reduce confusion but forgot to change the code to represent what I have shown. - dunnns
@roman - I looked into gather() but maybe not thoroughly enough. I will try again and report back - dunnns

3 Answers

0
votes

In addition to roman's answer, I thought I would share exactly what I did with my data set.

My initial "wide" data.frame ndf looked like this:

Year Cladonia.portentosa Erica.tetralix Eriophorum.vaginatum  
1 2014               11.75             35                   55     
2 2015               15.75          25.75                   70      
3 2016               22.75              5                 37.5

I installed tidyr:

install.packages("tidyr")

Then loaded the package:

library(tidyr)

I then used the gather() function from tidyr to gather the species columns (Cladonia.portentosa, Erica.tetralix and Eriophorum.vaginatum) into a single species column, with their values going into a cover column in the new "long" data.frame.

long.ndf <- ndf %>% gather(species, cover, Cladonia.portentosa:Eriophorum.vaginatum)
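For anyone on a newer tidyr (1.0.0 or later): gather() is marked as superseded there, and pivot_longer() is the suggested replacement. Below is a minimal sketch of the same reshape, assuming the three-species ndf shown above (rebuilt by hand here so the snippet runs on its own). Unlike gather(), it returns a tibble, with rows ordered by year rather than by species.

```r
library(tidyr)

# Hand-built copy of the wide data shown above (illustration only)
ndf <- data.frame(
  Year = c(2014, 2015, 2016),
  Cladonia.portentosa = c(11.75, 15.75, 22.75),
  Erica.tetralix = c(35, 25.75, 5),
  Eriophorum.vaginatum = c(55, 70, 37.5)
)

# Every column except Year is stacked: the old column names go into
# "species" and the old cell values go into "cover"
long.ndf <- pivot_longer(ndf, cols = -Year,
                         names_to = "species", values_to = "cover")
long.ndf
```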

Easy peasy! Thanks again to roman for the suggestion!

0
votes

Here is an example of 'melting' in tidyr.

You'll need tidyr but I also like dplyr and am including it here to encourage its use along with the rest of the tidyverse. You'll find endless great tutorials on the web...

library(dplyr)
library(tidyr)

Let's use iris as an example. I want a long form where Species, variable and value are the columns.

data(iris)

Here it is with gather(). We specify that variable and value are the names of the new 'melted' columns. We also exclude Species from the melt with -Species, so it remains its own column.

iris_long <- iris %>%
  gather(variable, value, -Species)

Inspect the iris_long object to make sure it worked.
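One way to do that inspection, as a rough sketch: iris has 150 rows and four measurement columns, so the long version should have 600 rows and three columns.

```r
library(tidyr)

# Same reshape as above, repeated so this snippet runs on its own
iris_long <- gather(iris, variable, value, -Species)

dim(iris_long)              # expect 600 rows and 3 columns
head(iris_long)             # first few rows: Species, variable, value
table(iris_long$variable)   # each of the four measurements should appear 150 times
```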

0
votes

I'm answering your question in case it helps someone using the reshape function.

"Please could someone tell me where I've gone wrong!?"

You have not specified the idvar parameter, so reshape has created one for you, named id. To avoid that, just add the line idvar = "Year" to your code:

ndf <- read.table(text = 
  "Year Cladonia.portentosa Erica.tetralix Eriophorum.vaginatum
    1 2014               11.75             35                   55     
    2 2015               15.75          25.75                   70      
    3 2016               22.75              5                 37.5", 
  header = TRUE, stringsAsFactors = FALSE)

reshape.ndf <- reshape(ndf, 
  varying = list(names(ndf)[2:4]), 
  v.names = "cover",
  idvar = "Year",
  timevar = "species",
  times = names(ndf[2:4]),
  new.row.names = 1:9,
  direction = "long")

The result looks as you were expecting:

reshape.ndf
  Year              species cover
1 2014  Cladonia.portentosa 11.75
2 2015  Cladonia.portentosa 15.75
3 2016  Cladonia.portentosa 22.75
4 2014       Erica.tetralix 35.00
5 2015       Erica.tetralix 25.75
6 2016       Erica.tetralix  5.00
7 2014 Eriophorum.vaginatum 55.00
8 2015 Eriophorum.vaginatum 70.00
9 2016 Eriophorum.vaginatum 37.50
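One more thing that may be worth checking, and this is a guess about the original data rather than something the question confirms: lapply always returns a list, so columns assigned from it as in the question end up as list columns. That would fit the "non-atomic 'measure' columns" complaint from reshape2. A minimal sketch of the difference, using a made-up ldf (a named list of yearly data frames with a hypothetical col1), where vapply returns a plain numeric vector instead:

```r
# Hypothetical stand-in for the question's ldf: one data frame per year
ldf <- list(
  "2014" = data.frame(col1 = c(10, 13.5)),
  "2015" = data.frame(col1 = c(15, 16.5)),
  "2016" = data.frame(col1 = c(22, 23.5))
)

ndf <- data.frame(Year = names(ldf), stringsAsFactors = FALSE)

# lapply returns a list, so this becomes a list column
ndf$col1_list <- lapply(ldf, function(d) mean(d$col1))

# vapply returns an atomic numeric vector of per-year means
ndf$col1 <- unname(vapply(ldf, function(d) mean(d$col1), numeric(1)))

is.list(ndf$col1_list)   # TRUE
is.numeric(ndf$col1)     # TRUE
```

With atomic columns like col1, both melt and reshape have ordinary numeric vectors to work with.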