I am trying to reshape a dataset from this (a snippet of mydata):
       sample     species cell_nr     biovol
1  41442bay_1 Mytilus sp.    6.22    1243.04
2 41502elba_1 Mytilus sp.    1.35     260.64
3  41502bay_3 Mytilus sp.    2.74     548.21
4  41443bay_2   M. edulis  599.14  114028.15
5 41411elba_2   M. edulis 5107.51 1021502.16
to this (result)
      sample variable Mytilus sp. M. edulis
1 41442bay_1  cell_nr        6.22         0
2 41442bay_1   biovol     1243.04         0
3 41443bay_2  cell_nr           0    599.14
4 41443bay_2   biovol           0 114028.15
So far I have used reshape2 in R:
library(reshape2)
mymelt <- melt(mydata, id.vars=c("species", "sample"))
result <- dcast(mymelt, sample+variable~species)
But it aggregates my variables:
Aggregation function missing: defaulting to length
As far as I understand from reading these two threads (how-to-use-cast-in-reshape-without-aggregation and reshaping-data-frame-with-duplicates), I need unique IDs for each pair of variables to reshape without aggregation.
However, I'm stuck at this point. Any help is welcome and thank you in advance.
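To make the "unique IDs" idea concrete, here is a minimal sketch of a per-group counter built with base R's ave(). The data frame `toy` and its values are made up for illustration and are not taken from mydata; the point is only that a sample/species pair that occurs twice gets the IDs 1 and 2, so each row becomes uniquely identifiable.

```r
# Hypothetical toy data: the pair (s1, A) occurs twice
toy <- data.frame(
  sample  = c("s1", "s1", "s2"),
  species = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Number the rows within each species/sample group: 1, 2, ... per group
toy$indx <- with(toy, ave(seq_along(species), species, sample, FUN = seq_along))

print(toy)
```

Adding such a counter to the left-hand side of the dcast() formula is one way to keep every row separate instead of letting duplicates collapse into a count.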
//edited
//edit2
This is a subset of "mydata", the starting table: two sampling sites, two variables, and the taxa:
structure(list(sample = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("41411bay_1",
"41411elba_1"), class = "factor"), genspec = structure(c(1L,
2L, 5L, 6L, 8L, 9L, 10L, 11L, 13L, 18L, 14L, 15L, 16L, 17L, 19L,
20L, 21L, 22L, 23L, 12L, 24L, 25L, 26L, 27L, 3L, 4L, 5L, 6L,
7L, 7L, 8L, 9L, 10L, 11L, 14L), .Label = c("Achnanthes_taeniata",
"Asterionella_formosa", "Chaetoceros_ceratosporus", "Chaetoceros_compressus",
"Chaetoceros_simplex", "Chaetoceros_socialis", "Chaetoceros_sp.",
"Chaetoceros_wighamii", "Chroococcus_minimus", "Chrysophyte_",
"Cryptophyte_", "decaying_dino", "decaying_dino_", "Gymnodinium_sp.",
"Melosira_nummuloides", "Monoraphidium_contortum", "Mougeotia_sp.",
"Mytilus_sp.", "Navicula_sp.", "Protoperidinium_pellucidum",
"Protoperidinium_sp.", "Quadrigula_sp.", "Rhabdoderma_lineare",
"Skeletonema_costatum", "Surirella_sp.", "Thalassionema_nitzschioides",
"Thalassiosira_sp."), class = "factor"), total_cell_nr = c(570.14,
142.54, 30.54, 95.02, 213.8, 6246.1, 1924.23, 71.27, 47.51, 23.76,
71.27, 23.76, 35.63, 11.88, 35.63, 59.39, 47.51, 35.63, 95.02,
59.39, 6235.91, 11.88, 35.63, 11.88, 487.34, 314.42, 15.72, 110.05,
408.74, 31.44, 267.25, 35471.82, 13119.72, 534.51, 15.72), total_biovol = c(114028.15,
74830.97, 25900.68, 23850.89, 500084.7, 51217.98, 769690, 15465.07,
342702.1, 11877.93, 5485537.87, 102340.26, 1460.99, 64200.22,
74830.97, 1640342.42, 656754.62, 7483.1, 2375.59, 428377.62,
860556.18, 950234.57, 37059.15, 35633.8, 207121.44, 107530.22,
13331.23, 27621.43, 163904.98, 12608.08, 625105.87, 290868.96,
5247886.36, 115988.01, 1210045.12)), .Names = c("sample", "genspec",
"total_cell_nr", "total_biovol"), class = "data.frame", row.names = c(NA,
-35L))
and I'm doing this
mymelt <- melt(mydata, id.vars=c("genspec", "sample"))
mymelt$indx <- with(mymelt, ave(seq_along(genspec), genspec, sample, FUN=seq_along))
result <- dcast(mymelt, sample+variable+indx~genspec, value.var='value', fill=0)
I would expect 4 obs. as a result (two sites and two variables), but instead I get 7 obs., with duplicated samples for bay_1 but not for elba_1. This happens throughout the results for the original dataset. I guess it's a very basic problem with a simple answer, but I can't see it.
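Extra rows for one site but not the other suggest that some sample/genspec pair occurs more than once at that site. A quick way to check is to count the rows per pair. The sketch below uses three rows copied from the subset above (rows 27, 29 and 30); `excerpt` and `counts` are names chosen for this illustration, and the same call should work on the full mydata.

```r
# Three rows copied from the posted subset (rows 27, 29 and 30)
excerpt <- data.frame(
  sample        = "41411bay_1",
  genspec       = c("Chaetoceros_simplex", "Chaetoceros_sp.", "Chaetoceros_sp."),
  total_cell_nr = c(15.72, 408.74, 31.44),
  total_biovol  = c(13331.23, 163904.98, 12608.08),
  stringsAsFactors = FALSE
)

# Count how many rows share each sample/genspec pair
counts <- as.data.frame(
  table(sample = excerpt$sample, genspec = excerpt$genspec),
  stringsAsFactors = FALSE
)

# Pairs that occur more than once get an indx above 1 and therefore extra rows
print(counts[counts$Freq > 1, ])
```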
//edit3
Alright, I see what happened here: there were duplicated genspec (i.e. species) entries within my samples. This is what caused the chaos in the otherwise working answer from akrun. To see what I mean, run the following commands on the df pasted above. I removed the duplicated genspec row (row 30, the second Chaetoceros_sp. entry for 41411bay_1) and everything works fine:
mydata <- mydata[-30,]
mymelt <- melt(mydata, id.vars=c("genspec", "sample"))
mymelt$indx <- with(mymelt, ave(seq_along(genspec), genspec, sample, FUN=seq_along))
result <- dcast(mymelt, sample+variable+indx~genspec, value.var='value', fill=0)
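Deleting row 30 discards that row's counts. If the two Chaetoceros_sp. rows are separate counts of the same taxon (an assumption; they could just as well be a data-entry error), summing them before reshaping may be a better fit. A minimal sketch using base R's aggregate() on three rows copied from the subset above; `excerpt` and `summed` are names made up for this illustration:

```r
# Three rows copied from the posted subset (rows 27, 29 and 30)
excerpt <- data.frame(
  sample        = "41411bay_1",
  genspec       = c("Chaetoceros_simplex", "Chaetoceros_sp.", "Chaetoceros_sp."),
  total_cell_nr = c(15.72, 408.74, 31.44),
  total_biovol  = c(13331.23, 163904.98, 12608.08),
  stringsAsFactors = FALSE
)

# Sum both measurement columns within each sample/genspec pair,
# so every pair is left with exactly one row
summed <- aggregate(
  cbind(total_cell_nr, total_biovol) ~ sample + genspec,
  data = excerpt,
  FUN = sum
)

print(summed)
```

With one row per pair, the melt/dcast steps no longer need the indx column. reshape2's dcast() also accepts fun.aggregate = sum, which would do the summing during the cast itself.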