My dcast R code no longer runs. I am hitting the problem discussed here: "segfault in R using reshape2 package and dcast".

The bug has not been fixed yet, so I am looking for other ways to produce my dcast output. Any suggestions would be greatly appreciated!

Below is a very small dput of my dataset. Basically, there is one entry per species per survey ID ("EID"). I would like to get one entry per survey ID ("EID"), with all my species ("elasmo.name") as columns holding their associated value ("elasmo.discard"), i.e., wide format.

> dput(sample)
structure(list(EID = c("L00155/69/2000-09-06", "Q99107/178/1999-08-23", 
"G02192/1/2002-07-08", "G97158/1/1997-10-26", "Q06091/2/2006-07-04", 
"L00004/171/2000-03-01", "G11094/15/2011-09-05", "Q04127/16/2004-07-28", 
"Q02122/230/2002-10-29", "G08002/6/2008-02-03", "Q99006/143/1999-02-17", 
"Q08053/3/2008-06-12", "Q99128/22/1999-08-19", "L00177/83/2000-12-18", 
"Q05122/11/2005-08-30", "Q04156/44/2004-10-29", "L01097/69/2001-06-26", 
"G08004/169/2008-05-14", "Q03041/26/2003-06-14", "G98115/60/1998-09-11", 
"G00002/20/2000-01-17", "G00002/20/2000-01-17", "G00054/1/2000-05-31", 
"G00054/1/2000-05-31"), tspp.name = structure(c(13L, 13L, 13L, 
13L, 16L, 13L, 13L, 4L, 13L, 13L, 13L, 13L, 13L, 11L, 4L, 13L, 
13L, 13L, 13L, 20L, 13L, 13L, 24L, 24L), .Label = c("American plaice", 
"American sand lance", "Arctic cod", "Atlantic cod", "Atlantic halibut", 
"Atlantic herring", "Bigeye tuna", "Black dogfish", "Bluefin tuna", 
"Capelin", "Greenland halibut", "Lookdown", "Northern shrimp", 
"Ocean quahog", "Porbeagle", "Redfishes", "Slenteye headlightfish", 
"Smooth flounder", "Spiny dogfish", "Striped pink shrimp", "Summer flounder", 
"White hake", "Winter flounder", "Witch flounder", "Yellowtail flounder"
), class = "factor"), elasmo.name = structure(c(26L, 30L, 30L, 
30L, 30L, 25L, 21L, 30L, 30L, 30L, 30L, 21L, 30L, 5L, 30L, 30L, 
30L, 21L, 30L, 30L, 14L, 21L, 24L, 21L), .Label = c("Arctic skate", 
"Atlantic sharpnose shark", "Barndoor skate", "Basking shark", 
"Black dogfish", "Blue shark", "Deepsea cat shark", "Greenland shark", 
"Jensen's skate", "Little skate", "Manta", "Ocean quahog", "Oceanic whitetip shark", 
"Porbeagle", "Portuguese shark", "Rough sagre", "Roughtail stingray", 
"Round skate", "Sharks", "Shortfin mako", "Skates", "Smooth skate", 
"Soft skate", "Spiny dogfish", "Spinytail skate", "Thorny skate", 
"White shark", "White skate", "Winter skate", "NA"), class = "factor"), 
    elasmo.discard = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
    25, 0, 0, 0, 1, 0, 0, 1, 1, 15, 25)), .Names = c("EID", "tspp.name", 
"elasmo.name", "elasmo.discard"), class = "data.frame", row.names = c("18496", 
"488791", "87549", "236671", "139268", "15606", "11132", "115531", 
"93441", "159675", "403751", "42587", "485941", "19285", "130395", 
"119974", "73826", "7953", "99124", "351461", "71", "72", "184", 
"185"))

At the end, I wish to obtain this:

library(reshape2)  # dcast() is provided by reshape2, not plyr
test <- dcast(sample, ... ~ elasmo.name, value.var = "elasmo.discard", fun.aggregate = sum)
test

Note that the dcast code works on this small sample, but I get a fatal error (the segfault) when I run it on my full dataset, which has 145349 rows.

Many thanks!!

This really isn't the right way to ask this question. Segfaults are bugs by definition and should be sent to the maintainer. This might serve that purpose in this case, since the author is a regular SO reader, but in general it is not as courteous (or efficient) as an email. - IRTFM
Ok. Thanks @DWin, I was hoping that someone could suggest how to reshape my data frame without using dcast. - GodinA
It is hard to help without having an example. - djhurio
@djhurio, I added a reproducible example. - GodinA

2 Answers

1
votes

This would be the pre-Hadley method; first aggregate to get the sums, then reshape.

# d is the data frame from the question (called sample there)
foo <- aggregate(d[, 4, drop = FALSE], by = d[, 1:3], FUN = sum)
reshape(foo, v.names = "elasmo.discard", idvar = c("EID", "tspp.name"),
        timevar = "elasmo.name", direction = "wide")

If the first part is slow, it may help to have fewer columns in the "by" part. It looks like tspp.name is determined by EID; if so, don't aggregate by it, but add it back in after the fact.
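As a rough sketch of that idea (not tested on the real data), the following aggregates by EID and elasmo.name only, reshapes, and then merges tspp.name back in. The toy data frame d and its values are made up for illustration; only the column names come from the question. It assumes each EID has exactly one tspp.name.

```r
# Toy data with the question's column names; the values are invented.
d <- data.frame(
  EID = c("A", "A", "B", "B", "C"),
  tspp.name = c("Northern shrimp", "Northern shrimp", "Witch flounder",
                "Witch flounder", "Atlantic cod"),
  elasmo.name = c("Porbeagle", "Skates", "Skates", "Skates", "Sharks"),
  elasmo.discard = c(1, 1, 15, 25, 0),
  stringsAsFactors = FALSE
)

# Step 1: sum discards per survey and elasmobranch; tspp.name is left out of the grouping.
sums <- aggregate(elasmo.discard ~ EID + elasmo.name, data = d, FUN = sum)

# Step 2: long to wide, one row per EID and one column per elasmo.name.
wide <- reshape(sums, v.names = "elasmo.discard", idvar = "EID",
                timevar = "elasmo.name", direction = "wide")

# Step 3: add tspp.name back. Assumption: exactly one tspp.name per EID.
lookup <- unique(d[, c("EID", "tspp.name")])
stopifnot(!anyDuplicated(lookup$EID))
wide <- merge(lookup, wide, by = "EID")

# reshape() leaves NA for combinations that never occur; dcast with sum would give 0.
wide[is.na(wide)] <- 0
wide
```

If the stopifnot() check fails on the real data, tspp.name is not determined by EID and has to stay in the grouping.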

If the second part is slow, perhaps try one of the methods here: https://stackoverflow.com/a/9617424/210673.
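Another base R option that skips reshape() entirely is a cross-tabulation with xtabs(), which sums the left-hand side within each cell. This is only a sketch and not necessarily what the linked answer describes; the toy data frame d is made up, and tspp.name is ignored here, so it would have to be merged back in afterwards.

```r
# Toy data with the question's column names; the values are invented.
d <- data.frame(
  EID = c("A", "A", "B", "B", "C"),
  elasmo.name = c("Porbeagle", "Skates", "Skates", "Skates", "Sharks"),
  elasmo.discard = c(1, 1, 15, 25, 0),
  stringsAsFactors = FALSE
)

# Sum elasmo.discard for every EID x elasmo.name cell; empty cells become 0.
tab <- xtabs(elasmo.discard ~ EID + elasmo.name, data = d)

# Turn the table into a wide data frame with EID as an ordinary column.
wide <- as.data.frame.matrix(tab)
wide <- cbind(EID = rownames(wide), wide, stringsAsFactors = FALSE)
rownames(wide) <- NULL
wide
```

Because the result is a full EID by elasmo.name grid, memory use grows with the product of the two numbers of unique values.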

To get better help on speeding it up, provide an appropriate example (perhaps using sample or rep) that code can be tested on. Solution speed often depends on how many unique combinations of each variable there are.
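For instance, a synthetic data set of roughly the right shape could be built like this. It is a minimal sketch: the row count is the one mentioned in the question, but the numbers of unique surveys and species (n_eid, n_tspp, n_elasmo) and the Poisson discard values are guesses that should be replaced with figures from the real data.

```r
set.seed(1)

n_rows   <- 145349   # row count mentioned in the question
n_eid    <- 20000    # assumed number of unique surveys
n_tspp   <- 25       # assumed number of target species
n_elasmo <- 30       # assumed number of elasmobranch categories

# One target species per survey, so that tspp.name is determined by EID.
eid         <- sprintf("E%05d", seq_len(n_eid))
tspp_by_eid <- sample(paste("target", seq_len(n_tspp)), n_eid, replace = TRUE)

# Draw surveys with replacement to reach the desired number of rows.
idx <- sample(n_eid, n_rows, replace = TRUE)
big <- data.frame(
  EID = eid[idx],
  tspp.name = tspp_by_eid[idx],
  elasmo.name = sample(paste("elasmo", seq_len(n_elasmo)), n_rows, replace = TRUE),
  elasmo.discard = rpois(n_rows, lambda = 2),
  stringsAsFactors = FALSE
)

# Time the aggregation step on the synthetic data.
system.time(
  sums <- aggregate(elasmo.discard ~ EID + elasmo.name, data = big, FUN = sum)
)
```

Varying n_eid and n_elasmo shows how the timings respond to the number of unique combinations.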

0
votes

I am not able to reproduce the error. See the code below, where I increased the number of rows in sample to 196608.

The number of categories in sample$elasmo.name probably plays a role.

library(reshape2)

sample <- structure(list(EID = c("L00155/69/2000-09-06", "Q99107/178/1999-08-23", 
  "G02192/1/2002-07-08", "G97158/1/1997-10-26", "Q06091/2/2006-07-04", 
  "L00004/171/2000-03-01", "G11094/15/2011-09-05", "Q04127/16/2004-07-28", 
  "Q02122/230/2002-10-29", "G08002/6/2008-02-03", "Q99006/143/1999-02-17", 
  "Q08053/3/2008-06-12", "Q99128/22/1999-08-19", "L00177/83/2000-12-18", 
  "Q05122/11/2005-08-30", "Q04156/44/2004-10-29", "L01097/69/2001-06-26", 
  "G08004/169/2008-05-14", "Q03041/26/2003-06-14", "G98115/60/1998-09-11", 
  "G00002/20/2000-01-17", "G00002/20/2000-01-17", "G00054/1/2000-05-31", 
  "G00054/1/2000-05-31"), tspp.name = structure(c(13L, 13L, 13L, 
  13L, 16L, 13L, 13L, 4L, 13L, 13L, 13L, 13L, 13L, 11L, 4L, 13L, 
  13L, 13L, 13L, 20L, 13L, 13L, 24L, 24L), .Label = c("American plaice", 
  "American sand lance", "Arctic cod", "Atlantic cod", "Atlantic halibut", 
  "Atlantic herring", "Bigeye tuna", "Black dogfish", "Bluefin tuna", 
  "Capelin", "Greenland halibut", "Lookdown", "Northern shrimp", 
  "Ocean quahog", "Porbeagle", "Redfishes", "Slenteye headlightfish", 
  "Smooth flounder", "Spiny dogfish", "Striped pink shrimp", "Summer flounder", 
  "White hake", "Winter flounder", "Witch flounder", "Yellowtail flounder"
  ), class = "factor"), elasmo.name = structure(c(26L, 30L, 30L, 
  30L, 30L, 25L, 21L, 30L, 30L, 30L, 30L, 21L, 30L, 5L, 30L, 30L, 
  30L, 21L, 30L, 30L, 14L, 21L, 24L, 21L), .Label = c("Arctic skate", 
  "Atlantic sharpnose shark", "Barndoor skate", "Basking shark", 
  "Black dogfish", "Blue shark", "Deepsea cat shark", "Greenland shark", 
  "Jensen's skate", "Little skate", "Manta", "Ocean quahog", "Oceanic whitetip shark", 
  "Porbeagle", "Portuguese shark", "Rough sagre", "Roughtail stingray", 
  "Round skate", "Sharks", "Shortfin mako", "Skates", "Smooth skate", 
  "Soft skate", "Spiny dogfish", "Spinytail skate", "Thorny skate", 
  "White shark", "White skate", "Winter skate", "NA"), class = "factor"), 
      elasmo.discard = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
      25, 0, 0, 0, 1, 0, 0, 1, 1, 15, 25)), .Names = c("EID", "tspp.name", 
  "elasmo.name", "elasmo.discard"), class = "data.frame", row.names = c("18496", 
  "488791", "87549", "236671", "139268", "15606", "11132", "115531", 
  "93441", "159675", "403751", "42587", "485941", "19285", "130395", 
  "119974", "73826", "7953", "99124", "351461", "71", "72", "184", 
  "185"))

n <- nrow(sample)
N <- 145349
p <- ceiling(log2(N / n))
n * 2^p
n * 2^p > N

# Crude way of increasing the row count: double the data frame p times
for (i in 1:p) sample <- rbind(sample, sample)

nrow(sample)

class(sample)
head(sample)

table(sample$elasmo.name)
table(as.character(sample$elasmo.name))

test <- dcast(sample, ... ~ elasmo.name,
              value.var = "elasmo.discard",
              fun.aggregate = sum)
head(test)