
I have a dataframe, dfregion, which looks as follows:

dput(dfregion)
structure(list(region = structure(c(1L, 2L, 3L, 3L, 1L), .Label = c("East", 
"New England", "Southeast"), class = "factor"), words = structure(c(4L, 
 2L, 1L, 3L, 5L), .Label = c("buildings, tallahassee", "center, mass, visitors", 
"god, instruct, estimated", "seeks, metropolis, convey", "teaching, academic, metropolis"
), class = "factor")), .Names = c("region", "words"), row.names = c(NA, 
-5L), class = "data.frame")

       region                          words
1        East      seeks, metropolis, convey
3 New England         center, mass, visitors
4   Southeast         buildings, tallahassee
5   Southeast       god, instruct, estimated
6        East teaching, academic, metropolis

I am trying to "melt" or "reshape" this dataframe by region and then paste the words together.

The following code is what I have tried:

dfregionnew<-dcast(dfregion, region ~ words,fun.aggregate= function(x) paste(x) )

dfregionnew<-dcast(dfregion, region ~ words, paste)

dfregionnew <- melt(dfregion,id=c("region"),variable_name="words")

Finally, I did the following, but I am not sure it is the best way to accomplish what I want:

dfregionnew<-ddply(dfregion, .(region), mutate, index= paste0('words', 1:length(region)))
dfregionnew<-dcast(dfregionnew, region~ index, value.var ='words')

The result is a dataframe reshaped in the right way, but each "words" entry ends up in its own column. I then tried to paste these columns together and got various errors:

dfregionnew$new<-lapply(dfregionnew[,2:ncol(dfregionnew)], paste, sep=",")
dfregionnew$new<-ldply(apply(dfregionnew, 1, function(x) data.frame(x = paste(x[2:ncol(dfregionnew)], sep=",", collapse=NULL))))
dfregionnew$new <- apply( dfregionnew[ , 2:ncol(dfregionnew) ] , 1 , paste , sep = "," )

I was able to solve that problem with something like the following:

dfregionnew$new <- apply( dfregionnew[ , 2:5] , 1 , paste , collapse = "," )

I guess my real question is: would it be possible to do this in one step using melt or dcast, without having to paste the various columns together after they are output? I am very interested in improving my skills and would love to learn faster or better practices in R. Thanks in advance!

Thanks for updating with the dput of the input, but I'm still not clear on the exact output you want. Do you just want all the values in the "words" column pasted together, grouped by "region"? - A5C1D2H2I1M1N2O1R2T1
Yes, I would like to keep the same two columns, "region" and "words", but with "words" holding the concatenation of all the words from every row in that region. For the region East, for example, words would be "seeks, metropolis, convey, teaching, academic, metropolis". - RCN

1 Answer


It sounds like you just want to paste the values in the "words" column together, in which case you can simply use aggregate as follows:

aggregate(words ~ region, dfregion, paste, collapse = ", ")
#        region                                                     words
# 1        East seeks, metropolis, convey, teaching, academic, metropolis
# 2 New England                                    center, mass, visitors
# 3   Southeast          buildings, tallahassee, god, instruct, estimated

No melting or dcasting required.
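
To check that the result holds one string per region rather than a list of separate pieces, the call can be run on a small rebuild of the data. This is only a minimal sketch: the data frame is retyped from the dput output in the question using plain character columns, collapse = ", " is handed through to paste, and the name res is just for illustration.

```r
# Rebuild the example data (retyped from the dput output; character columns)
dfregion <- data.frame(
  region = c("East", "New England", "Southeast", "Southeast", "East"),
  words  = c("seeks, metropolis, convey",
             "center, mass, visitors",
             "buildings, tallahassee",
             "god, instruct, estimated",
             "teaching, academic, metropolis"),
  stringsAsFactors = FALSE
)

# Arguments given after FUN are passed on to paste()
res <- aggregate(words ~ region, data = dfregion, FUN = paste, collapse = ", ")

# The words column should be a plain character vector, one string per region
is.character(res$words)
res$words[res$region == "East"]
```

If is.character() returns FALSE here, the column is most likely a list, which usually means collapse was not passed through.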


If you do want to use dcast from "reshape2", you can try something like this:

dcast(dfregion, region ~ "WORDS", value.var="words", 
      fun.aggregate=function(x) paste(x, collapse = ", "))
#        region                                                     WORDS
# 1        East seeks, metropolis, convey, teaching, academic, metropolis
# 2 New England                                    center, mass, visitors
# 3   Southeast          buildings, tallahassee, god, instruct, estimated
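
The collapse argument is what does the work here, and it is probably why the sep = "," attempts in the question did not join anything: sep goes between the separate arguments of one paste call, while collapse joins the elements of the resulting vector into a single string. A small base R sketch, using a made-up vector x purely for illustration:

```r
x <- c("seeks", "metropolis", "convey")   # hypothetical example vector

# sep is placed between the separate arguments of paste(); with a single
# argument there is nothing to separate, so the vector comes back unchanged
with_sep <- paste(x, sep = ",")
length(with_sep)        # still three elements

# collapse joins the elements of the result into one string
with_collapse <- paste(x, collapse = ", ")
length(with_collapse)   # a single element
with_collapse
```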