Rewrite of the original post. I'm looking to eliminate a plyr dependency.
I tried to splice the tapply into my code as well as the lapply. The tapply worked for one variable (sex) but not 2 (sex, adult). Slipping the lapply response in does not return a word list by grouping variable it just returns one big word list with the grouping variable at the top (so for person it returns one word list instead of one word list for each person).
I apologize for the length of this but without including the real function I'm working on it doesn't seem to give you guys the insight to help me.
I'm going to include my attempts to alter the function with your suggestions in an answer instead of here to reduce an already bloated post. Also please don't comment on the extra user defined funtions unless helpful to the main problem. They're works in progress and included just to show you what the problem is.
CORRECT OUTPUT WITH PLYR: http://pastebin.com/mr9FvjpF
Dataframe
DATA<-structure(list(person = structure(c(4L, 1L, 5L, 4L, 1L, 3L, 1L,
4L, 3L, 2L, 1L), .Label = c("greg", "researcher", "sally", "sam",
"teacher"), class = "factor"), sex = structure(c(2L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor"),
adult = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), state = structure(c(2L,
7L, 9L, 11L, 5L, 4L, 8L, 3L, 10L, 1L, 6L), .Label = c("Shall we move on? Good then.",
"Computer is fun. Not too fun.", "I distrust you.",
"How can we be certain?", "I am telling the truth!", "Im hungry. Lets eat. You already?",
"No its not, its ****.", "There is no way.", "What should we do?",
"What are you talking about?", "You liar, it stinks!"
), class = "factor"), code = structure(c(1L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 2L, 3L), .Label = c("K1", "K10", "K11",
"K2", "K3", "K4", "K5", "K6", "K7", "K8", "K9"), class = "factor")), .Names = c("person",
"sex", "adult", "state", "code"), row.names = c(NA, -11L), class = "data.frame")
#=====================
DEPENDENT USER DEFINED TOOLS
Trim<-function (x) gsub("^\\s+|\\s+$", "", x)
bracketX<-function(text, bracket='all'){
switch(bracket,
square=sapply(text, function(x)gsub("\\[.+?\\]", "", x)),
round=sapply(text, function(x)gsub("\\(.+?\\)", "", x)),
curly=sapply(text, function(x)gsub("\\{.+?\\}", "", x)),
all={P1<-sapply(text, function(x)gsub("\\[.+?\\]", "", x))
P1<-sapply(P1, function(x)gsub("\\(.+?\\)", "", x))
sapply(P1, function(x)gsub("\\{.+?\\}", "", x))})
}
words <- function(x){as.vector(unlist(strsplit(x, " ")))}
word.split <- function(x) lapply(x, words)
strip <- function(x){
sentence <- gsub('[[:punct:]]', '', as.character(x))
sentence <- gsub('[[:cntrl:]]', '', sentence)
sentence <- gsub('\\d+', '', sentence)
Trim(tolower(sentence))
}
#=====================
FUNCTION OF INTEREST
textLISTER <- function(dataframe = DFwcweb, text.var = "dialogue", group.vars = "person") {
require(plyr)
DF <- dataframe
DF$words <- Trim(as.character(bracketX(dataframe[, text.var])))
DF$words <- as.vector(word.split(strip(DF$words)))
#I'd like to get ride of the plyr dependency in the line below
dlply(DF, c(group.vars), summarise, words = as.vector(unlist(DF$words)))
}
#=====================
CURRENTLY THE CODE WORKS WITH ONE OR MORE GROUPING VARIABLES.
textLISTER(DATA, 'state', 'person')
textLISTER(DATA, 'state', c('sex','adult'))
.parallelflags), and reusability. - Iterator