2
votes

I would like to take a few string vectors and get the frequency of the words found in them as a data frame. The column names of the data frame should be the unique words found across all of the vectors combined. I have that part working; what I am stuck on is adding the frequency of each word to the data frame. This is a very scaled-down version of what I am attempting. I have tried using table(), but I am not sure I am on the right track.

a <- c('A', 'B', 'C', 'D', 'E')
b <- c('A', 'D', 'J', 'G', 'X')
c <- c('A', 'A', 'B', 'B', 'C', 'X')

Example Data.Frame Design

vector.name  A  B  C  D  E  J  G  X 
a            1  1  1  1  1  0  0  0
b            1  0  0  1  0  1  1  1
c            2  2  1  0  0  0  0  1
2
To count the occurrences of strings, try something like sum(grepl("A", c)) - Matt
I have used grepl, but this seems very tedious for every value in the vector. How could I expand this? - branch.lizard
Seems like I could use something from the apply function tree to iterate across the vector and count the frequency of all words. But then, how would one add the result to the dataframe in the proper format? - branch.lizard

2 Answers

3
votes

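This should work:

countUniqueEntries <- function(l) {
    # Use the unique words across all vectors as a shared set of factor levels,
    # so every table has the same columns (zero counts included).
    lvls <- unique(unlist(l))
    lapply(l, function(x) table(factor(x, levels = lvls)))
}

# Bind the per-vector tables into one matrix, one row per vector.
do.call(rbind, countUniqueEntries(list(a, b, c)))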
     A B C D E J G X
[1,] 1 1 1 1 1 0 0 0
[2,] 1 0 0 1 0 1 1 1
[3,] 2 2 1 0 0 0 0 1
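
The result above is a matrix without row labels, whereas the question asks for a data frame with a vector.name column. A minimal sketch of one way to get there, assuming the list elements are named after the vectors (the names `vectors`, `words`, `counts`, and `result` are only illustrative and not part of the answer):

```r
a <- c('A', 'B', 'C', 'D', 'E')
b <- c('A', 'D', 'J', 'G', 'X')
c <- c('A', 'A', 'B', 'B', 'C', 'X')

# Name the list elements so the row labels carry through rbind().
vectors <- list(a = a, b = b, c = c)

# Unique words across all vectors, in order of first appearance.
words <- unique(unlist(vectors))

# One row of counts per vector, all sharing the same factor levels.
counts <- do.call(rbind, lapply(vectors, function(x) table(factor(x, levels = words))))

# Convert the matrix to a data frame and move the row names into a column.
result <- data.frame(vector.name = rownames(counts), counts, row.names = NULL)
result
```

Because the levels come from unique(), the columns keep the order in which the words first appear rather than being sorted alphabetically.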
1
votes

This is essentially one table operation once you have a long dataset:

table(stack(mget(c("a","b","c")))[2:1])

#   values
#ind A B C D E G J X
#  a 1 1 1 1 1 0 0 0
#  b 1 0 0 1 0 1 1 1
#  c 2 2 1 0 0 0 0 1
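
If an actual data frame is needed rather than a table object, the following sketch shows one possible conversion. It assumes a named list in place of mget() so that it does not depend on the calling environment, and the names `tab` and `wide` are only illustrative:

```r
a <- c('A', 'B', 'C', 'D', 'E')
b <- c('A', 'D', 'J', 'G', 'X')
c <- c('A', 'A', 'B', 'B', 'C', 'X')

# Long format (values, ind), then cross-tabulate vector name against word.
tab <- table(stack(list(a = a, b = b, c = c))[2:1])

# Drop the table class so as.data.frame() gives a wide data frame
# instead of the long Freq layout it produces for table objects.
wide <- as.data.frame(unclass(tab))

# Optional: reorder the columns from alphabetical to first-appearance order.
wide <- wide[, unique(c(a, b, c))]
wide
```

Calling as.data.frame() directly on the table would return the long layout with a Freq column, which is why the class is dropped first.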