In my dataset test, I would like to generate a frequency table based on two columns, start and end. My objective is to count each unique letter only once per row, even if it appears in both columns. For example, in the first row "C" should be counted only once. In row 4, "B" and "A" should each be counted once, as they are different letters. I know that I should use the unique() function somehow, but I am not sure how to combine it with table() to generate a frequency table that counts a letter repeated within a row as one (NA values should be omitted). Any suggestions would be appreciated.

> test
   start  end
1      C    C
2      A <NA>
3   <NA> <NA>
4      B    A
5      A    A
6   <NA>    A
7   <NA>    B
8   <NA>    C
9      A <NA>
10     C    C

The desired output is:

> output
  station Freq
1       A    5
2       B    2
3       C    3

And the test data:

> dput(test)
structure(list(start = c("C", "A", NA, "B", "A", NA, NA, NA, 
"A", "C"), end = c("C", NA, NA, "A", "A", "A", "B", "C", NA, 
"C")), .Names = c("start", "end"), row.names = c(NA, -10L), class = "data.frame")

1 Answer

How about this?

output <- table(unlist(apply(test, 1, unique)))
output

A B C 
5 2 3 

apply is not particularly efficient, since it is essentially a for loop over the rows, but it will work fine for a dataset of this size. Note that table() drops NA values by default, so no extra filtering is needed.
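
If the row-wise loop ever becomes a bottleneck on a larger data frame, a vectorised variant along the following lines might be worth trying. This is only a sketch, assuming both columns are character vectors as in the dput() above; the helper names dup and vals are illustrative and not part of the original question. It keeps every start value and drops an end value only when it repeats start on the same row, then reshapes the result into the station/Freq layout the question asks for.

```r
# Test data, as given by dput(test) in the question
test <- data.frame(
  start = c("C", "A", NA, "B", "A", NA, NA, NA, "A", "C"),
  end   = c("C", NA, NA, "A", "A", "A", "B", "C", NA, "C"),
  stringsAsFactors = FALSE
)

# TRUE where end repeats start on the same row (rows with an NA are never duplicates)
dup <- !is.na(test$start) & !is.na(test$end) & test$start == test$end

# All start values, plus the end values that are not same-row repeats
vals <- c(test$start, test$end[!dup])

# table() omits NA by default; convert to a data frame with the requested column names
output <- as.data.frame(table(vals), stringsAsFactors = FALSE)
names(output) <- c("station", "Freq")
output
```

Since this avoids calling unique() once per row, it should scale better with the number of rows, though with only two columns the difference is unlikely to matter for small data.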