9
votes

I have a dataset with responses to a Likert item on a 9pt scale. I would like to create a frequency table (and barplot) of the data but some values on the scale never occur in my dataset, so table() removes that value from the frequency table. I would like it instead to present the value with a frequency of 0. That is, given the following dataset

# Assume a 5pt Likert scale for ease of example
data <- c(1, 1, 2, 1, 4, 4, 5)

I would like to get the following frequency table without having to manually insert a column named 3 with the value 0.

1 2 3 4 5 
3 1 0 2 1

I'm new to R, so maybe I've overlooked something basic, but I haven't come across a function or option that gives the desired result.


3 Answers

21
votes

EDIT:

tabulate produces frequency tables, while table produces contingency tables. However, to get zero frequencies in a one-dimensional contingency table, as in the example above, the code below still works.


This question provided the missing link. If you convert the Likert item to a factor and explicitly specify its levels, levels with a frequency of 0 are still counted:

data <- factor(data, levels = 1:5)
table(data)

produces the desired output
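
Since the question also asks for a barplot, here is a minimal sketch of passing the same table to barplot(). The names responses, responses_f and counts are only illustrative, and the axis labels are arbitrary choices:

```r
# Example data from the question (5-point scale, no 3s observed)
responses <- c(1, 1, 2, 1, 4, 4, 5)

# Declaring all five levels keeps the empty category in the table
responses_f <- factor(responses, levels = 1:5)
counts <- table(responses_f)
print(counts)

# The empty level shows up as a zero-height bar at position 3
barplot(counts, xlab = "Response", ylab = "Frequency")
```

Because the zero count is part of the table itself, the plot keeps a slot for the unused response without any manual padding.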

7
votes

table produces a contingency table, while tabulate produces a frequency table that includes zero counts.

tabulate(data)
# [1] 3 1 0 2 1

Another way (if you have integers starting from 1 - but easily modifiable for other cases):

setNames(tabulate(data), 1:max(data))  # to make the output easier to read
# 1 2 3 4 5 
# 3 1 0 2 1 
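
One caveat: by default tabulate() only counts up to the largest observed value, so a missing top category would be dropped. A small sketch using its nbins argument, assuming a 5-point scale where nobody answered 3 or 5 (the variable names are illustrative):

```r
# Hypothetical responses on a 5-point scale with no 3s and no 5s
responses <- c(1, 1, 2, 1, 4, 4)

# Default: the number of bins equals max(responses), so only 4 counts come back
tabulate(responses)

# Fixing nbins at the scale length keeps the unused top category
counts <- setNames(tabulate(responses, nbins = 5), 1:5)
counts
```

Setting nbins to the length of the scale makes the output independent of which values happen to occur in the data.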
0
votes

If you want to quickly calculate the counts or proportions for multiple Likert items and get the output as a data.frame, you may like the function response.frequencies in the psych package.

Let's create some data (note that there are no 8s or 9s):

df <- data.frame(item1 = sample(1:7, 2000, replace = TRUE), 
                 item2 = sample(1:7, 2000, replace = TRUE), 
                 item3 = sample(1:7, 2000, replace = TRUE))

If you want to calculate the proportion in each category

psych::response.frequencies(df, max = 1000, uniqueitems = 1:9)

you get the following:

           1      2     3      4      5      6      7 8 9 miss
item1 0.1450 0.1435 0.139 0.1325 0.1380 0.1605 0.1415 0 0    0
item2 0.1535 0.1315 0.126 0.1505 0.1535 0.1400 0.1450 0 0    0
item3 0.1320 0.1505 0.132 0.1465 0.1425 0.1535 0.1430 0 0    0

If you want counts, you can multiply by the sample size:

psych::response.frequencies(df, max = 1000, uniqueitems = 1:9) * nrow(df)

You get the following:

        1   2   3   4   5   6   7 8 9 miss
item1 290 287 278 265 276 321 283 0 0    0
item2 307 263 252 301 307 280 290 0 0    0
item3 264 301 264 293 285 307 286 0 0    0
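
For comparison, similar counts can be obtained in base R by applying the factor-and-table idea from the first answer to each column. This is only a sketch: items, count_levels and the seed are illustrative choices, and the data set is smaller than the one above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Two simulated items on a 9-point scale where only 1 to 7 occur
items <- data.frame(item1 = sample(1:7, 200, replace = TRUE),
                    item2 = sample(1:7, 200, replace = TRUE))

# Count every possible response, including the ones never observed
count_levels <- function(x) table(factor(x, levels = 1:9))

# sapply() returns one column per item; t() puts items in rows
counts <- t(sapply(items, count_levels))
counts
```

This returns integer counts directly, so no multiplication by the sample size is needed, but it does not report missing values the way the miss column does.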

A few notes:

  • the default max is 10. Thus, if you have more than 10 response options, you'll have issues. Otherwise, in your case, and many Likert item cases, you could omit the max argument.
  • uniqueitems specifies the possible values. If all your values were present in at least one item, then this would be inferred from the data.
  • I think the function only works with numeric data. So if your Likert categories are coded as "Strongly disagree", etc., it won't work.
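
If the responses are stored as text labels, the factor approach from the first answer still works with table(). A minimal sketch, assuming a hypothetical 5-point labelling (the label set and the example responses are made up for illustration):

```r
# Hypothetical labels for a 5-point Likert item, in scale order
labels <- c("Strongly disagree", "Disagree", "Neutral",
            "Agree", "Strongly agree")

# Made-up responses in which "Disagree" and "Neutral" never occur
responses <- c("Agree", "Strongly agree", "Agree", "Strongly disagree")

# Listing all labels as levels keeps the zero counts and the scale order
counts <- table(factor(responses, levels = labels))
counts
```

Supplying the levels explicitly also stops table() from sorting the categories alphabetically.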