
I would like to recode a continuous variable into predefined intervals and assign each value a category number. I know ifelse() would work, but it seems very tedious. Here is what my sample dataset looks like:

library(ggplot2)

df <- data.frame(
  id = c(1,2,3,4,5),
  score = c(1.5, 2, 2.5, 3.2, 4.5))

> df
  id score
1  1   1.5
2  2   2.0
3  3   2.5
4  4   3.2
5  5   4.5

Here are my predefined intervals:

intervals <- cut_interval(seq(1, 7), n=5)
> intervals
[1] [1,2.2]   [1,2.2]   (2.2,3.4] (3.4,4.6] (4.6,5.8] (5.8,7]   (5.8,7]  
Levels: [1,2.2] (2.2,3.4] (3.4,4.6] (4.6,5.8] (5.8,7]

With these intervals, my desired recoded dataset should look like this:

df <- data.frame(
  id = c(1,2,3,4,5),
  score = c(1.5, 2, 2.5, 3.2, 4.5),
  intervals = c("[1,2.2]", "[1,2.2]", "(2.2,3.4]", "(2.2,3.4]", "(3.4,4.6]"),
  cat = c(1,1,2,2,3))

> df
  id score intervals cat
1  1   1.5   [1,2.2]   1
2  2   2.0   [1,2.2]   1
3  3   2.5 (2.2,3.4]   2
4  4   3.2 (2.2,3.4]   2
5  5   4.5 (3.4,4.6]   3

Any thoughts? Thanks!

2 Answers

0
votes

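It sounds like you just need to retrieve the cut points from intervals and apply them to df$score. In base R you could do:

# Strip the brackets from the level labels of `intervals` (defined in the
# question), split on the comma, and keep the unique numeric cut points
breaks <- unique(as.numeric(
  unlist(strsplit(gsub("[][()]", "", levels(intervals)), ","))
))

# include.lowest = TRUE closes the first interval on the left: "[1,2.2]"
df$intervals <- cut(df$score, breaks = breaks, include.lowest = TRUE)
df$cat <- as.numeric(df$intervals)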
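
If the intervals are always equal-width, a possible alternative is to build the break points numerically instead of parsing them out of the level labels. This is only a minimal sketch, assuming the range 1 to 7 and the five intervals from the question; it rebuilds the sample data so that it runs on its own:

```r
# Sample data from the question
df <- data.frame(id = 1:5, score = c(1.5, 2, 2.5, 3.2, 4.5))

# Assumption: five equal-width intervals spanning 1 to 7, as in the question
breaks <- seq(1, 7, length.out = 6)   # 1.0 2.2 3.4 4.6 5.8 7.0

# cut() makes right-closed (a,b] intervals by default;
# include.lowest = TRUE also closes the first interval at 1
df$intervals <- cut(df$score, breaks = breaks, include.lowest = TRUE)

# The integer codes of the factor levels serve as the category
df$cat <- as.integer(df$intervals)   # 1 1 2 2 3

df
```

Because the breaks are numbers from the start, this avoids any rounding that the printed labels may carry when the cut points have more digits than cut() displays.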

0
votes

Here are some possibly relevant one-liners using (my) santoku package:

library(santoku)

# cut into equal-sized intervals of size 1.2
df$cat <- chop_width(df$score, 1.2) 

# cut into exactly 5 intervals of equal width
df$cat <- chop_evenly(df$score, 5)