0
votes

In my first column I have numeric identifiers and the second column is a character column that, for example, identifies the subject's favorite sports.

X1       X2
001      NBA
001      MLS
001      MLB
002      UFC
002      NFL
002      NHL
002      NBA
003      MLB
003      NBA

I have thousands of rows like this, and I want the output to show the unique values in column 2 (X2) where the value in column 1 (X1) is equal to 001 or 002 or 003.

3
Can you show your expected output? Do you need aggregate(X2~X1, df, function(x) toString(unique(x))) ? - Ronak Shah
Please explain what "equal to 001 or 002 or 003" means. I just posted an answer with a separate result for each of the three, whilst StupidWolf gave one for the logical "or", as in "is equal to any of 001, 002 or 003". Which of these did you mean? - Bernhard

3 Answers

0
votes

Your dataframe:

df = structure(list(X1 = c("001", "001", "001", "002", "002", "002", 
"002", "003", "003"), X2 = structure(c(3L, 2L, 1L, 6L, 4L, 5L, 
3L, 1L, 3L), .Label = c("MLB", "MLS", "NBA", "NFL", "NHL", "UFC"
), class = "factor")), row.names = c(NA, -9L), class = "data.frame")

To get the unique X2 values pooled across all rows where X1 is 001, 002 or 003:

unique(df$X2[df$X1 %in% c("001","002","003")])
[1] NBA MLS MLB UFC NFL NHL

To get the unique X2 values within each X1 (that is, the distinct X1/X2 pairs):

unique(df[df$X1 %in% c("001","002","003"),])
   X1  X2
1 001 NBA
2 001 MLS
3 001 MLB
4 002 UFC
5 002 NFL
6 002 NHL
7 002 NBA
8 003 MLB
9 003 NBA
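
Because every row in the example data is already distinct, the second call returns the data unchanged. A minimal sketch with repeated rows may make the difference between the two calls easier to see. The data frame `dup` and the vector `ids` are made up for illustration and are not from the question:

```r
# Hypothetical data with repeated X1/X2 pairs (not from the question)
dup <- data.frame(
  X1 = c("001", "001", "001", "002", "002"),
  X2 = c("NBA", "NBA", "MLS", "NBA", "NBA"),
  stringsAsFactors = FALSE
)

ids <- c("001", "002", "003")

# Distinct X2 values pooled over all rows whose X1 is in ids
pooled <- unique(dup$X2[dup$X1 %in% ids])

# Distinct X1/X2 pairs, i.e. the unique X2 values within each id
pairs <- unique(dup[dup$X1 %in% ids, ])

print(pooled)
print(pairs)
```

Here `pooled` lists NBA only once even though it occurs under both ids, while `pairs` keeps one NBA row per id.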
0
votes
d <- read.table(header=TRUE, text="X1       X2
001      NBA
001      MLS
001      MLB
002      UFC
002      NFL
002      NHL
002      NBA
003      MLB
003      NBA")

tapply(d$X2, d$X1, unique)

gives a list of length three:

> str(tapply(d$X2, d$X1, unique))
List of 3
 $ 1: chr [1:3] "NBA" "MLS" "MLB"
 $ 2: chr [1:4] "UFC" "NFL" "NHL" "NBA"
 $ 3: chr [1:2] "MLB" "NBA"
 - attr(*, "dim")= int 3
 - attr(*, "dimnames")=List of 1
  ..$ : chr [1:3] "1" "2" "3"
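
Note that read.table() parsed X1 as integers here, so the groups are labelled 1, 2, 3 rather than 001, 002, 003. If the leading zeros matter, one option is to read every column as character via the `colClasses` argument. This is a sketch rather than part of the original answer, and the names `d2` and `res` are only illustrative:

```r
# Read both columns as character so "001" is not converted to the integer 1
d2 <- read.table(header = TRUE, colClasses = "character", text = "X1 X2
001 NBA
001 MLS
001 MLB
002 UFC
002 NFL
002 NHL
002 NBA
003 MLB
003 NBA")

# One vector of unique X2 values per id, as before
res <- tapply(d2$X2, d2$X1, unique)

# The group labels now keep their leading zeros
print(names(res))
```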
0
votes

Suppose the data look like this, where X3 is a data frame containing X1 and X2:

# Unquoted 001 is the number 1, so X1 is numeric and the leading zeros are not stored
X1 <- c(001, 001, 001, 002, 002, 002)
X2 <- c("NBA", "NBA", "NHL", "NBA", "NHL", "NHL")
X3 <- data.frame(X1, X2)

Filter on the value you want X1 to equal, then call distinct(.keep_all = TRUE) to drop duplicate rows. The result is a data frame of the unique X2 values for that X1.

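library(dplyr)  # provides %>%, filter() and distinct()

X3 %>% 
  filter(X1 == 001) %>%       # 001 is the number 1, matching the numeric X1
  distinct(.keep_all = TRUE)  # drop duplicate rows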
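
If the goal is the unique X2 values for several ids at once, the same pipeline can be extended with `%in%` and by naming the columns in distinct(). The following is only a sketch under assumptions: it stores X1 as character so the leading zeros survive (the answer above uses numeric ids), and the names `ids` and `res` are illustrative:

```r
library(dplyr)

# Same toy data as above, but with X1 as character (an assumption,
# chosen so that "001" keeps its leading zeros)
X3 <- data.frame(
  X1 = c("001", "001", "001", "002", "002", "002"),
  X2 = c("NBA", "NBA", "NHL", "NBA", "NHL", "NHL"),
  stringsAsFactors = FALSE
)

ids <- c("001", "002")

res <- X3 %>%
  filter(X1 %in% ids) %>%  # keep rows whose id is any of ids
  distinct(X1, X2)         # one row per distinct X1/X2 pair

print(res)
```

When distinct() is called without naming any columns it already compares whole rows, so `.keep_all = TRUE` only changes the result when a subset of columns is given.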