2
votes

I am trying to create a dummy variable from a column in an existing data set. The column I am interested in is a title in this format:

CHEMICALS - Commission Delegated Directive (EU) 2015/863 of 31 March 2015 amending Annex II to Directive 2011/65/EU of the European Parliament and of the Council as regards the list of restricted substances (Text with EEA relevance)

or

Commission Implementing Directive (EU) 2015/2392...

I want to create a dummy variable indicating whether the Title refers to a delegated or an implementing act. In other words, when the word "delegated" appears in my Title variable, the row should be labeled 1, and everything else should be labeled 0.

Can anyone help me with this? Any help is much appreciated. So far, I have used this code:

infringements$delegated <- ifelse(infringements$Title=="Delegated", 1, 0)
table(infringements$delegated, infringements$Title)  
summary(infringements$delegated)

When I run the code, I get 0 matches, even though I know that there are 41 matches.

Can you provide a minimal data example? - Aleksandr
You can use str_detect() from the package stringr instead of == because == will only check if your string is equal to "Delegated" and what you're trying to do is to detect a pattern in your title. - Mbr Mbr
Use grepl, i.e. as.integer(grepl('Delegated', infringements$Title)) - Sotos
Great, thank you! I used the grepl suggestion because I have already been working with the grep family of functions, and this worked. - reveraert
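To illustrate the point made in the comments, here is a minimal sketch of why == finds no matches while grepl() does. The titles vector is a hypothetical, shortened stand-in for infringements$Title, not the asker's real data.

```r
# Hypothetical toy titles (shortened), not the asker's real data
titles <- c(
  "CHEMICALS - Commission Delegated Directive (EU) 2015/863",
  "Commission Implementing Directive (EU) 2015/2392",
  "Delegated"
)

# == tests whole-string equality, so only a title that is exactly
# "Delegated" counts as a match
exact <- titles == "Delegated"

# grepl() tests whether the pattern occurs anywhere in each string;
# fixed = TRUE treats the pattern as literal text, not a regex
partial <- grepl("Delegated", titles, fixed = TRUE)

print(exact)
print(partial)
```

With real titles, no element is likely to equal "Delegated" exactly, which would explain the 0 matches reported in the question.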

3 Answers

3
votes

We can use grepl() and convert the logical result to 0/1 with a unary +:

infringements$delegated <- +(grepl('Delegated', infringements$Title))
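As a rough sketch of what this does: grepl() returns a logical vector, and the unary + coerces it to integer 1/0. Note that the match is case-sensitive by default; if titles might contain a lowercase "delegated", ignore.case = TRUE is one option. The titles vector below is hypothetical and only for illustration.

```r
# Hypothetical example titles, for illustration only
titles <- c(
  "Commission Delegated Directive",
  "Commission Implementing Directive",
  "delegated act"
)

# Case-sensitive match: logical vector, one element per title
hits <- grepl("Delegated", titles)

# Unary + coerces the logical vector to integer (TRUE -> 1, FALSE -> 0)
dummy <- +hits

# Case-insensitive variant, which also catches the lowercase "delegated"
dummy_ci <- as.integer(grepl("delegated", titles, ignore.case = TRUE))

print(dummy)
print(dummy_ci)
```

Whether the case-insensitive variant is appropriate depends on the data; in the question, the titles appear to use a capitalised "Delegated".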
2
votes

Use str_detect() from the package stringr

library(stringr)

as.integer(str_detect(infringements$Title,"Delegated"))
1
votes
# Example data: one real title plus two short test strings
infringements <- data.frame(Title = c("CHEMICALS - Commission Delegated Directive (EU) 2015/863 of 31 March 2015 amending Annex II to Directive 2011/65/EU of the European Parliament and of the Council as regards the list of restricted substances (Text with EEA relevance)", "No Text", "Text3Delegated"), stringsAsFactors = FALSE)
# vapply() returns a plain numeric vector; lapply() would store a list column instead
infringements$delegated <- vapply(infringements$Title, function(x) if (length(grep("Delegated", x)) != 0) 1 else 0, numeric(1), USE.NAMES = FALSE)
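
A short sketch, using hypothetical titles, suggesting that the element-by-element approach and the vectorised grepl() call give the same result. It also shows that lapply() returns a list, which is why an atomic vector is usually preferable for a dummy column.

```r
# Hypothetical example titles, for illustration only
titles <- c("Commission Delegated Directive", "No Text", "Text3Delegated")

# Element-by-element check, returning an integer vector via vapply()
per_element <- vapply(
  titles,
  function(x) as.integer(length(grep("Delegated", x)) != 0),
  integer(1),
  USE.NAMES = FALSE
)

# Vectorised equivalent: grepl() already works on the whole vector
vectorised <- as.integer(grepl("Delegated", titles))

# Both approaches should agree
same <- identical(per_element, vectorised)

# lapply() returns a list rather than an atomic vector
as_list <- lapply(titles, function(x) length(grep("Delegated", x)) != 0)

print(same)
print(is.list(as_list))
```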