I have a dataframe with a column of strings and want to extract substrings of those into a new column.
Here is some sample code and data showing I want to take the string after the final underscore character in the id
column in order to create a new_id
column.
The id
column entry always has 2 underscore characters and it's always the final substring I would like.
df = data.frame( id = I(c("abcd_123_ABC","abc_5234_NHYK")), x = c(1.0,2.0) )
require(dplyr)
df = df %>% dplyr::mutate(new_id = strsplit(id, split="_")[[1]][3])
I was expecting strsplit to act on each row in turn.
However, the new_id
column only contains ABC
in each row, whereas I would like ABC
in row 1 and NHYK
in row 2. Do you know why this fails and how to achieve what I want?
strsplit
. the[[1]]
always grabs the first element of the list. – Lloyd Christmasdf$newVar <- sub(".*_([A-Z]+)$", "\\1", df$id)
. – lmo