I'm working with a large dataset (~1500 rows). When I built it, I didn't think ahead about separating my identifiers, so they're lumped into one long string.
The identifying string is in a column labeled "Polygon_Name". I'd like to keep this column, and split the string values in this column into 3 additional columns.
For example, if a "Polygon_Name" cell has a number embedded in it, such as Canker14B, I'd like to end up with the following columns: (1) the original Polygon_Name, (2) all text before the number, (3) the number, (4) all text after the number.
Small subset of my data:
df <- structure(list(Bolt_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "N1T.3.4.15.0.C", class = "factor"),
Polygon_Name = structure(c(10L, 1L, 9L, 6L, 3L, 7L, 2L, 8L,
4L, 5L), .Label = c("C", "Canker15B", "Canker15Left", "Canker15Right",
"Canker16", "Canker17", "CankS15B", "CankS16", "CankS17",
"S"), class = "factor"), Measure = c(19.342, 25.962, 0.408,
0.008, 0.074, 0.41, 0.011, 0.251, 0.056, 0.034)), .Names = c("Bolt_ID",
"Polygon_Name", "Measure"), row.names = c(1L, 2L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L), class = "data.frame")
Current output:
Ultimate output (I built this manually):
I've figured out how to extract the number with the following code:
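library(stringr)
# Match the first run of one or more digits
regexp <- "[[:digit:]]+"
# Polygon_Name is a factor, so convert it to character before extracting
df$Poly_Num <- str_extract(as.character(df$Polygon_Name), regexp)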
But I'm still struggling to pull out the text before and after the number. Any thoughts would be appreciated.
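One direction that might work is a single regular expression with three capture groups (non-digits, digits, everything else), pulling each group out with base R's `sub()`. The following is only a sketch under some assumptions: the column names `Poly_Prefix` and `Poly_Suffix` are made up for illustration, the small data frame is a stand-in rebuilt from the sample values above, and only the first run of digits is treated as "the number".

```r
# Stand-in for the Polygon_Name column, using the values from the sample data
df <- data.frame(
  Polygon_Name = c("S", "C", "CankS17", "Canker17", "Canker15Left",
                   "CankS15B", "Canker15B", "CankS16", "Canker15Right",
                   "Canker16"),
  stringsAsFactors = FALSE
)

# Group 1: leading non-digits, group 2: first run of digits, group 3: the rest
pattern <- "^([^[:digit:]]*)([[:digit:]]*)(.*)$"

# as.character() guards against Polygon_Name being stored as a factor
name <- as.character(df$Polygon_Name)

# Hypothetical column names; each call keeps only one capture group
df$Poly_Prefix <- sub(pattern, "\\1", name)
df$Poly_Num    <- sub(pattern, "\\2", name)
df$Poly_Suffix <- sub(pattern, "\\3", name)
```

With this pattern, names that contain no digits (such as "S" or "C") end up entirely in `Poly_Prefix`, and `Poly_Num` is an empty string rather than the `NA` that `str_extract()` returns. `Poly_Num` is also still a character column, so it would need `as.integer()` if numeric values are wanted.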