I am trying to cleaning up some data in a huge dataset. One column holds values for the Sales aamount. Example values could be like those:
Clean Data:
Sales Potential
230
120
300
However, at some points there appear something like this:
Dirty Data
0, 0, 0, 0, 0
4, 0, 0, 0
0, 0, 480
0, 200, 0
In the first case of the dirty data the cell shoul only contains a zero: 0
In all other cases I would like to extract, if there is any non-zero number, this number and replace the cell with this value or add a new cleanded-column.
So the dirty data cleaned up:
Cleaned Data:
0
4
480
200
My approach was using RegExpressions in R
as I am loading the data into Power-BI using Power-Query.
I tried to find a pattern where I extract the value I am looking for and place it in a new column. However, my resutls looks like nothing.
Is there maybe a much simpler approach to achieve this in R
?
Code so far:
library(stringr)
OutputRegEx <- data.frame(MyDataset)
Splitter = function(x) substr(str_extract(x,'[1-9]'),1,7)
OutputRegEx[["RegExAuswertung"]] <- apply(OutputRegEx[43],1, function(x) Splitter(x) )