I have a data frame with a series of values, some of which are anomalous readings that I want to identify. I would like to add a third column to the data frame that marks those readings as "anomaly" ("A") and the rest as "normal" ("N"). On a plot of the data it is pretty obvious by eye when these odd jumps occur, but I am having trouble getting R to recognize the odd readings because the baseline average changes over time. The best I can come up with is three rules for classifying a reading as an anomaly:
1: Treat the first value as normal. Starting with the second value, if a value is within a small range of the previous (normal) value, mark it "N" for normal in the third column, and so on through the rest of the data set.
2: If a value represents a large increase or decrease from the previous (normal) value, mark it "A" for anomaly in the third column.
3: If a value is marked "A", the following value is also marked "A" if it is within a small range of that anomalous value. If the following value represents a large increase or decrease from the previous anomalous value, mark it "N".
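Taken together, the three rules behave like a two-state switch: each reading keeps the previous reading's label unless the jump from the previous reading is large, in which case the label flips between "N" and "A". A minimal R sketch, assuming "large" means an absolute change greater than a fixed threshold; the function name `label_jumps` and the default `threshold = 10` are hypothetical choices, not part of the rules above.

```r
# Hypothetical helper: applies rules 1-3 literally.
# Assumption: a "large" change is an absolute difference greater than `threshold`.
label_jumps <- function(x, threshold = 10) {
  n <- length(x)
  out <- character(n)
  if (n == 0) return(out)
  out[1] <- "N"                        # the first reading is treated as normal
  for (i in seq_len(n)[-1]) {
    big_jump <- abs(x[i] - x[i - 1]) > threshold
    if (big_jump) {
      # rules 2 and 3: a large jump switches N -> A or A -> N
      out[i] <- if (out[i - 1] == "N") "A" else "N"
    } else {
      # rules 1 and 3: a small change keeps the previous label
      out[i] <- out[i - 1]
    }
  }
  out
}

label_jumps(c(1, 2, 23, 22, 2, 3))
# returns "N" "N" "A" "A" "N" "N"
```

One consequence of this design is that a single missed or spurious jump inverts every label after it, so the threshold has to sit clearly between the size of ordinary drift and the size of an anomalous jump.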
This was the best logic I could come up with, but if the data below suggest a better idea, I'm all for it.
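Because the baseline drifts, one alternative (a different technique from the rules above, not something they specify) is to compare each reading with a running median of its neighbours, which follows slow changes in the baseline but ignores short bursts. A sketch using base R's `stats::runmed()`; the window `k = 5`, the cutoff of 10, and the name `label_runmed` are assumptions chosen for this dummy data.

```r
# Hypothetical alternative: flag readings far from a running median.
# Assumptions: k = 5 (odd window; tolerates anomalous runs of up to 2 readings)
# and a cutoff of 10 units, both picked by eye for this dummy data.
label_runmed <- function(x, k = 5, cutoff = 10) {
  baseline <- as.numeric(stats::runmed(x, k = k))  # as.numeric drops runmed's "k" attribute
  ifelse(abs(x - baseline) > cutoff, "A", "N")
}

Value <- c(1, 2, 2, 2, 23, 22, 2, 3, 2, -23, -23, 4, 4, 5, 5, 25, 24,
           6, 7, 6, 35, 38, 20, 21, 22, -22, 2, 2, 6, 7, 7, 6, 30, 31,
           6, 6, 6, 5, 22, 22, 4, 5, 4, 5, 30, 39, 18, 18, 19, 18)
Name <- label_runmed(Value)
table(Name)
```

On this dummy data it gives the same labels as the hand-labelled table below. Unlike the switching rule, an error at one reading does not carry over to later readings; the trade-off is that a run of more than (k - 1) / 2 anomalous readings would be absorbed into the median, so k must exceed twice the longest expected burst.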
Here is a dummy data set:
```r
SampleNum <- 1:50
Value <- c(1, 2, 2, 2, 23, 22, 2, 3, 2, -23, -23, 4, 4, 5, 5, 25, 24,
           6, 7, 6, 35, 38, 20, 21, 22, -22, 2, 2, 6, 7, 7, 6, 30, 31,
           6, 6, 6, 5, 22, 22, 4, 5, 4, 5, 30, 39, 18, 18, 19, 18)
DF <- data.frame(SampleNum, Value)
```
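Since each large jump simply flips the label, the rules can also be applied without a loop: count the large jumps so far with `cumsum()`, and an odd count means the reading is inside an anomalous stretch. A sketch that adds the third column to `DF`, assuming the same hypothetical threshold of 10 for a "large" jump:

```r
SampleNum <- 1:50
Value <- c(1, 2, 2, 2, 23, 22, 2, 3, 2, -23, -23, 4, 4, 5, 5, 25, 24,
           6, 7, 6, 35, 38, 20, 21, 22, -22, 2, 2, 6, 7, 7, 6, 30, 31,
           6, 6, 6, 5, 22, 22, 4, 5, 4, 5, 30, 39, 18, 18, 19, 18)
DF <- data.frame(SampleNum, Value)

threshold <- 10  # assumed cutoff for a "large" change between consecutive readings
# The first reading has no predecessor, so it never counts as a jump
big_jump <- c(FALSE, abs(diff(DF$Value)) > threshold)
# An odd number of large jumps so far means the reading is anomalous
DF$Name <- ifelse(cumsum(big_jump) %% 2 == 1, "A", "N")

head(DF, 7)
```

With a threshold of 10, this produces the same Name column as the table below, in a single vectorised pass.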
This is how I would like the final data to look, with a third column, Name, identifying which values are anomalous:
```
SampleNum Value Name
        1     1    N
        2     2    N
        3     2    N
        4     2    N
        5    23    A
        6    22    A
        7     2    N
        8     3    N
        9     2    N
       10   -23    A
       11   -23    A
       12     4    N
       13     4    N
       14     5    N
       15     5    N
       16    25    A
       17    24    A
       18     6    N
       19     7    N
       20     6    N
       21    35    A
       22    38    A
       23    20    N
       24    21    N
       25    22    N
       26   -22    A
       27     2    N
       28     2    N
       29     6    N
       30     7    N
       31     7    N
       32     6    N
       33    30    A
       34    31    A
       35     6    N
       36     6    N
       37     6    N
       38     5    N
       39    22    A
       40    22    A
       41     4    N
       42     5    N
       43     4    N
       44     5    N
       45    30    A
       46    39    A
       47    18    N
       48    18    N
       49    19    N
       50    18    N
```
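With the hand-labelled column available, it is possible to check which thresholds are consistent with the switching rules: every step where the label changes must be a bigger jump than every step where it stays the same. A small check; the run-length encoding of Name is just a compact transcription of the table above.

```r
Value <- c(1, 2, 2, 2, 23, 22, 2, 3, 2, -23, -23, 4, 4, 5, 5, 25, 24,
           6, 7, 6, 35, 38, 20, 21, 22, -22, 2, 2, 6, 7, 7, 6, 30, 31,
           6, 6, 6, 5, 22, 22, 4, 5, 4, 5, 30, 39, 18, 18, 19, 18)
# Name column from the table, written as alternating runs of "N" and "A"
Name <- rep(rep(c("N", "A"), length.out = 17),
            times = c(4, 2, 3, 2, 4, 2, 3, 2, 3, 1, 6, 2, 4, 2, 4, 2, 4))
stopifnot(length(Name) == length(Value))

jump <- abs(diff(Value))                      # size of each step between consecutive readings
switches <- Name[-1] != Name[-length(Name)]   # TRUE where the label changes

max(jump[!switches])  # largest step that keeps the label: 9
min(jump[switches])   # smallest step that flips the label: 17
```

So for a rule of the form "large means an absolute change greater than t", any t from 9 up to (but not including) 17 reproduces the table on this data; a value near the middle of that range leaves the most margin for noisier real data.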