Identify important minima and maxima in time-series w/ Mathematica

Question

I need a way to identify local minima and maxima in time series data with Mathematica. This seems like it should be an easy thing to do, but it gets tricky. I posted this on the MathForum, but thought I might get some additional eyes on it here.

You can find a paper that discusses the problem at: http://www.cs.cmu.edu/~eugene/research/full/compress-series.pdf

I've tried this so far…

Get and format some data:

data = FinancialData["SPY", {"May 1, 2006", "Jan. 21, 2011"}][[All, 2]];
data = data/First@data;
data = Transpose[{Range[Length@data], data}];

Define 2 functions:

First method:

findMinimaMaxima[data_, window_] := With[{k = window},
  data[[k + Flatten@Position[Partition[data[[All, 2]], 2 k + 1, 1],  x_List /;  x[[k + 1]] < Min[Delete[x, k + 1]] || x[[k + 1]] > Max[Delete[x, k + 1]]]]]]

Now another approach, although not as flexible:

findMinimaMaxima2[data_] := data[[Accumulate@(Length[#] & /@ Split[Prepend[Sign[Rest@data[[All, 2]] - Most@data[[All, 2]]], 0]])]]

Look at what each the functions does. First findMinimaMaxima2[]:

minmax = findMinimaMaxima2[data];
{Length@data, Length@minmax}
ListLinePlot@minmax

This selects all minima and maxima and results (in this instance) in about a 49% data compression, but it doesn't have the flexibility of expanding the window. This other method does. A window of 2, yields fewer and arguably more important extrema:

minmax2 = findMinimaMaxima[data, 2];
{Length@data, Length@minmax2}
ListLinePlot@minmax2

But look at what happens when we expand the window to 60:

minmax2 = findMinimaMaxima[data, 60];
ListLinePlot[{data, minmax2}]

Some of the minima and maxima no longer alternate. Applying findMinimaMaxima2[] to the output of findMinimaMaxima[] gives a workaround...

minmax3 = findMinimaMaxima2[minmax2];
ListLinePlot[{data, minmax2, minmax3}]

, but this seems like a clumsy way to address the problem.

So, the idea of using a fixed window to look left and right doesn't quite do everything one would like. I began thinking about an alternative that could use a range value R (e.g. a percent move up or down) that the function would need to meet or exceed to set the next minima or maxima. Here's my first try:

findMinimaMaxima3[data_, R_] := Module[{d, n, positions},
  d = data[[All, 2]];
  n = Transpose[{data[[All, 1]], Rest@FoldList[If[(#2 <= #1 + #1*R && #2 >= #1) || (#2 >= #1 - #1* R && #2 <= #1), #1, #2] &, d[[1]], d]}];
  n = Sign[Rest@n[[All, 2]] - Most@n[[All, 2]]];
  positions = Flatten@Rest[Most[Position[n, Except[0]]]];
  data[[positions]]
  ]

minmax4 = findMinimaMaxima3[data, 0.1];
ListLinePlot[{data, minmax4}]

This too benefits from post processing with findMinimaMaxima2[]

ListLinePlot[{data, findMinimaMaxima2[minmax4]}]

But if you look closely, you see that it misses the extremes if they go beyond the R value in several positions - including the chart's absolute minimum and maximum as well as along the big moves up and down. Changing the R value shows how it misses the top and bottoms even more:

minmax4 = findMinimaMaxima3[data, 0.15];
ListLinePlot[{data, minmax4}]

So, I need to reconsider. Anyone can look at a plot of the data and easily identify the important minima and maxima. It seems harder to get an algorithm to do it. A window and/or an R value seem important to the solution, but neither on their own seems enough (at least not in the approaches above).

Can anyone extend any of the approaches shown or suggest an alternative to identifying the important minima and maxima?

Happy to forward a notebook with all of this code and discussion in it. Let me know if anyone needs it.

Thank you, Jagra

The alternating maxima and minima criteria is not always important in price variations. Like here i.imgur.com/nsIK7.png — Dr. belisarius
Allow me to welcome you to StackOverflow and remind three things we usually do here: 1) As you receive help, try to give it too answering questions in your area of expertise 2) Read the FAQs 3) When you see good Q&A, vote them upusing the gray triangles, as the credibility of the system is based on the reputation that users gain by sharing their knowledge. Also remember to accept the answer that better solves your problem, if any, by pressing the checkmark sign — Dr. belisarius

Leonid Shifrin Leonid Shifrin · Accepted Answer · 2011-01-26T20:25:12

I suggest to use an iterative approach. The following functions are taken from this post, and while they can be written more concisely without Compile, they'll do the job:

localMinPositionsC = 
 Compile[{{pts, _Real, 1}}, 
   Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0}, 
    For[i = 2, i < Length[pts], i++, 
     If[pts[[i - 1]] > pts[[i]] && pts[[i + 1]] > pts[[i]], 
      result[[++ctr]] = i]];
    Take[result, ctr]]];

localMaxPositionsC = 
  Compile[{{pts, _Real, 1}}, 
    Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0}, 
      For[i = 2, i < Length[pts], i++, 
        If[pts[[i - 1]] < pts[[i]] && pts[[i + 1]] < pts[[i]], 
          result[[++ctr]] = i]];
       Take[result, ctr]]];

Here is your data plot:

dplot = ListLinePlot[data]

Here we plot the mins, which are obtained after 3 iterations:

mins = ListPlot[Nest[#[[localMinPositionsC[#[[All, 2]]]]] &, data, 3],
   PlotStyle -> Directive[PointSize[0.015], Red]]

The same for maxima:

maxs = ListPlot[Nest[#[[localMaxPositionsC[#[[All, 2]]]]] &, data, 3],
   PlotStyle -> Directive[PointSize[0.015], Green]]

And the resulting plot:

Show[{dplot, mins, maxs}]

enter image description here

You may vary the number of iterations, to get more coarse-grained or finer minima/maxima.

Edit:

actually, I just noticed that a couple of points were still missed by this method, both for the minima and maxima. So, I suggest it as a starting point, not as a complete solution. Perhaps, you could analyze minima/maxima, coming from different iterations, and sometimes include those from a "previous", more fine-grained one. Also, the only "physical reason" that this kind of works, is that the nature of the financial data appears to be fractal-like, with several distinctly different scales. Each iteration in the above Nest-s targets a particular scale. This would not work so well for an arbitrary signal.

Identify important minima and maxima in time-series w/ Mathematica

1 Answers