
I have a binary time-series dataset of on/off data. The "on" state is usually short-lived, so it looks like a peak. Here is what it looks like:

[Image: plot of the binary on/off time series]

I have detected the peaks and extracted the time intervals between them (the small red two-way arrows at the bottom). The issue is that, as can be seen, the peaks are clustered, and I want to quantify the bursts: burst size (number of peaks in a cluster), interburst interval (distance between the last peak of one cluster and the first peak of the next cluster), number of bursts, etc.
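For reference, here is a minimal sketch of one way to get peak times and interpeak intervals from a 0/1 series. It assumes the data are a uniformly sampled array of 0s and 1s and that the start of each "on" run is used as the peak time; `peak_onsets`, `interpeak_intervals`, and the sampling period `dt` are hypothetical names, not necessarily what the asker used.

```python
import numpy as np

def peak_onsets(signal):
    """Indices where a 0/1 signal switches from off (0) to on (1)."""
    x = np.asarray(signal, dtype=int)
    # Prepend a 0 so that a signal which starts "on" still yields an onset at index 0.
    rising = np.diff(np.concatenate(([0], x))) == 1
    return np.flatnonzero(rising)

def interpeak_intervals(onsets, dt=1.0):
    """Time between consecutive onsets; dt is the (assumed uniform) sampling period."""
    return np.diff(np.asarray(onsets)) * dt

signal = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
onsets = peak_onsets(signal)
print(onsets.tolist())                       # [1, 4, 9]
print(interpeak_intervals(onsets).tolist())  # [3.0, 5.0]
```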

All of this is easy to compute once the clusters are identified, which can be done simply by thresholding: an interpeak interval greater than some value marks a break between clusters. But not all of my data has such well-defined clusters, and the interburst interval varies widely. Some of the datasets do not even have clusters. So my main issue is identifying clusters with an automated, relative (not fixed) threshold.
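A minimal sketch of that fixed-threshold split and of the burst statistics listed above, assuming the peak times are already sorted. `split_bursts`, `burst_stats`, and `max_gap` are hypothetical names; `max_gap` is exactly the fixed value that does not carry over between datasets.

```python
import numpy as np

def split_bursts(peak_times, max_gap):
    """Group sorted peak times into bursts: a gap larger than max_gap starts a new burst."""
    t = np.asarray(peak_times, dtype=float)
    if t.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(t) > max_gap) + 1
    return np.split(t, breaks)

def burst_stats(bursts):
    """Number of bursts, peaks per burst, and interburst intervals
    (last peak of one burst to the first peak of the next)."""
    sizes = [len(b) for b in bursts]
    ibis = [float(nxt[0] - cur[-1]) for cur, nxt in zip(bursts, bursts[1:])]
    return {"n_bursts": len(bursts), "burst_sizes": sizes, "interburst_intervals": ibis}

peaks = [0, 1, 2, 10, 11, 25, 26, 27, 28]
print(burst_stats(split_bursts(peaks, max_gap=3)))
# {'n_bursts': 3, 'burst_sizes': [3, 2, 4], 'interburst_intervals': [8.0, 14.0]}
```

Note that with this definition a lone peak also counts as a burst of size 1; whether that is acceptable is part of defining a cluster.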

Could someone please help me with an algorithm for this?

Do your homework and read up on, e.g., kernel density estimation, event detection, etc. Try to formalize your notion of clusters. – Has QUIT--Anony-Mousse
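One possible reading of the kernel density estimation suggestion, sketched under my own assumptions: estimate the density of log10(interpeak interval) and, if it has two modes (short intra-burst gaps and long inter-burst gaps), take the lowest point between them as a data-driven threshold. If the density has only one mode, the function reports no threshold, which matches datasets without clusters. `kde_gap_threshold`, `bandwidth=0.25` (in log10 units), and `grid_size` are hypothetical choices, and the bandwidth is still a parameter you have to pick.

```python
import numpy as np

def kde_gap_threshold(intervals, bandwidth=0.25, grid_size=512):
    """Estimate a gap threshold from a Gaussian KDE of log10(interpeak intervals).

    Returns the interval at the lowest point of the density between its two tallest
    modes, or None if the density has only one mode (no clear cluster structure).
    """
    x = np.log10(np.asarray(intervals, dtype=float))  # intervals must be > 0
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_size)
    # Unnormalised KDE: a Gaussian kernel centred on every log-interval, summed on the grid.
    density = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    # Interior local maxima of the sampled density.
    inner = density[1:-1]
    modes = np.flatnonzero((inner > density[:-2]) & (inner >= density[2:])) + 1
    if modes.size < 2:
        return None
    # The two tallest modes, in left-to-right order, and the minimum between them.
    left, right = np.sort(modes[np.argsort(density[modes])[-2:]])
    valley = left + np.argmin(density[left:right + 1])
    return float(10 ** grid[valley])

gaps = [1, 1, 2, 1, 1, 2, 1, 20, 25, 30]  # made-up toy intervals
print(kde_gap_threshold(gaps))  # a value between the short and the long gaps
```

The returned value can then be used as the `max_gap` of a simple threshold split, so the threshold adapts to each dataset instead of being fixed.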

1 Answer


The answer to your question is: No. No one can (yet) help you with an algorithm for what you want.

The problem is that you don't have anything well quantified. You're asking for a reliable algorithm that can identify clusters when you haven't defined what a cluster is.

In a previous answer, I recommended looking at the ratio of each interpeak interval to the one before it. If the ratio is above a certain threshold, the gap is an inter-cluster gap; otherwise, it's an intra-cluster gap. That can work, but it still requires a threshold.
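A minimal sketch of that ratio rule, assuming the peak times are sorted and distinct and that "ratio" means each gap divided by the gap immediately before it. `ratio_gap_labels` and the default `ratio_threshold=3.0` are hypothetical.

```python
import numpy as np

def ratio_gap_labels(peak_times, ratio_threshold=3.0):
    """For each gap between consecutive peaks, True if it is an inter-cluster gap.

    A gap counts as inter-cluster when it is more than ratio_threshold times the gap
    just before it. The first gap has no predecessor and is labelled intra-cluster.
    """
    gaps = np.diff(np.asarray(peak_times, dtype=float))
    if gaps.size == 0:
        return []
    ratios = gaps[1:] / gaps[:-1]  # assumes distinct peak times, so no zero gaps
    return [False] + (ratios > ratio_threshold).tolist()

print(ratio_gap_labels([0, 1, 2, 10, 11, 12]))
# [False, False, True, False, False]
```

Because only ratios are compared, multiplying all times by a constant gives the same labels, which is the scaling advantage. One limitation: an isolated peak between two long gaps produces a long gap followed by another long gap with a ratio near 1, so the second gap is missed; comparing each gap with both neighbours is one possible refinement.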

The problem is that you need one. You can't just eyeball each graph and say, "Oh, there's a cluster." If you don't define a cluster, you can't identify one. There are ways to make your thresholds more generic: the ratio is one of the simpler ones, it avoids scaling issues, and it is generally effective. You could also look at rolling averages. There are all sorts of ways to play with your data, but somewhere in there, you have to define what you want. Even if you trained an artificial intelligence, you would ideally do it with fixed criteria for what is and isn't a cluster. And once you have fixed criteria, you don't need artificial intelligence.
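One way to try the rolling-average idea, sketched under my own assumptions: each gap is compared with the median of the gaps in a small centred window around it. A median is used rather than a mean so that the long gap itself does not inflate its own baseline. `rolling_gap_labels`, `window=5`, and `factor=3.0` are hypothetical; `factor` is still a threshold, just a relative one.

```python
import numpy as np

def rolling_gap_labels(gaps, window=5, factor=3.0):
    """Mark a gap as inter-cluster if it exceeds factor times the median of the
    gaps in a centred window around it (the gap itself included)."""
    g = np.asarray(gaps, dtype=float)
    half = window // 2
    labels = []
    for i in range(len(g)):
        local = g[max(0, i - half): i + half + 1]  # window is truncated at the edges
        labels.append(bool(g[i] > factor * np.median(local)))
    return labels

print(rolling_gap_labels([1, 1, 8, 1, 1, 1, 9, 2, 1]))
# [False, False, True, False, False, False, True, False, False]
```

Because the baseline is local, a region with generally sparser peaks raises its own baseline instead of being split at every gap.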

So, define a cluster. Once you can quantify what a cluster means to you, you can work on making an algorithm for it.

Start by answering these questions:

  • How many peaks at a minimum are needed to define a cluster?
  • Is there a minimum or maximum time between peaks that makes it not a cluster? What about a minimum or maximum time relative to the total duration of the dataset?
  • Is there a minimum distance between clusters that makes it two instead of one?
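As an illustration of turning answers to these questions into code, here is a minimal sketch with hypothetical answers: at least 3 peaks per cluster, and a maximum intra-cluster gap of 5% of the dataset's total duration. In this sketch the second and third questions collapse into one threshold: any gap larger than the maximum intra-cluster gap separates two clusters. `ClusterDefinition`, `find_clusters`, and both default values are assumptions for illustration.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class ClusterDefinition:
    """Hypothetical answers to the questions above, expressed as parameters."""
    min_peaks: int = 3               # fewest peaks that still count as a cluster
    max_gap_fraction: float = 0.05   # largest intra-cluster gap, as a fraction of total duration

def find_clusters(peak_times, rule):
    """Split sorted peak times at gaps longer than the allowed maximum and keep
    only the groups with at least rule.min_peaks peaks."""
    t = np.sort(np.asarray(peak_times, dtype=float))
    if t.size == 0:
        return []
    max_gap = rule.max_gap_fraction * (t[-1] - t[0])
    groups = np.split(t, np.flatnonzero(np.diff(t) > max_gap) + 1)
    return [g.tolist() for g in groups if g.size >= rule.min_peaks]

peaks = [0, 1, 2, 50, 51, 52, 53, 100]
print(find_clusters(peaks, ClusterDefinition()))
# [[0.0, 1.0, 2.0], [50.0, 51.0, 52.0, 53.0]]
```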

If it helps, use simplified plots like these to work out your answers. Can you define a cluster for each of these?

..||.|.|.|.||

|.|.|.|.|.|.|

||..||..||..|

||....||....|

|...||||.....
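To see how much the answer depends on the definition, here is a small sketch that reads each sketch above as a row of columns ("|" is a peak, "." is empty) and applies one hypothetical rule: a cluster is at least 2 peaks with no gap larger than `max_gap` columns. `peak_positions`, `clusters`, and the rule itself are assumptions for illustration.

```python
def peak_positions(sketch):
    """Column indices of '|' characters in an ASCII sketch such as '||..||..||..|'."""
    return [i for i, ch in enumerate(sketch) if ch == "|"]

def clusters(positions, max_gap, min_peaks=2):
    """Split peaks wherever the gap exceeds max_gap; keep groups with at least min_peaks."""
    groups, current = [], []
    for p in positions:
        if current and p - current[-1] > max_gap:
            groups.append(current)
            current = []
        current.append(p)
    if current:
        groups.append(current)
    return [g for g in groups if len(g) >= min_peaks]

sketches = ["..||.|.|.|.||", "|.|.|.|.|.|.|", "||..||..||..|", "||....||....|", "|...||||....."]
for max_gap in (1, 2):
    counts = [len(clusters(peak_positions(s), max_gap)) for s in sketches]
    print(f"max_gap={max_gap}: clusters per sketch = {counts}")
# max_gap=1: clusters per sketch = [2, 0, 3, 2, 1]
# max_gap=2: clusters per sketch = [1, 1, 3, 2, 1]
```

Changing `max_gap` from 1 to 2 changes the result for the first two sketches only: the evenly spaced row goes from "no clusters" to "one big cluster". That choice has to come from your own definition, not from the algorithm.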