10
votes

I have 20 signals (time-courses) in group A and 20 signals in group B. I want to find a measure that shows group A is different from group B. For example, I ran xcorr for the signals within each group, but now I need to compare the results somehow. I tried taking the maximal amplitude of each xcorr pair, which is sort of a measure of maximal similarity. Then I compared all these values between the two groups, but there was no difference. What else can I do? I could also compare frequency spectra, but then again I do not know which frequency bin to take. Any suggestions / references are highly appreciated!

I have about 20 signals in each group. Those are my samples. I do not know a priori what the difference might be. Here are 9 sample signals for each group, their auto-correlations, and the cross-correlations for a subset of signals (group 1 vs. group 1, group 2 vs. group 2, group 1 vs. group 2). I do not see any evident difference. I also do not understand how you propose to compare cross-correlations: which peaks should I take? All the signals were detrended and z-scored.

[Seven figures omitted: sample signals for each group, their auto-correlations, and the cross-correlations described above.]

2
Could you give us an idea of the number of samples in each waveform? Is there any obvious periodicity? The autocorrelation of a signal (equivalently, its power spectrum) is usually a good indication of the kinds of signals that are present. And the cross-correlation between different signals in A (which are "similar") may be higher than the corresponding AB correlations. In what way do you expect them to differ? Ultimately the test should be: take two random signals (may be from A, may be from B). Perform test. If test value < something -> same, else -> different? Show statistical difference. – Floris
Thank you for the answer! – user1597969
I have added the details to my original post. – user1597969
You said you detrended and z-scored them. Perhaps the slope of the trend line, or the mean and variance of the signal, are what you need. Right now they all just look like noise. – David Wurtz
I must detrend because the trend is introduced by the measurement device. When I compare the plain standard deviation between the two groups, it does not differ either. Clearly the signals are similar; otherwise it would have been too simple :) The question is what other measures I can try. – user1597969

2 Answers

17
votes

Well, this may be too simplistic of an answer, and too complex of a measure, but maybe it's worth something.

In order to compare signals, we really have to establish some criterion by which we compare them. This could be so many things. If we want signals that look visually similar, we perform time domain analysis. If we are talking about audio signals that sound similar, we care about frequency or time-frequency analysis. If the signals are supposed to represent noise, then signal variance should be a good measure. In general we may want to use a combination of all sorts of measures. We can do this with a weighted index.

First let's establish what we have: there are two sets of signals: set A and set B. We want some measure that shows set A is different from set B. The signals are detrended.

We take signal a in A and signal b in B. The list of things we can compare:

  • Similarity in time domain (static): Multiply in place and sum.

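  • Similarity in time domain (with shift*): Take fft of each signal, multiply one by the complex conjugate of the other, and ifft. (This is a circular cross-correlation; zero-padding both signals first makes it match the linear correlation from MATLAB's xcorr.)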

  • Similarity in frequency domain (static**): Take the magnitude of the fft of each signal, multiply, and sum.

  • Similarity in frequency domain (with shift*): Multiply the two signals and take fft. This will show if the signals share similar spectral shapes.

  • Similarity in energy (or power if different lengths): Square the two signals and sum each (and divide by signal length for power). (Since the signals were detrended, this should be signal variance.) Then subtract and take absolute value for a measure of signal variance similarity.
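The five measures in the list above can be sketched in Python with NumPy. This is only an illustration under a few assumptions: the function name `pair_measures` and the dictionary keys are made up for this example, the signals are real-valued and of equal length, one spectrum is conjugated so that the shifted time-domain measure is a correlation rather than a convolution, and magnitude spectra are used so that the static frequency-domain measure is real-valued.

```python
import numpy as np

def pair_measures(a, b):
    """Five raw comparison measures for two equal-length real signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    fa, fb = np.fft.fft(a), np.fft.fft(b)
    return {
        # time domain, static: multiply in place and sum
        "time_static": float(np.sum(a * b)),
        # time domain, with shift: circular cross-correlation via the FFT
        "time_shifted": np.real(np.fft.ifft(fa * np.conj(fb))),
        # frequency domain, static: product of magnitude spectra, summed
        "freq_static": float(np.sum(np.abs(fa) * np.abs(fb))),
        # frequency domain, with shift: spectrum of the pointwise product
        "freq_shifted": np.abs(np.fft.fft(a * b)),
        # energy: absolute difference of the summed squares
        "energy_diff": float(abs(np.sum(a ** 2) - np.sum(b ** 2))),
    }
```

The two "with shift" entries are vectors; they still have to be reduced to a single number before they can enter an index.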

* (with shift) -- You could choose to sum over the entire correlation vector to measure total general correlation, you could choose to sum only values in the correlation vector that surpass a certain threshold value (as if you expect echoes of one signal in the other), or just take the maximum value from the correlation vector (where its index is the shift in the second signal that results in maximal correlation with the first signal). Also, if the amount of shift that it takes to reach maximal correlation is important (i.e. if signals are similar only if it takes relatively small shift to reach the point of maximal correlation), then you can incorporate a measure of the index displacement.
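As a rough sketch of the options in this note, the function below reduces a circular correlation vector to the scalars mentioned: the total, the sum of values above a threshold, the peak, and the shift at which the peak occurs. The name `summarize_correlation` and the convention that index 0 holds lag 0 are assumptions made for this example.

```python
import numpy as np

def summarize_correlation(c, threshold):
    """Reduce a circular correlation vector c (lag 0 first) to scalars."""
    c = np.asarray(c, dtype=float)
    n = c.size
    peak = int(np.argmax(c))
    # Map the index to a signed lag, so that a shift of n-1 counts as -1.
    lag = peak if peak <= n // 2 else peak - n
    return {
        "total": float(c.sum()),                           # general correlation
        "above_threshold": float(c[c > threshold].sum()),  # strong lags only
        "peak": float(c[peak]),                            # maximal correlation
        "peak_lag": lag,                                   # shift that gives it
    }
```

If a small shift matters, `abs(peak_lag)` could be used as a penalty term in the index.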

** (frequency domain similarity) -- You may want to mask the part of the spectrum that you're not concerned with. For instance, if you only care about the higher-frequency structure (fs/4 and up), you could do:

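mask = zeros(1,n); mask(ceil(n/4)+1 : floor(3*n/4)+1) = 1;   % keep bins with |f| >= fs/4
freq_static = mean(abs(fft(a)) .* abs(fft(b)) .* mask);      % magnitudes keep the result real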
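The same masking idea can be sketched in Python, where `np.fft.fftfreq` gives the frequency of every bin, so the mask can be written as a condition on frequency rather than on indices. The function name `masked_freq_similarity` and the parameters `fs` and `f_min` are assumptions made for this example, as is the use of magnitude spectra.

```python
import numpy as np

def masked_freq_similarity(a, b, fs=1.0, f_min=None):
    """Mean product of magnitude spectra over bins with |f| >= f_min."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if f_min is None:
        f_min = fs / 4.0                      # default: fs/4 and up
    freqs = np.fft.fftfreq(a.size, d=1.0 / fs)
    mask = np.abs(freqs) >= f_min             # covers positive and negative bins
    product = np.abs(np.fft.fft(a)) * np.abs(np.fft.fft(b))
    # Masked-out bins count as zeros in the mean, like multiplying by a 0/1 mask.
    return float(np.mean(product * mask))
```

Two signals whose energy lies entirely below `f_min` score close to zero here, however similar they are at low frequencies.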

Also, we may want to implement a circular correlation like so:

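function c = circular_xcorr(a,b)
% Circular cross-correlation of two equal-length vectors, lags 0..n-1.
n = length(a);
c = reshape(xcorr(a,b), 1, []);   % linear correlation, lags -(n-1)..(n-1)
c = c(n:end) + [0, c(1:n-1)];     % wrap each negative lag k-n onto lag k
end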
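For readers working in Python, here is a minimal sketch of the same wrap-around idea built on `np.correlate`. It assumes real, one-dimensional signals of equal length; in "full" mode the output index `i` corresponds to lag `i - (n - 1)`, so the negative lags sit in the first `n - 1` entries.

```python
import numpy as np

def circular_xcorr(a, b):
    """Circular cross-correlation c[k] = sum_n a[(n + k) % N] * b[n]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of equal length")
    n = a.size
    full = np.correlate(a, b, mode="full")   # lags -(n-1) .. (n-1)
    c = full[n - 1:].copy()                  # lags 0 .. n-1
    c[1:] += full[:n - 1]                    # add lag k-n onto lag k
    return c
```

The result should agree with the FFT route, `np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))`, up to rounding error.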

Finally, we choose the characteristics that are important or relevant, and create a weighted index. Example:

n = 100;
a = rand(1,n); b = rand(1,n);
% illustrative thresholds; tune them to the scale of your data
time_corr_thresh = .8 * n; freq_corr_thresh = .6 * n;
time_static = sum(a .* b);
time_shifted = circular_xcorr(a,b);
time_shifted = sum(time_shifted(time_shifted > time_corr_thresh));
freq_static = sum(abs(fft(a)) .* abs(fft(b)));
freq_shifted = abs(fft(a .* b));    % magnitude, so the threshold test is real-valued
freq_shifted = sum(freq_shifted(freq_shifted > freq_corr_thresh));
w1 = 0; w2 = 1; w3 = .7; w4 = 0;    % one weight per measure
index = w1 * time_static + w2 * time_shifted + w3 * freq_static + w4 * freq_shifted;

We compute this index for each pair of signals.
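To relate the per-pair index back to the original two-group question, one possible approach (an assumption of this sketch, not something the outline above prescribes) is to collect the index for within-A pairs, within-B pairs, and between-group pairs, and then compare those three sets of values. The functions `similarity_index` and `group_indices`, the two-term index, and its weights are all hypothetical.

```python
import itertools
import numpy as np

def similarity_index(a, b, weights=(1.0, 0.7)):
    """Hypothetical index: peak circular correlation plus spectral overlap."""
    fa = np.fft.fft(np.asarray(a, dtype=float))
    fb = np.fft.fft(np.asarray(b, dtype=float))
    time_shifted = np.real(np.fft.ifft(fa * np.conj(fb))).max()
    freq_static = np.mean(np.abs(fa) * np.abs(fb))
    return weights[0] * time_shifted + weights[1] * freq_static

def group_indices(A, B, index=similarity_index):
    """Index values for within-A, within-B and between-group pairs."""
    within_a = [index(x, y) for x, y in itertools.combinations(A, 2)]
    within_b = [index(x, y) for x, y in itertools.combinations(B, 2)]
    between = [index(x, y) for x, y in itertools.product(A, B)]
    return np.array(within_a), np.array(within_b), np.array(between)
```

With 20 signals per group this yields 190 within-group values for each group and 400 between-group values, which can then be compared with whatever statistical test suits the data.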

I hope that this outline of signal characterization helps. Comment if anything is unclear.

1
votes

With reference to Brian's answer above, I've written a Python function to compute the similarity of two time-series signals, as below:

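import numpy as np

def compute_similarity(ref_rec, input_rec, weightage=(0.33, 0.33, 0.33)):
    """Weighted difference between two equal-length signals (0 means identical)."""
    ref_rec = np.asarray(ref_rec, dtype=float)
    input_rec = np.asarray(input_rec, dtype=float)

    ## Time domain similarity (default mode "valid" gives the single zero-lag value)
    ref_time = np.correlate(ref_rec, ref_rec)[0]
    inp_time = np.correlate(ref_rec, input_rec)[0]
    diff_time = abs(ref_time - inp_time)

    ## Freq domain similarity (np.correlate conjugates its second argument)
    ref_fft = np.fft.fft(ref_rec)
    inp_fft = np.fft.fft(input_rec)
    ref_freq = np.correlate(ref_fft, ref_fft)[0]
    inp_freq = np.correlate(ref_fft, inp_fft)[0]
    diff_freq = abs(ref_freq - inp_freq)

    ## Power similarity
    ref_power = np.sum(ref_rec**2)
    inp_power = np.sum(input_rec**2)
    diff_power = abs(ref_power - inp_power)

    return float(weightage[0]*diff_time + weightage[1]*diff_freq + weightage[2]*diff_power)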