Calculating FFT Features from a discrete signal

Question

I have an array of uniform signal which were sampled at 10Hz (meaning two consecutive data points is 100 milliseconds apart). This actually the magnitude of the 3 axes of a 3d gyroscope, the array contains 30 data points (in 3 seconds). I plot the frequency of this series as follow

import numpy as np
import matplotlib.pyplot as pl

sample_rate = 10
x = np.array([318.45,302.78,316.47,334.14,333.41,326.15,320.07,318.68,314.12,308.64,300.15,304.33,318.42,322.72,329.56,339.18,338.03,343.27,351.44,353.23,352.35,352.88,353.43,352.14,351.28,352.82,353.36,353.35,353.19,353.82])
x = np.array(x) - np.mean(x)
p = np.abs(np.fft.rfft(x))
f = np.linspace(0, sample_rate/2, len(p))
pl.plot(f, p)
pl.show()

Can someone tell me did I plot right, or not? I am planning to calculate the follow features (from above signal)

DC Component
Spectral Energy
Information Entropy
Dominant frequency components
Principal frequency
Magnitude of the first five components of FFT analysis

Can someone help me to fill the above code for the calculating of those features?

----@RoadRunner66: Please see my questions below as I could not post a long reply to you----

Thank you for you answer and your code,

Regarding to your question, the data is from the Gyro scope which measures the Euler angles.

So (sum x[i]**2 : 3357757.0) is the Spectral Energy? If yes then do I need to normalize it by dividing this number by n? (or multiply with n as you did), however the two below papers have difference in their definitions.

As in the first paper (first link below) they stated that "The second frequency-domain feature set was chosen to be spectral energy, which is defined to be the sum of the squared FFT coefficients"

In the second paper (2nd link) they stated in another way that "Spectral energy: the squared sum of spectral coefficients divided by the number of samples in a window"

And what about the Principal frequency, is that the same meaning (term) with Dominant frequency? I guess Principal frequency refers to the only one which has the highest spectrum peak?

I printed the frequencies and the equivalent magnitudes of into two rows like this

I think you printed the magnitude of the first 5 like the yellow bellow. I am not sure about the definition of "First 5 components"

If we use the first consecutive five like you have pointed out, does it make sense to include the ones (like at frequency 0 or 0.666) and feed them into my prediction model (as explained below), because it is too low compared to the others. If the returning spectrum is clear with dominant frequencies like at 1hz and 3hz then maybe the magnitude at frequency of 0.5hz or 1.5hz will be close to zero.

Could it be the term "Magnitude of the first five components of FFT analysis" is the "Magnitude of the first five dominant components" as I highlighted in the blue colors? Does this term refer to 5 values or just 1 values (square root of the sum of the 5 squares) ? In case it refers to the 5 values (very likely it is) then I think the top five dominant components in magnitude will be a better choice when it comes to comparing the difference between two signal?

Btw, the second paper also wrote "First 5-FFT coefficients: the first 5 of the fast-Fourier transform coefficients are taken since they capture the main frequency components, and the use of additional coefficients did not improve the accuracies"

To be frankly, I am working with the problem of cow activities, my strategy is to segment the sensor data into time windows (3,5,7..seconds) and extract features from each window, then feed them to the Machine Learning model.

(My data include a 3d gyroscope and a 3d accelerometer attached to the neck of the cow, the sensors data sampling is 10Hz)

I want to combine two types of features, one is time domain features and the other one is frequency domain features.

I read paper and found those above set of frequency domain features which includes the term "Magnitude of the first five components of FFT analysis" (from this paper https://ieeexplore.ieee.org/abstract/document/4663615 ) and from this one https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634510/

The second one they referred it as "First 5-FFT coefficients: the first 5 of the fast-Fourier transform coefficients are taken since they capture the main frequency components, and the use of additional coefficients did not improve the accuracies."

Thank you so much for your reading and answer!

I think the question of spectral energy is a matter of definition, I just tried to check on the Parseval theorem for equality. Based on the first definition you found it would be 1/n of what I quoted. As for the first 5 components that really just depends on your use case (not a question of python or spectral methods), i.e. what you want to model. If one of the cows has some sort of shaking disease then the highest frequency at 5 Hz could be relevant to detect that, but for the general movements you'll care about low frequency. — roadrunner66
@roadrunner66, can you tell me why we should care about the low frequencies. As low ones normally have low magnitudes too? — Thai
In your case the low frequencies have the highest magnitudes (1/3 Hz,1Hz,2Hz). Just look at the 2nd and 4th plot below. This is also a general feature in nature (1/f noise in electronics, on surface quality in optics, many other examples). If there was significant energy at high frequencies it would actually be a problem. You want to be very low amplitude as you get closer to the Nyquist frequency (fN) or you will have aliasing problems (i.e. you think something is of frequency fN- fo, but it actually is of fN+fo or even higher order. — roadrunner66
Thank you. So in the case of signal sampling rate is 3Hz then I should care about the first 5 magnitude of the frequencies at 0, 1/3, 2/3, 1, 4/3. It that right? Can you explain more about the problem when there was significant energy at high frequencies? Does problem here mean it is not natural property of the electric signal or the problem from the cow (like the cow has disease)? — Thai
No, it means that some movements of the cow happen faster than your 10 Sa/sec sampling rate allows to capture. See en.wikipedia.org/wiki/Aliasing — roadrunner66

roadrunner66 roadrunner66 · Accepted Answer · 2020-01-23T01:59:05

You have done most of the work. This appears to be for a class assignment, so that might explain the sparsity of the data, which are not really a lot to go by.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

sample_rate = 10 #Hz

x = np.array([318.45,302.78,316.47,334.14,333.41,326.15,320.07,318.68,314.12,308.64,
              300.15,304.33,318.42,322.72,329.56,339.18,338.03,343.27,351.44,353.23,
              352.35,352.88,353.43,352.14,351.28,352.82,353.36,353.35,353.19,353.82])
n=len(x)
print(f'number of points n : {n}')
t= np.linspace(0,3, n)     # time according to your description
mn=np.mean(x)
print(f' mean = {mn:.3f} unit')
print(f' sum x[i]**2  : {np.sum(x**2) :.1f} unit^2 ')

pr = np.abs(np.fft.rfft(x))/n
print()
print(f' DC peak :  {pr[0]};  this makes choice of normalization 1/n meaningful')

print(f' n *sum X[k]**2   : {np.sum(np.abs(pr)**2)*n :.1f} unit^2 ') 
f = np.linspace(0, sample_rate/2, len(pr))   # 10 Sa/sec, so 5 Hz is Nyquist limit

plt.figure(figsize=(20,5))

plt.subplot(141)
plt.plot(t,x,'.-')
plt.title('original data') 

plt.subplot(142)
plt.plot(f, pr,'.-')
plt.title('spectrum')


plt.subplot(143)
xf = np.array(x) - mn   #  remove the DC
plt.plot(t, xf,'.-')
plt.title('original data with DC removed')

plt.subplot(144)
pr = np.abs(np.fft.rfft(xf))/n
plt.plot(f, pr, '.-');
plt.title('spectrum (DC removed) ')

The DC peak is the same as the average of the original signal at 334 units.

The spectral energy (see Parseval's Theorem) is the same in time and frequency domain.

I don't know what definition for information entropy is expected for you to be used (see e.g. Approx. entropy).

The dominant frequencies in the spectrum (after removing DC) are at 1/3 Hz, 1 Hz and 2 Hz, with 1/3 Hz being the largest. I printed the magnitude of the first five.

An important question for me would be the physical meaning of the data. Are they angles? If yes, in what unit?

Note that one can 'see' the frequency components in the original data at 1/3 Hz (one wave over 3 secs) , 1 Hz (one wave over 1 sec) and 2 Hz (one wave over 1/2 sec), respectively.

Calculating FFT Features from a discrete signal

1 Answers