
I'm trying to denoise second-by-second financial time series data. I have a very long series, but I've been working with 100,000 observations just to test how well wavelet denoising (with the Haar wavelet) works. It doesn't.

No matter what I do, the reconstructed signal invariably ends up almost identical to the original. Obviously I want to preserve the shape of the original signal, but it seems the series simply isn't being denoised -- is this really a financial time series whose only noise lives at the few-second resolution? Even when I zoom in to the smallest time scales, the graphs of the reconstructed and original signals remain almost the same.

I've tried changing the mother wavelet, the length of the time series, the thresholding mode (soft vs. hard), and, obviously, the threshold value itself. I started at the standard universal threshold, sigma * sqrt(2*log(len(signal))), which did virtually nothing for me, so I gradually increased it until I reached the completely ridiculous 2*len(signal)**2 -- which should have smoothed the graph beyond recognition but did basically nothing.

WAVELET = "haar"
LEVEL = 2 

signal = training_series
mean = signal.mean()
mean_series = [mean] * len(signal)
signal = [a - b for a, b in zip(signal, mean_series)]

coeffs = pywt.wavedec(signal, WAVELET, level=LEVEL)
sigma = mad(coeffs[-LEVEL])
threshold = sigma * np.sqrt(2*np.log(len(signal)))
coeffs[1:] = (pywt.threshold(i, value=threshold, mode="soft" ) for i in coeffs[1:])
reconstructed_signal = pywt.waverec(coeffs, WAVELET)
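
For what it's worth, a quick way to check whether the threshold is biting at all is to count the detail coefficients that end up zeroed (using the coeffs from above, after the thresholding step):

zeroed = np.mean(np.concatenate(coeffs[1:]) == 0)
print("fraction of detail coefficients zeroed:", zeroed)

If that fraction stays near zero even for absurd thresholds, the thresholded coefficients can't be the ones being reconstructed.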

I expected the reconstructed signal to be significantly different from the original (smoothed out, denoised, less... identical to the original), but that wasn't the case. At the smallest scales (think every 10 or 20 seconds on a scale of 100,000 seconds) there is some very minor smoothing, essentially just the removal of peaks and valleys of size 0.01 (the smallest possible change), but it's almost negligible.

I expected a signal that would be, well, I don't know -- denoised? Am I doing something wrong?


2 Answers


Your threshold might be too high.

Try setting it from a statistic of the detail coefficients at each level, rather than from the original time trace. A usual starting point is

threshold = np.std(coeff[i])

for each detail level i; adjusting from there will at least get you started.
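
For example, a minimal sketch of that per-level idea with pywt (the random-walk x is just a stand-in for your price series; the wavelet and level are placeholders):

import numpy as np
import pywt

x = np.cumsum(np.random.randn(1024))  # stand-in for the price series
coeff = pywt.wavedec(x, "haar", level=2)

# Shrink each detail level with its own std-based threshold.
coeff[1:] = [pywt.threshold(c, value=np.std(c), mode="soft") for c in coeff[1:]]
denoised = pywt.waverec(coeff, "haar")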


I had the same problem and found that steadily increasing a scale factor on the threshold helped.

I was attempting to denoise an acoustic emission signal and was only getting back a near-perfect reconstruction. By multiplying sigma by an increasing scale factor, I could find out how high the thresholds needed to be before they stopped reproducing the signal.

import pywt
import numpy as np
import matplotlib.pyplot as plt

def madev(d, axis=None):
    """ Mean absolute deviation of a signal """
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def wavelet_denoising(x, wavelet, level, s_factor):
    """
    Deconstruct, threshold, then reconstruct.
    Higher thresholds give a less detailed reconstruction.
    """
    # wavedec with no level argument decomposes to the maximum depth;
    # `level` only selects which detail band sigma is estimated from.
    coeff = pywt.wavedec(x, wavelet, mode="per")
    sigma = (1 / 0.6745) * madev(coeff[-level]) * s_factor
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    coeff[1:] = [pywt.threshold(c, value=uthresh, mode='hard') for c in coeff[1:]]
    return pywt.waverec(coeff, wavelet, mode='per')

wav = 'db4'
level = 1
for s_factor in np.arange(0, 20, 2):
    data = wavelet_denoising(signal, wav, level, s_factor)
    plt.plot(data)
    plt.title('scale factor = {}'.format(s_factor))
    fname = 'wavelet_{}_sf_{}_n_{}'.format(wav, s_factor, len(signal))
    plt.savefig(fname)
    plt.show()
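
To make "stopped reproducing the signal" quantitative rather than visual, a rough check is to print a reconstruction error for each scale factor (the slice guards against waverec returning one extra sample for odd-length inputs):

for s_factor in np.arange(0, 20, 2):
    data = wavelet_denoising(signal, wav, level, s_factor)
    rmse = np.sqrt(np.mean((np.asarray(signal) - data[:len(signal)]) ** 2))
    print('s_factor={}: rmse={:.3f}'.format(s_factor, rmse))

The scale factor where the error first climbs noticeably is where the thresholds start doing real work.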