
I have an AR(1) model with $N=500$ data samples, driven by a random input sequence $x$. The observation $y$ is corrupted by measurement noise $v$ with zero mean. The model is

$$y(t) = 0.195\,y(t-1) + x(t) + v(t),$$

where $x(t)$ is generated with randn(). I am unsure how to represent this as a state space model and how to estimate the parameter $a$ and the states. The state space representation I tried is

$$d(t) = \mathbf{a}^T d(t-1) + x(t)$$

$$y(t) = \mathbf{h}^T d(t) + \sigma\, v(t)$$

with $\sigma = 2$. I cannot work out how to perform the parameter and state estimation. Using the toolbox mentioned below, I checked that its Kalman filter equations match those in textbooks; however, its approach to parameter estimation is different. I would appreciate a recommendation on the implementation procedure.
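For reference, this is roughly how I generate the data and how I think the model maps onto state-space matrices; the names A, C, Q, R below are just my own labels, not anything required by the toolboxes:

% Simulate the AR(1) model and write down the implied state-space matrices
N     = 500;
a     = 0.195;        % AR coefficient to be estimated
sigma = 2;            % measurement-noise scale
x     = randn(N,1);   % driving input sequence
v     = randn(N,1);   % zero-mean measurement noise

d = zeros(N,1);       % state d(t)
y = zeros(N,1);       % observation y(t)
for t = 2:N
    d(t) = a*d(t-1) + x(t);      % state equation
    y(t) = d(t) + sigma*v(t);    % observation equation
end

% Scalar state-space matrices implied by the model above
A = a;          % state transition (the parameter I want to estimate)
C = 1;          % observation matrix h
Q = 1;          % covariance of x, treated as process noise
R = sigma^2;    % covariance of sigma*v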

Implementation 1: I am following the implementation here: Learning Kalman Filter. This implementation does not use Expectation Maximization to estimate the parameters of the AR model; instead it finds the covariance of the process noise. In my case I don't have a process noise, but an input $x$.

Implementation 2: Kalman Filter by Kevin Murphy is another toolbox, which uses EM for parameter estimation of the AR model. It is confusing that the two implementations use different approaches to parameter estimation. I am having a tough time figuring out the correct approach, the state space model representation, and the code. I would appreciate recommendations on how to proceed.

I ran the first implementation with the KalmanARSquareRoot technique and the result is completely different from what I expect. There is exponential moving average smoothing being performed, and an MA filter of length 30 is used. The toolbox runs fine on the demo examples, but when I change the model the result is extremely poor. Maybe I am doing something wrong. Do I need to change the Kalman filter equations for my time series?

In the second implementation, I cannot figure out what to change in the equations, or where.

In general, if I have to use these tools, do I need to change the Kalman filter equations for every time series model? And how do I write the equations myself if these toolboxes are inappropriate for time series models in general (AR, MA, ARMA)?

1 Answer


I only have a bit of experience with Kalman Filters, so take this with a grain of salt.

It seems you shouldn't need to change the equations at all. Working with the second package (learn_kalman), you can create an A0 matrix of size [length(d(t)) length(d(t))]. C0 is the same size, and in your case the identity matrix probably makes sense for the initial state (unlike your A0). All you need to do is choose a good initial condition.
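Something along these lines should work; it is only a rough sketch, the initial guesses are placeholders, and you should check the exact learn_kalman / kalman_smoother argument order against the toolbox documentation:

% EM parameter estimation with Kevin Murphy's toolbox (scalar state)
A0 = 0.5;          % initial guess for the AR coefficient
C0 = 1;            % initial guess for the observation matrix
Q0 = 1;            % initial guess for the process-noise covariance
R0 = 1;            % initial guess for the measurement-noise covariance
initx0 = 0;        % initial state guess
initV0 = eye(1);   % initial state covariance (identity, as suggested above)

% learn_kalman expects the data as an (obs dim) x T matrix
[A, C, Q, R, initx, initV, LL] = ...
    learn_kalman(y(:)', A0, C0, Q0, R0, initx0, initV0, 50);

% State estimates under the learned model
[dsmooth, Vsmooth] = kalman_smoother(y(:)', A, C, Q, R, initx, initV);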

However, I took a look at your system (plotted an example) and it seems noise dominates it. The KF is an optimal estimator, but I have not known it to reject that much noise; it only guarantees a reduced covariance, which means that if your system is mostly dominated by noise, you will calculate a bad model that estimates your system given the noise.

Try plotting [d f], where d is the original data and f is calculated with the noise-free recursion:

f(t) = A * f(t-1);    y(t) = C * f(t)

That is, pretend there is no noise but use the estimated AR model. You will see that it rejects all the noise and 'technically' models the system well (since the only distinctive behaviour is at the very beginning).
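In code, that comparison looks roughly like this (assuming A and C are the values learned above and d is your original series):

% Re-run the learned model with the noise terms removed
f = zeros(size(d));
f(1) = d(1);                  % start from the same initial state
for t = 2:length(d)
    f(t) = A * f(t-1);        % noise-free state recursion
end
yhat = C * f;                 % noise-free model output

plot([d(:) yhat(:)]);         % original data vs. noise-free model
legend('data', 'noise-free AR model');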

For example, if you have a system with A = 0.195 and Q = R = 0.1, you will converge to A = 0.2207, but that still isn't good enough. The problem here is that your initial state is so low that within a few steps the data is essentially at 0 once you account for the noise. Naturally the KF can converge to a lot of similar model solutions, and any noise will throw off even the best initial condition.

If you increase the resolution of your data in some way (e.g. a larger initial condition, or more refined timesteps), you will see a good fit. For example, changing your initial condition to 110, you'll find the two curves similar, though the model is still fairly different.

I am not aware of any approach that will model your data well. If the noise variance is in fact 1 and your system converges to 0 that quickly, it seems doomed not to be modelled effectively, since you just don't capture any distinctive behaviour in the dataset.