3 votes

This is a conceptual question about working with time series of various lengths in a deep learning context:

I have observations of standardized features that occur at irregular intervals, and each individual measurement includes a time-based feature. I flatten each multivariate time series (panel data) into a single continuous feature vector, and then build a deep neural network for a binary classification task on these vectors, which now look like this:

xxxx(T=2)xxxx(T=4)xxxx(T=5)
xxxx(T=1)xxxx(T=2)
xxxx(T=3)
xxxx(T=1)xxxx(T=2)xxxx(T=3)xxxx(T=5)

These vectors are then end-padded with zeros so they are all the same length.

Each "xxxxT" represents an observation where "x"'s are non-temporal features and "T" is a time based feature. My question is whether it can be assumed that the neural network will be able to discriminate the irregular nature of this time series on its own?

Or should I instead pad the missing intermediate observations, so the vectors look something like this (where "0000" is padding for a missing observation)?

0000(T=1)xxxx(T=2)0000(T=3)xxxx(T=4)xxxx(T=5)
xxxx(T=1)xxxx(T=2)0000(T=3)0000(T=4)0000(T=5)
0000(T=1)0000(T=2)xxxx(T=3)0000(T=4)0000(T=5)
xxxx(T=1)xxxx(T=2)xxxx(T=3)0000(T=4)xxxx(T=5)

I have actually done this already and examined the results of both approaches. I just wanted to see if anyone could shed some light on how a deep neural network "interprets" this.
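For concreteness, here is a minimal sketch of the two layouts in NumPy (the feature count, the five-slot time grid, and the values are all assumed for illustration):

    import numpy as np

    # Each observation is (x1, x2, x3, x4, T): four non-temporal features
    # plus the time-based feature T (sizes assumed for illustration).
    obs = [(0.1, 0.3, -0.2, 1.0, 2),
           (0.0, 0.5,  0.7, 0.2, 4),
           (0.9, 0.1, -0.4, 0.3, 5)]
    n_feat = 5    # values per observation, including T
    max_T = 5     # number of time slots / maximum observations per series

    # Approach 1: concatenate only the observed steps, then end-pad with zeros.
    end_padded = np.zeros(max_T * n_feat)
    flat = np.concatenate([np.asarray(o, dtype=float) for o in obs])
    end_padded[:flat.size] = flat

    # Approach 2: place each observation in the slot given by its T value
    # (intra-padding), so missing time steps remain zero blocks.
    intra_padded = np.zeros(max_T * n_feat)
    for o in obs:
        t = int(o[-1])                               # time feature picks the slot
        intra_padded[(t - 1) * n_feat : t * n_feat] = o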

1 Answer

2 votes

If you are using a recurrent net, I don't think it's a good idea to pad inside the sequence. Since the weights are reused across time steps, the network should be "sensitive" to the "real" data, not to padding.

With end-of-sequence padding, we get around this "fake data" by passing the actual sequence lengths to the recurrent function, for example via the sequence_length argument of TensorFlow's tf.nn.dynamic_rnn. That way the last "real" hidden state of the sequence is copied forward through the padded steps.
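A minimal sketch of that, assuming the TF 1.x API (batch size, feature count, and unit count are made up):

    import numpy as np
    import tensorflow as tf  # TF 1.x-style API

    # Toy batch: 2 sequences, up to 5 time steps, 4 features per step,
    # zero-padded at the end (all shapes assumed for illustration).
    batch = np.zeros((2, 5, 4), dtype=np.float32)
    batch[0, :3] = np.random.randn(3, 4)          # 3 real steps
    batch[1, :5] = np.random.randn(5, 4)          # 5 real steps
    seq_len = np.array([3, 5], dtype=np.int32)

    inputs = tf.placeholder(tf.float32, [None, 5, 4])
    lengths = tf.placeholder(tf.int32, [None])
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=16)

    # With sequence_length set, outputs past the last real step are zeroed
    # and the final state is the state at the last real step, copied through
    # the padded tail rather than being updated on the zero padding.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, sequence_length=lengths, dtype=tf.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out, state = sess.run([outputs, final_state],
                              feed_dict={inputs: batch, lengths: seq_len})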

If you insist on intra-sequence padding, you would need an implementation that copies the hidden state across the padded steps within a sequence, just as end-of-sequence padding does at the tail (see the sketch below).
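One way to get that behaviour, sketched here under the assumption of a per-step mask is_real that is 1.0 for real observations and 0.0 for padded slots (the mask and the manual loop are my own illustration, not an existing TensorFlow helper):

    import tensorflow as tf  # TF 1.x-style API

    def masked_rnn(cell, inputs, is_real):
        """Run an LSTM cell over [batch, max_time, feat] inputs, carrying the
        previous state forward wherever is_real[:, t] == 0.0 (padding)."""
        batch_size = tf.shape(inputs)[0]
        max_time = inputs.get_shape()[1].value
        state = cell.zero_state(batch_size, tf.float32)
        outputs = []
        for t in range(max_time):
            out_t, new_state = cell(inputs[:, t, :], state)
            m = is_real[:, t:t + 1]                  # [batch, 1], 1.0 or 0.0
            # Keep the updated state only at real steps; on padded steps,
            # copy the previous state forward instead of updating on zeros.
            state = tf.nn.rnn_cell.LSTMStateTuple(
                c=m * new_state.c + (1.0 - m) * state.c,
                h=m * new_state.h + (1.0 - m) * state.h)
            outputs.append(m * out_t)
        return tf.stack(outputs, axis=1), state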

Right?