I want to use deep learning techniques to perform inference tasks better than Hidden Markov Models (which are shallow models). I was wondering: what is the state-of-the-art deep learning model to replace Hidden Markov Models (HMM)? The set-up is semi-supervised. The training data X(t), Y(t) is a time series with significant temporal correlations. There is also a huge amount of unlabelled data, i.e., just X(t) with no Y(t). After reading many papers, I narrowed it down to the following approach: Conditional Restricted Boltzmann Machines (Ilya Sutskever's MS thesis), with Deep Belief Networks (or variational autoencoders) for unsupervised pretraining. I am very new to the field and was wondering whether these techniques are outdated.
"I was wondering what is the state-of-the-art deep learning model to replace Hidden Markov Models (HMM)"
At the moment, DNNs based on RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks) are the state of the art. They achieve the best results on many sequence problems, from Named Entity Recognition (https://www.quora.com/What-is-the-current-state-of-the-art-in-Named-Entity-Recognition-NER/answer/Rahul-Vadaga) and Parsing (https://arxiv.org/pdf/1701.00874.pdf) to Machine Translation (https://arxiv.org/pdf/1609.08144.pdf). These DNNs are also called sequence models (e.g. seq2seq models, where both the input and the output are sequences, as in Machine Translation).
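To make the comparison with an HMM concrete: where an HMM computes label posteriors from the observations with the forward(-backward) algorithm, an LSTM tagger maps the observation sequence X(t) directly to a distribution over the label Y(t) at every time step. Below is a minimal, untrained sketch of that forward pass, assuming numpy, a single-layer unidirectional LSTM (so the prediction at time t only sees X(1..t), like HMM filtering rather than smoothing), and hypothetical sizes. The helper names `init_params` and `lstm_tagger` are made up for illustration; in practice you would use a framework such as PyTorch or TensorFlow and train the weights with cross-entropy on the labelled part of the data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden_dim, num_labels, seed=0):
    """Random (untrained) weights for a one-layer LSTM plus a softmax output layer."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # The four gates (input, forget, output, candidate) are stacked row-wise.
        "W": rng.normal(0, scale, (4 * hidden_dim, input_dim)),
        "U": rng.normal(0, scale, (4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
        "V": rng.normal(0, scale, (num_labels, hidden_dim)),
        "b_out": np.zeros(num_labels),
    }

def lstm_tagger(X, params):
    """Map X of shape (T, input_dim) to per-step label probabilities of shape (T, num_labels)."""
    H = params["U"].shape[1]
    h = np.zeros(H)  # hidden state
    c = np.zeros(H)  # cell state
    outputs = []
    for x_t in X:
        z = params["W"] @ x_t + params["U"] @ h + params["b"]
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        o = sigmoid(z[2 * H:3 * H])  # output gate
        g = np.tanh(z[3 * H:4 * H])  # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        logits = params["V"] @ h + params["b_out"]
        p = np.exp(logits - logits.max())  # numerically stable softmax
        outputs.append(p / p.sum())
    return np.stack(outputs)

# Usage: T = 20 time steps of 5-dimensional observations, 3 possible labels.
X = np.random.default_rng(1).normal(size=(20, 5))
params = init_params(input_dim=5, hidden_dim=16, num_labels=3)
probs = lstm_tagger(X, params)
y_hat = probs.argmax(axis=1)  # predicted label for each time step
print(probs.shape)  # (20, 3)
```

A bidirectional LSTM (one pass forward in time, one backward, outputs concatenated) is the usual way to let each prediction use the whole sequence, which is the closer analogue of HMM forward-backward smoothing.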
"unsupervised pretraining"
Unsupervised pre-training is not that popular any more (for supervised ML problems), since you can achieve the same results using random restarts with parallelization, now that CPUs are more plentiful and cheaper.
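A minimal sketch of what "random restarts with parallelization" can look like: train the same non-convex model from several random initializations in parallel and keep the run with the lowest validation loss. Everything here is a hypothetical stand-in: synthetic XOR-like data, a tiny one-hidden-layer tanh network trained by plain gradient descent in numpy, and made-up helper names (`make_data`, `train_one`, `best_of_restarts`). It uses the standard-library `ThreadPoolExecutor` for simplicity; for heavier training you would more likely use a `ProcessPoolExecutor`, several GPUs, or several machines.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def make_data(n=400, seed=0):
    """Synthetic XOR-like binary classification data (stand-in for a real dataset)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)
    return X, y

def loss_and_grads(params, X, y):
    """Mean binary cross-entropy of a one-hidden-layer tanh network, plus gradients."""
    W1, b1, w2, b2 = params
    h = np.tanh(X @ W1 + b1)                  # hidden activations, (n, hidden)
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))  # predicted probabilities, (n,)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    d_logit = (p - y) / len(y)                # dLoss/dlogit for each sample
    d_h = np.outer(d_logit, w2) * (1 - h ** 2)
    grads = (X.T @ d_h, d_h.sum(axis=0), h.T @ d_logit, d_logit.sum())
    return loss, grads

def train_one(seed, X_tr, y_tr, X_val, y_val, hidden=8, lr=0.5, steps=2000):
    """Train from one random initialization; return (validation loss, params)."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0, 1, (2, hidden)), np.zeros(hidden),
              rng.normal(0, 1, hidden), 0.0]
    for _ in range(steps):
        _, grads = loss_and_grads(params, X_tr, y_tr)
        params = [p - lr * g for p, g in zip(params, grads)]
    val_loss, _ = loss_and_grads(params, X_val, y_val)
    return val_loss, params

def best_of_restarts(n_restarts=8, max_workers=4):
    """Run n_restarts independent trainings in parallel and keep the best one."""
    X, y = make_data()
    X_tr, y_tr, X_val, y_val = X[:300], y[:300], X[300:], y[300:]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(
            lambda s: train_one(s, X_tr, y_tr, X_val, y_val), range(n_restarts)))
    losses = [loss for loss, _ in results]
    return losses, results[int(np.argmin(losses))]

losses, (best_loss, best_params) = best_of_restarts()
print([round(float(l), 3) for l in losses])
print("best validation loss:", round(float(best_loss), 3))
```

Each restart is independent, so the wall-clock cost grows with the number of restarts divided by the number of workers; note that this sketch only uses labelled data and does not exploit the unlabelled X(t).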
[Added the following later]
A recent paper (Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks by Nils Reimers and Iryna Gurevych) gives a good comparison of deep LSTM network variants and hyperparameters on common NLP sequence labeling tasks: https://arxiv.org/pdf/1707.06799.pdf
Definitely worth a read.