Why use a restricted Boltzmann machine rather than a multi-layer perceptron?

Question

I'm trying to understand the difference between a restricted Boltzmann machine (RBM) and a feed-forward neural network (NN). I know that an RBM is a generative model, where the idea is to reconstruct the input, whereas an NN is a discriminative model, where the idea is to predict a label. What I am unclear about is why you cannot just use an NN as a generative model. In particular, I am thinking about deep belief networks and multi-layer perceptrons.

Suppose the input to the NN is a set of nodes called x, and the output of the NN is a set of nodes called y. In a discriminative model, my loss during training would be the difference between y and the value of y that I want x to produce (e.g. ground-truth probabilities for class labels). But what if I gave the output the same number of nodes as the input, and set the loss to be the difference between x and y? In this way, the network would learn to reconstruct the input, like an RBM does.

So, given that an NN (or a multi-layer perceptron) can be trained as a generative model in this way, why would you use an RBM (or a deep belief network) instead? Or would the two be exactly the same in this case?

m7thon · Accepted Answer · 2015-08-07T01:33:10

You can use an NN as a generative model in exactly the way you describe. This is known as an autoencoder, and these can work quite well. In fact, stacked autoencoders are often used as building blocks of deep networks, in much the same layer-wise way that RBMs are stacked to form deep belief networks.
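To make the idea concrete, here is a minimal sketch of such an autoencoder in NumPy. Everything specific in it is an assumption for illustration and not from the question: one sigmoid hidden layer, a mean squared reconstruction loss, plain gradient descent, and a made-up toy dataset. The function name `train_autoencoder` and its parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=3, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer autoencoder with plain gradient descent.

    Returns the mean squared reconstruction error recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_visible))  # decoder weights
    b2 = np.zeros(n_visible)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)  # hidden code
        Y = sigmoid(H @ W2 + b2)  # output y, same size as input x
        err = Y - X               # the loss compares y with x itself
        losses.append(float(np.mean(err ** 2)))
        # Back-propagate the squared error through both sigmoid layers.
        dY = 2.0 * err / err.size * Y * (1.0 - Y)
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * H.T @ dY
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return losses

# Toy data (made up): 8 binary patterns over 6 input nodes.
X = (np.random.default_rng(1).random((8, 6)) > 0.5).astype(float)
losses = train_autoencoder(X)
print(losses[0], losses[-1])  # reconstruction error before and after training
```

The only difference from a discriminative network is the target: the loss compares the output with x itself, not with a label. Making the hidden layer narrower than the input, as assumed here, forces the network to learn a compressed code.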

An RBM is quite a different model from a feed-forward neural network. Its connections between visible and hidden units are undirected, so the same weights are used in both directions (forward and backward), and they have a probabilistic / energy-based interpretation. You'll need to read up on the details to understand it fully.
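As a rough illustration of what "both ways" and "energy interpretation" mean, here is a sketch of a binary RBM in NumPy. It assumes the standard energy function E(v, h) = -a·v - b·h - vᵀWh and a single step of contrastive divergence (CD-1) as the training rule. Neither is spelled out above, and the names `energy` and `cd1_step` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(v, h, W, a, b):
    """Energy of a joint binary configuration (v, h); lower means more probable."""
    return -(v @ a) - (h @ b) - (v @ W @ h)

def cd1_step(V, W, a, b, rng, lr=0.1):
    """One CD-1 update on a batch of visible vectors V (one row per sample)."""
    # Upward pass: p(h = 1 | v) uses W.
    ph_data = sigmoid(V @ W + b)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Downward pass: p(v = 1 | h) reuses the same weights, transposed.
    pv_model = sigmoid(h_sample @ W.T + a)
    ph_model = sigmoid(pv_model @ W + b)
    n = V.shape[0]
    # Move towards the data statistics and away from the model's own statistics.
    W_new = W + lr * (V.T @ ph_data - pv_model.T @ ph_model) / n
    a_new = a + lr * (V - pv_model).mean(axis=0)
    b_new = b + lr * (ph_data - ph_model).mean(axis=0)
    return W_new, a_new, b_new, pv_model

# Toy data (made up): 8 binary patterns over 6 visible units, 3 hidden units.
rng = np.random.default_rng(0)
V = (rng.random((8, 6)) > 0.5).astype(float)
W = rng.normal(0.0, 0.1, size=(6, 3))
a = np.zeros(6)  # visible biases
b = np.zeros(3)  # hidden biases
for _ in range(100):
    W, a, b, recon = cd1_step(V, W, a, b, rng)
```

Note that there is no separate decoder matrix: the single matrix `W` is used upwards and, transposed, downwards. The hidden units are also sampled stochastically, whereas a feed-forward network computes them deterministically.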

A deep belief network (DBN) is, loosely speaking, a neural network with many layers. In the strict sense it consists of stacked RBMs; a closely related kind of deep network can be built from stacked autoencoders instead. You need special methods, tricks and lots of data to train these deep and large networks, because simple back-propagation suffers from the vanishing gradient problem. But if you do manage to train them, they can be very powerful (they encode "higher-level" concepts).

Hope this helps to point you in the right direction.
