Deep architectures have been proposed in recent years, and experiments on simple tasks have shown much better results than existing shallow models.
The idea behind deep architectures is that they can capture higher-level abstract features, which are believed to be much more robust to variations in the original feature space.
Through the layered structure, the effects of different factors of variation can be modeled at different levels.
Deep architectures are believed to work well on problems where the underlying data distribution can be thought of as a product of factor distributions, meaning that each sample corresponds to a particular combination of values for these factors.
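To make the "product of factors" idea concrete, here is a toy sketch (my own illustration, not the paper's actual datasets): each sample is built by picking one value per factor of variation, so the dataset is the cross-product of the factor values.

```python
import numpy as np

rng = np.random.default_rng(0)

base = np.zeros((8, 8))
base[2:6, 3:5] = 1.0  # a crude "digit" stroke

def make_sample(flip, shift, noise_level):
    """Combine independent factors: orientation, translation, background noise."""
    img = np.flipud(base) if flip else base.copy()
    img = np.roll(img, shift, axis=1)
    return img + noise_level * rng.random(img.shape)

# Every combination of factor values yields one sample.
factor_values = [(f, s, n) for f in (False, True)
                 for s in (-1, 0, 1)
                 for n in (0.0, 0.3)]
samples = [make_sample(*fv) for fv in factor_values]
print(len(samples))  # 2 orientations x 3 shifts x 2 noise levels = 12
```

The paper's benchmark variants of MNIST (rotations, random backgrounds, etc.) follow the same pattern on a much larger scale.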
In this paper, the authors experimented with many factors of variation on the MNIST digit database, comparing different models including shallow SVMs, deep belief networks (DBNs), and stacked auto-associators.
The shallow structures:
The DBN structure:
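A DBN is pretrained greedily: each layer is an RBM fit to the activities of the layer below. A minimal numpy sketch of one RBM layer trained with one step of contrastive divergence (CD-1); the hyperparameters here are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.05, epochs=100):
    """One RBM layer trained with CD-1 on binary data V."""
    n_vis = V.shape[1]
    W = rng.normal(0, 0.01, (n_vis, n_hidden))
    b = np.zeros(n_vis)     # visible bias
    c = np.zeros(n_hidden)  # hidden bias
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        ph = sigmoid(V @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one Gibbs step back down and up again.
        pv = sigmoid(h @ W.T + b)
        ph2 = sigmoid(pv @ W + c)
        # CD-1 updates: data statistics minus reconstruction statistics.
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
        b += lr * (V - pv).mean(axis=0)
        c += lr * (ph - ph2).mean(axis=0)
    return W, b, c

V = (rng.random((50, 16)) > 0.5).astype(float)  # stand-in binary data
W, b, c = train_rbm(V, n_hidden=8)
H = sigmoid(V @ W + c)  # hidden representation fed to the next RBM in a DBN
```

Stacking repeats this: the hidden activities `H` become the visible data for the next RBM.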
The stacked auto-associator structure:
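The stacked auto-associator follows the same greedy layer-wise recipe, with each layer trained to reconstruct its input. A numpy sketch assuming tied weights and squared-error reconstruction (details the paper varies; sizes and learning rates here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_auto_associator(X, n_hidden, lr=0.5, epochs=300):
    """One auto-associator with tied weights, trained to reconstruct X."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b = np.zeros(n_hidden)  # hidden bias
    c = np.zeros(n_in)      # reconstruction bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b)       # encode
        R = sigmoid(H @ W.T + c)     # decode (tied weights)
        d2 = (R - X) * R * (1 - R)   # output delta for squared error
        d1 = (d2 @ W) * H * (1 - H)  # hidden delta
        W -= lr * (d2.T @ H + X.T @ d1) / len(X)
        b -= lr * d1.mean(axis=0)
        c -= lr * d2.mean(axis=0)
    return W, b

def stack_auto_associators(X, layer_sizes):
    """Greedy layer-wise pretraining: each layer reconstructs the one below."""
    reps, params = X, []
    for h in layer_sizes:
        W, b = train_auto_associator(reps, h)
        params.append((W, b))
        reps = sigmoid(reps @ W + b)  # codes become input to the next layer
    return reps, params

X = rng.random((40, 12))
top, params = stack_auto_associators(X, [8, 4])
print(top.shape)  # (40, 4)
```

After pretraining, the whole stack is typically fine-tuned with supervised backpropagation.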
The results of their experiments:
From the results, the deep architectures handle the variations well most of the time, but there are also some cases where they are worse than the shallow architectures.
The deep learning algorithms also need to be adapted in order to scale to harder, potentially "real life" problems.
In the talk given by one of the authors, Dumitru Erhan, another set of experiments was presented for comparison, using Multi-layer Kernel Machines built by stacking up kernel PCAs (proposed by Schoelkopf et al. in 1998).
They showed that the Multi-layer Kernel Machines work quite well.
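The stacking idea can be sketched as follows: apply kernel PCA to the data, then apply kernel PCA again to the resulting projections. This is a rough numpy illustration; the RBF kernel, its width, and the component counts are my assumptions, not the talk's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_pca(X, n_components, gamma=1.0):
    """Project training points onto the top kernel-PCA components (RBF kernel)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    # Normalize eigenvectors so the feature-space components have unit norm.
    alphas = vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))
    return Kc @ alphas

X = rng.random((30, 6))
Z1 = kernel_pca(X, n_components=10)  # first kernel-PCA layer
Z2 = kernel_pca(Z1, n_components=4)  # second layer stacked on top
print(Z2.shape)  # (30, 4)
```

A classifier (e.g. an SVM) would then be trained on the top-layer features `Z2`.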
One last thing: their experiments were done using PLearn.