This paper explores context-dependent (CD) phone recognition on TIMIT using a hybrid NN/HMM system, and compares three approaches:
1) Two nets in tandem: the first estimates context-independent (CI) posteriors, and the second models the context on top of those CI posteriors;
2) A single NN trained directly on CD state posteriors, which has too many outputs and is not robust;
3) A first net that produces bottleneck features, with a second net trained on top of them.
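The three approaches differ mainly in how data flows between the nets. The following is a minimal NumPy sketch of that flow, not the paper's implementation. It uses random, untrained weights, sigmoid hidden layers, and hypothetical sizes (39-dim features stacked over 11 frames, 61 CI phones × 3 states, 2000 tied CD states, a 40-unit bottleneck, a ±5-frame window over the CI posteriors). None of these values come from the paper, and training is omitted.

```python
import numpy as np

# Hypothetical sizes (assumptions, not taken from the paper)
N_FEAT = 39 * 11   # e.g. 39-dim acoustic features stacked over 11 frames
N_CI = 61 * 3      # e.g. 61 CI phones x 3 HMM states
N_CD = 2000        # e.g. number of tied CD states
N_BN = 40          # bottleneck width
CTX = 5            # CI-posterior frames on each side fed to the tandem's second net
T = 100            # number of frames in a dummy utterance

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP as a list of (W, b) pairs; stands in for a trained net."""
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(net, x, stop_at=None):
    """Sigmoid hidden layers, softmax output; if stop_at is set, return that hidden layer."""
    h = x
    for i, (W, b) in enumerate(net):
        z = h @ W + b
        if i == len(net) - 1:
            return softmax(z)
        h = 1.0 / (1.0 + np.exp(-z))
        if i == stop_at:
            return h

def stack_context(x, ctx):
    """Concatenate each frame with its ctx left/right neighbours (edges padded by repetition)."""
    n = x.shape[0]
    padded = np.concatenate([np.repeat(x[:1], ctx, axis=0), x, np.repeat(x[-1:], ctx, axis=0)])
    return np.concatenate([padded[i:i + n] for i in range(2 * ctx + 1)], axis=1)

x = rng.standard_normal((T, N_FEAT))  # dummy acoustic input

# 1) Tandem: CI posteriors first, then a second net over a window of CI posteriors
ci_net = mlp([N_FEAT, 512, N_CI])
ci_post = forward(ci_net, x)
cd_net = mlp([(2 * CTX + 1) * N_CI, 512, N_CD])
cd_tandem = forward(cd_net, stack_context(ci_post, CTX))

# 2) Direct: one net with a large CD output layer
direct_net = mlp([N_FEAT, 512, 512, N_CD])
cd_direct = forward(direct_net, x)

# 3) Bottleneck: take the narrow hidden layer (index 1) of a first net
#    (assumed here to be trained on CI targets), then a second net on top
bn_net = mlp([N_FEAT, 512, N_BN, 512, N_CI])
bn_feats = forward(bn_net, x, stop_at=1)
top_net = mlp([N_BN, 512, N_CD])
cd_bottleneck = forward(top_net, bn_feats)

print(cd_tandem.shape, cd_direct.shape, bn_feats.shape, cd_bottleneck.shape)
```

In this sketch only approach 2 maps raw features straight to all `N_CD` outputs. Approaches 1 and 3 first compress the input into a CI-level or bottleneck representation. Here the bottleneck's top net sees a single frame, which is a simplifying assumption.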
The best result on TIMIT is a 21.24% phone error rate on the core test set. This is less than 1% away from DBN-based monophone recognition, so the gain from modeling context is not significant. Moreover, the current best DBN-based result is around 19%.
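The size of the remaining gap can be put in numbers with a quick calculation. This sketch treats "around 19%" as exactly 19.0, which is an approximation. It compares the reported 21.24% against that figure in absolute and relative terms.

```python
def error_rate_gap(a, b):
    """Absolute gap (percentage points) and relative gap (% of a) between error rates a > b."""
    absolute = a - b
    relative = 100.0 * absolute / a
    return absolute, relative

# 21.24 is the reported best CD result; 19.0 stands in for "around 19%"
abs_gap, rel_gap = error_rate_gap(21.24, 19.0)
print(f"{abs_gap:.2f} points absolute, {rel_gap:.1f}% relative")  # 2.24 points absolute, 10.5% relative
```

The best CD result trails the best DBN result by roughly 2.2 points. That gap is larger than the under-1% gain that context modeling provides here.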