The attached paper models phonetic attributes with CRF models. However, the sentence that interests me most is the following:
As described in [Hermansky et al., 2000], we used the linear output of the MLPs with a KL transform applied to them to decorrelate the features, as this gave the best results for the HMM system.
H. Hermansky, D. Ellis, and S. Sharma, "Tandem connectionist feature stream extraction for conventional HMM systems", in Proc. ICASSP, 2000.
Maybe at some point we could also try refining the NN posterior features in the same way before feeding them to our HMM systems.
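To make the quoted step concrete, here is a minimal sketch of decorrelating network outputs with a KL (Karhunen-Loeve) transform, which amounts to PCA on the feature covariance. It is not the paper's code: the function names fit_klt and apply_klt are hypothetical, and the random toy matrix only stands in for the linear (pre-softmax) MLP outputs, which would come from a trained network in practice.

```python
import numpy as np


def fit_klt(features):
    """Estimate a KL transform (PCA basis) from training frames.

    features: array of shape (n_frames, n_dims), assumed here to be the
    linear (pre-softmax) outputs of an MLP.
    Returns (mean, basis); the columns of basis are eigenvectors of the
    covariance matrix, sorted by decreasing eigenvalue.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    return mean, eigvecs[:, order]


def apply_klt(features, mean, basis, n_keep=None):
    """Project frames onto the KLT basis, optionally keeping the top n_keep dims."""
    projected = (features - mean) @ basis
    return projected if n_keep is None else projected[:, :n_keep]


# Toy stand-in for MLP linear outputs: correlated 5-dimensional frames.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
linear_outputs = rng.normal(size=(1000, 5)) @ mixing

mean, basis = fit_klt(linear_outputs)
decorrelated = apply_klt(linear_outputs, mean, basis)
```

The transform would be estimated once on training data and then applied unchanged to test data. Decorrelated features are a better match for HMM systems that use diagonal-covariance Gaussians, which is presumably why this step helped there.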