When training a neural network, the training process is usually monitored by the frame accuracy (or frame error rate). However, frame accuracy is not directly related to the speech recognition performance, i.e. the PER or WER.
One way around this is to run a decoding pass each time the network weights are updated (e.g. after every epoch). For phoneme recognition this is fine, as the decoding does not take too much time. For word recognition, however, a full decoding is quite time consuming. One way to speed it up is to rescore lattices instead of running a full decoding. A rough sketch of such a per-epoch check is given below.
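The following is only a sketch of what this looks like in my setup: decode a small development set after each epoch and score the result with HResults. All the file names (dev.scp, ref.mlf, dict, hmmlist, the per-epoch model files) and the beam/scale/penalty values are placeholders, so adjust them for your own system.

    # decode a development set after each epoch and score it, so the
    # PER/WER trend can be tracked alongside the frame accuracy
    for epoch in 1 2 3 4 5; do
        HDecode -C config -H hmmdefs.epoch${epoch} -S dev.scp \
                -t 220.0 -s 15.0 -p 0.0 -w bg.lm \
                -i rec.epoch${epoch}.mlf dict hmmlist
        HResults -I ref.mlf hmmlist rec.epoch${epoch}.mlf
    done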
If you invoke HDecode with "-w" but without a language model, it will run in lattice rescoring mode (of course, you also need to set the input lattice parameters). But where do the lattices come from? My setup is as follows (a command-line sketch follows the list):
1) Use HDecode with the baseline HMM system to generate lattices using a bigram LM (a higher-order LM can also be used);
2) Use HLRescore to prune the lattices to word networks (with -m f/b, saving the new lattices with -w);
3) Use HDecode to rescore the pruned lattices with the new acoustic model or the new posterior features (in the NN case).
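To make the three steps concrete, here is a rough command-line sketch rather than a drop-in recipe: the file names (feats.scp, bg.lm, dict, hmmlist, the lattice directories), the beam/scale values, and in particular the HLRescore pruning options and the HDecode input-lattice options (-L/-X below) are placeholders from my setup, so check them against your HTK version.

    # 1) first pass: decode with the baseline HMMs and a bigram LM,
    #    writing one lattice per utterance into lat.raw/
    HDecode -C config -H hmmdefs -S feats.scp \
            -t 220.0 -s 15.0 -p 0.0 -w bg.lm \
            -l lat.raw -z lat -i pass1.mlf dict hmmlist

    # 2) prune the raw lattices and save the smaller ones to lat.pruned/
    #    (the pruning/output options are the ones mentioned in step 2)
    HLRescore -C config -m f -w -l lat.pruned dict lat.raw/*.lat

    # 3) rescore the pruned lattices with the new acoustic model / new
    #    posterior features: -w without an LM file puts HDecode into
    #    lattice rescoring mode, and -L/-X point it at the input lattices
    HDecode -C config -H hmmdefs.new -S feats.scp \
            -t 220.0 -s 15.0 -p 0.0 -w \
            -L lat.pruned -X lat -i rescored.mlf dict hmmlist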
http://www.ee.ucla.edu/~weichu/htkbook/node57_ct.html