In previous work, the authors developed a high-performance detector-based phone recognizer. Articulatory information is extracted by a bank of speech feature detectors implemented as multilayer perceptrons (MLPs). A final event merger, itself an MLP, combines the detector outputs to predict phoneme posteriors, which serve as the HMM emission probabilities for decoding.
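As an illustration of the merger stage, the following minimal sketch passes one frame of detector outputs through a small MLP and converts the resulting phoneme posteriors into log emission scores. The function names, the single tanh hidden layer, the omission of biases, and all numeric values are assumptions made for this example; they are not the configuration of the original system.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def merge_events(detector_outputs, hidden_weights, output_weights):
    """Forward pass of a one-hidden-layer MLP (biases omitted for brevity).

    detector_outputs: one value per articulatory detector for a single frame.
    Returns a posterior distribution over phonemes for that frame.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, detector_outputs)))
              for row in hidden_weights]
    scores = [sum(w * h for w, h in zip(row, hidden))
              for row in output_weights]
    return softmax(scores)

def emission_log_scores(posteriors):
    """Log of the posteriors, used directly as HMM emission log scores."""
    return [math.log(p) for p in posteriors]

# Toy frame: three hypothetical detectors, two hidden units, three phonemes.
detector_outputs = [0.9, 0.1, 0.8]
hidden_weights = [[1.0, -1.0, 0.5],
                  [-0.5, 1.0, 0.2]]
output_weights = [[1.0, -1.0],
                  [-1.0, 1.0],
                  [0.3, 0.3]]

posteriors = merge_events(detector_outputs, hidden_weights, output_weights)
log_scores = emission_log_scores(posteriors)
```

In a decoder, `log_scores` would be computed once per frame and looked up by the HMM state associated with each phoneme.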
In this paper, the detector-based phoneme recognizer is extended to large-vocabulary continuous speech recognition (LVCSR). Word lattices are first generated with a state-of-the-art HMM-based speech recognizer. The high-quality monophone posteriors produced by the detector-based recognizer are then used to rescore the lattices in a second decoding stage.
Compared with standard MLE- and MMI-trained HMM systems, the rescored lattices yield a lower word error rate (WER) on the WSJ0 corpus.
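The second-stage rescoring can be sketched as follows. This is only an illustrative toy, not the procedure used in the paper: it assumes that each lattice arc carries a first-pass log score and a frame-level phone alignment, that nodes are numbered in topological order starting from node 0, and that the detector score is combined log-linearly with the first-pass score through a hypothetical interpolation `weight`. The `Arc` layout and all values are invented for the example.

```python
import math
from collections import namedtuple

# Hypothetical arc layout. phone_frames is a list of (phone, frame_index)
# pairs taken from the first-pass alignment of the arc.
Arc = namedtuple("Arc", "start end word base_logscore phone_frames")

def detector_logscore(arc, log_posteriors):
    """Sum of monophone log posteriors over the frames aligned to the arc."""
    return sum(log_posteriors[frame][phone] for phone, frame in arc.phone_frames)

def rescore_best_path(arcs, final_node, log_posteriors, weight=1.0):
    """Best path through the lattice after adding weighted detector scores.

    Assumes node 0 is the start node and start < end for every arc, so that
    visiting arcs in order of their start node is a valid topological order.
    """
    best = {0: (0.0, [])}  # node -> (best log score, word sequence)
    for arc in sorted(arcs, key=lambda a: a.start):
        if arc.start not in best:
            continue  # start node is unreachable from node 0
        score, words = best[arc.start]
        new_score = (score + arc.base_logscore
                     + weight * detector_logscore(arc, log_posteriors))
        if arc.end not in best or new_score > best[arc.end][0]:
            best[arc.end] = (new_score, words + [arc.word])
    return best[final_node]

# Toy lattice: two competing one-word hypotheses spanning two frames.
log_posteriors = [
    {"ah": math.log(0.8), "eh": math.log(0.2)},
    {"ah": math.log(0.7), "eh": math.log(0.3)},
]
arcs = [
    Arc(0, 1, "a", -1.0, [("ah", 0), ("ah", 1)]),
    Arc(0, 1, "eh", -0.9, [("eh", 0), ("eh", 1)]),
]

first_pass_score, first_pass_words = rescore_best_path(arcs, 1, log_posteriors, weight=0.0)
rescored_score, rescored_words = rescore_best_path(arcs, 1, log_posteriors, weight=1.0)
```

With `weight=0.0` the first-pass scores alone decide the winner; with `weight=1.0` the detector posteriors can overturn that decision, which is the effect rescoring is meant to achieve.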
The system structure is illustrated in the figure below: