Phonetically motivated experts are investigated for multi-stream automatic speech recognition.
The two experts adopted are:
1) {vowel, consonant, nasal, liquid, silence}
2) {voiced, unvoiced, silence}
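One way to picture the two expert label sets is as coarse relabelings of the phone labels. The sketch below assumes that each expert's frame-level targets come from a phone-to-class lookup; the phone inventory and class assignments are hypothetical examples, not taken from the paper.

```python
# Hypothetical phone inventory and class assignments, for illustration only.
BROAD_CLASS = {
    "iy": "vowel", "aa": "vowel",
    "s": "consonant", "t": "consonant",
    "n": "nasal", "m": "nasal",
    "l": "liquid", "r": "liquid",
    "sil": "silence",
}

VOICING = {
    "iy": "voiced", "aa": "voiced",
    "n": "voiced", "m": "voiced",
    "l": "voiced", "r": "voiced",
    "s": "unvoiced", "t": "unvoiced",
    "sil": "silence",
}


def expert_targets(phone_labels, phone_to_class):
    """Map a sequence of frame-level phone labels to expert class labels."""
    return [phone_to_class[phone] for phone in phone_labels]


frames = ["sil", "s", "iy", "n"]
broad = expert_targets(frames, BROAD_CLASS)
voicing = expert_targets(frames, VOICING)
```

Each expert then only has to separate a handful of classes, which is a much easier task than predicting the full phone set.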
The basic system is an MLP that predicts phone posteriors, and two settings are used: full-band and multi-band.
The original model and the expert system are fused by simply multiplying their outputs together.
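A minimal sketch of this kind of product fusion for a single frame follows. It assumes that each phone posterior is multiplied by the expert posterior of the class that phone belongs to and that the result is renormalized; the renormalization, the function name `fuse`, the phone set and the numbers are all illustrative assumptions rather than details from the paper.

```python
# Hypothetical phone-to-class mapping for the voicing expert.
PHONE_TO_VOICING = {"iy": "voiced", "s": "unvoiced", "sil": "silence"}


def fuse(phone_post, expert_post, phone_to_class):
    """Multiply each phone posterior by its class's expert posterior, then renormalize."""
    scores = {
        phone: p * expert_post[phone_to_class[phone]]
        for phone, p in phone_post.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("fused scores sum to zero; cannot renormalize")
    return {phone: s / total for phone, s in scores.items()}


# Made-up posteriors for one frame.
phone_post = {"iy": 0.5, "s": 0.3, "sil": 0.2}
expert_post = {"voiced": 0.8, "unvoiced": 0.1, "silence": 0.1}

fused = fuse(phone_post, expert_post, PHONE_TO_VOICING)
```

In this toy frame the expert is confident the frame is voiced, so the fused distribution shifts further toward the voiced phone than the phone MLP alone would.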
The two experimental settings are displayed below:
However, the paper only evaluates recognition on digits, with a vocabulary of about 32 words in total. The problem of using neural networks (NN) in large-vocabulary ASR is not addressed.