Thursday, August 5, 2010

[NN/HMM] Introducing Phonetically Motivated, Heterogeneous Information into Automatic Speech Recognition

Phonetically motivated experts are investigated for multi-stream automatic speech recognition.

The two experts adopted are:
1) {vowel, consonant, nasal, liquid, silence}
2) {voiced, unvoiced, silence}

The baseline system is an MLP that predicts phone posteriors. Two settings are used: full-band and multi-band.

The original model and the expert system are fused by simply multiplying their outputs together.
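To make the multiplication concrete, here is a minimal sketch for a single frame. It is only one plausible reading, not the paper's exact recipe: the tiny phone set, the phone-to-class map `PHONE_CLASS`, and the function `fuse` are all hypothetical names made up for illustration, and the renormalization step is my own assumption.

```python
# Hypothetical phone set and phone-to-class map, for illustration only.
# The classes are those of expert 1 above.
PHONE_CLASS = {
    "aa": "vowel",
    "s": "consonant",
    "n": "nasal",
    "l": "liquid",
    "sil": "silence",
}


def fuse(phone_post, class_post, phone_class=PHONE_CLASS):
    """Fuse one frame of phone posteriors with one frame of expert posteriors.

    Each phone posterior is multiplied by the expert posterior of the broad
    class that phone belongs to. The products are then renormalized so they
    sum to one (an assumption; the post only says "multiplying").
    """
    scores = {p: phone_post[p] * class_post[phone_class[p]] for p in phone_post}
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all fused scores are zero")
    return {p: s / total for p, s in scores.items()}


# Made-up numbers: the phone MLP slightly prefers "aa",
# but the expert is fairly sure the frame is a consonant.
phone_post = {"aa": 0.4, "s": 0.3, "n": 0.1, "l": 0.1, "sil": 0.1}
class_post = {"vowel": 0.1, "consonant": 0.6, "nasal": 0.1,
              "liquid": 0.1, "silence": 0.1}
fused = fuse(phone_post, class_post)
```

With these numbers the expert overrides the phone MLP's first choice and the fused distribution puts most of its mass on "s". The voiced/unvoiced expert could be folded in the same way with a second map.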

[Figure: the two experimental settings]


1 comment:

  1. However, this paper only does recognition on digits, with about 32 words in total. The problem of using NNs in large-vocabulary ASR is not addressed.