Dream & Passion: [NN/HMM] Introducing Phonetically Motivated, Heterogeneous Information into Automatic Speech Recognition

Thursday, August 5, 2010

Phonetically motivated experts are investigated for multi-stream automatic speech recognition.

The two experts adopted are:

1) {vowel, consonant, nasal, liquid, silence}

2) {voiced, unvoiced, silence}

The baseline system is an MLP that predicts phone posteriors. Two settings are used: full-band and multi-band.

The original model and the expert system are fused by simply multiplying their outputs together.
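To make the multiplication concrete, here is a minimal sketch in Python with NumPy. The post only says the two are multiplied, so everything else here is my assumption: the toy phone inventory, the phone-to-class map, the idea of expanding each class posterior to the phones that belong to it, and the renormalization at the end. The name `fuse` is hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical toy inventory; the real phone set is not given in the post.
PHONES = ["aa", "s", "z", "n", "l", "sil"]
# Classes of the second expert: {voiced, unvoiced, silence}.
VOICING = ["voiced", "unvoiced", "silence"]
# Assumed phone-to-class map, for illustration only.
PHONE_TO_VOICING = {
    "aa": "voiced", "s": "unvoiced", "z": "voiced",
    "n": "voiced", "l": "voiced", "sil": "silence",
}


def fuse(phone_post, expert_post, phones, classes, phone_to_class):
    """Multiply phone posteriors by the posterior of each phone's class.

    phone_post:  array of shape (frames, len(phones))
    expert_post: array of shape (frames, len(classes))
    Returns an array of shape (frames, len(phones)) whose rows sum to 1.
    """
    # For every phone, look up the column of its class in expert_post.
    class_index = np.array([classes.index(phone_to_class[p]) for p in phones])
    # Expand class posteriors so there is one value per phone.
    expanded = expert_post[:, class_index]
    fused = phone_post * expanded
    # Renormalizing is an assumption; the post only mentions the product.
    return fused / fused.sum(axis=1, keepdims=True)


# One frame where the phone MLP prefers "s" but the expert says "voiced".
phone_post = np.array([[0.10, 0.40, 0.30, 0.10, 0.05, 0.05]])
expert_post = np.array([[0.80, 0.10, 0.10]])
fused = fuse(phone_post, expert_post, PHONES, VOICING, PHONE_TO_VOICING)
best_phone = PHONES[int(fused[0].argmax())]
```

In this toy frame the expert's strong "voiced" vote moves the top hypothesis from "s" to "z", which is the kind of correction the fusion is meant to provide.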

The two experimental settings are displayed below:

fulltext (1).pdf (1633 KB)

Anonymous, August 5, 2010 at 12:44 PM
However, this paper only does recognition on digits, with a vocabulary of about 32 words in total. The problem of using NNs in large-vocabulary ASR is not addressed.