Fifteen HMM-based articulatory detectors generate log-likelihood ratio (LLR) features, which a later-stage NN uses to predict phone posteriors.
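As a rough illustration of what such LLR features could look like, the sketch below assumes each detector scores every frame with a target model and a competing anti-model, and takes the LLR as the difference of the two log-likelihoods. The target/anti-model setup, the function name `stack_llr_features`, and the array layout are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

NUM_DETECTORS = 15  # one detector per articulatory attribute


def stack_llr_features(target_loglik, anti_loglik):
    """Return per-frame LLR features of shape (num_frames, NUM_DETECTORS).

    Assumption: each detector provides a log-likelihood under its target
    model and under a competing anti-model for every frame.
    """
    target = np.asarray(target_loglik, dtype=float)
    anti = np.asarray(anti_loglik, dtype=float)
    if target.shape != anti.shape:
        raise ValueError("target and anti-model scores must have the same shape")
    if target.ndim != 2 or target.shape[1] != NUM_DETECTORS:
        raise ValueError("expected arrays of shape (num_frames, 15)")
    # A log-likelihood ratio is a difference of log-likelihoods.
    return target - anti
```

Each row of the result would then serve as one input vector for the later-stage NN.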
Those LLRs are also used directly to rescore the lattices generated by a standard HMM ASR system, which has been shown to yield better performance.
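A minimal sketch of this kind of rescoring, assuming the lattice has already been expanded into candidate paths and that each arc carries a log-domain ASR score plus a knowledge score summed from the detector LLRs. The `Arc` class, the linear combination, and the `weight` parameter are hypothetical; the paper's actual rescoring formula may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Arc:
    label: str              # phone or word label on the arc
    asr_score: float        # log-domain score from the HMM ASR system
    knowledge_score: float  # assumed: detector LLRs summed over the arc's frames


def rescore_path(arcs: List[Arc], weight: float) -> float:
    """Combine ASR and knowledge scores linearly along one path."""
    return sum(a.asr_score + weight * a.knowledge_score for a in arcs)


def best_path(paths: List[List[Arc]], weight: float) -> List[Arc]:
    """Pick the path with the highest combined score."""
    return max(paths, key=lambda p: rescore_path(p, weight))
```

With `weight = 0` this reduces to the original ASR ranking, so the weight controls how much the articulatory evidence is allowed to reorder hypotheses.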
The articulatory knowledge scores come from these HMM-based detectors, which work better than NN-based detectors.
The problem with the NN-based scores is that they are likely to fluctuate.
This work follows the Automatic Speech Attribute Transcription (ASAT) paradigm.
Frame-level LLRs work better than segment-level LLRs.
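To make the distinction concrete, the sketch below assumes a segment-level LLR is simply the mean of the frame-level LLRs inside each segment. The averaging rule, the function name `segment_llr`, and the `(start, end)` boundary format are assumptions for illustration; the paper may define segment-level scores differently.

```python
import numpy as np


def segment_llr(frame_llr, boundaries):
    """Average frame-level LLRs within each segment.

    frame_llr:  array of shape (num_frames, num_detectors)
    boundaries: list of (start, end) frame indices, end exclusive
    """
    frame_llr = np.asarray(frame_llr, dtype=float)
    return np.array([frame_llr[start:end].mean(axis=0) for start, end in boundaries])
```

Under this assumption the segment-level version has one vector per segment rather than one per frame, so it discards the frame-by-frame detail that the frame-level features keep.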
The 15 articulatory attributes used in the paper are: fricative, vowel, stop, nasal, approximant, low, mid, high, labial, coronal, dental, velar, retroflex, glottal, and silence.