Monday, July 12, 2010

[Speaker Adaptation][2009] Estimating speaker characteristics for speech recognition

A hierarchical tree of speech recognition models, organized by speaker characteristics, is proposed (as shown in the figure below).

Two kinds of features are adopted for the tree node splitting:
1) 1D vocal tract length warping factor
2) 4D vector: vocal tract length, two spectral slope parameters, and a model variance scaling factor
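
To make the first feature concrete, here is a minimal sketch of a piecewise-linear vocal tract length warp, a form commonly used for VTLN. I don't know which warping function the paper actually uses, so the function name `warp_frequency`, the 8 kHz Nyquist frequency, and the 0.875 knee position are all assumptions for illustration.

```python
def warp_frequency(f, alpha, f_nyquist=8000.0, knee=0.875):
    """Piecewise-linear frequency warp (illustrative, not the paper's method).

    Below the knee frequency the axis is scaled by alpha; above it, a second
    linear segment brings the warped axis back to f_nyquist so that the
    bandwidth is preserved.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= f <= f_nyquist:
        raise ValueError("f must lie in [0, f_nyquist]")

    f0 = knee * f_nyquist  # knee frequency (assumed position)
    if alpha * f0 >= f_nyquist:
        raise ValueError("alpha too large for this knee position")

    if f <= f0:
        return alpha * f

    # Upper segment: straight line from (f0, alpha * f0) to (f_nyquist, f_nyquist).
    slope = (f_nyquist - alpha * f0) / (f_nyquist - f0)
    return alpha * f0 + slope * (f - f0)


if __name__ == "__main__":
    for f in (500.0, 2000.0, 7500.0):
        print(f, warp_frequency(f, alpha=1.1))
```

With this form a single scalar `alpha` describes the speaker, which is why the warping factor can serve as a 1D coordinate for tree splitting.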

At each internal node, every dimension takes an interval of values; only at the leaf nodes does each node correspond to a unique speaker profile vector.

The speaker profile vector is used to adapt the originally trained model through a profile-specific transformation. (However, how the transformation is estimated is not clearly explained.)

Recognition is done by first passing the test speaker's profile down through the tree to find the best model (if I understand correctly).
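
My reading of the tree lookup can be sketched as below. This is only an illustration under my own assumptions: the class `Node`, the function `find_leaf`, the `model_id` labels, and the interval values are all made up, and descending by interval membership is a guess (the paper may well choose the branch by likelihood instead). Leaves here keep a small interval plus one representative profile vector.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # half-open range [low, high)


@dataclass
class Node:
    intervals: List[Interval]                # one interval per profile dimension
    children: List["Node"] = field(default_factory=list)
    profile: Optional[List[float]] = None    # set on leaves only
    model_id: Optional[str] = None           # hypothetical label of the adapted model

    def contains(self, profile: Sequence[float]) -> bool:
        if len(profile) != len(self.intervals):
            raise ValueError("profile dimension does not match the node")
        return all(lo <= x < hi for x, (lo, hi) in zip(profile, self.intervals))


def find_leaf(root: Node, profile: Sequence[float]) -> Node:
    """Walk from the root to the leaf whose intervals contain the profile."""
    if not root.contains(profile):
        raise ValueError("profile lies outside the range covered by the tree")
    node = root
    while node.children:
        matches = [c for c in node.children if c.contains(profile)]
        if not matches:
            raise ValueError("no child covers this profile")
        node = matches[0]
    return node


# Toy 1D tree over the warping factor (all values invented for the example).
leaf_a = Node([(0.8, 0.9)], profile=[0.85], model_id="model_0.85")
leaf_b = Node([(0.9, 1.0)], profile=[0.95], model_id="model_0.95")
leaf_c = Node([(1.0, 1.2)], profile=[1.10], model_id="model_1.10")
low = Node([(0.8, 1.0)], children=[leaf_a, leaf_b])
root = Node([(0.8, 1.2)], children=[low, leaf_c])

if __name__ == "__main__":
    print(find_leaf(root, [0.93]).model_id)
```

The same structure works for the 4D profile by giving each node four intervals instead of one.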

Experiments are carried out on recognizing children's connected-digit speech using models originally trained on adults' speech data.

Knowledge of speech production can play an important role in speech recognition by imposing constraints on the structure of trained and adapted models.
