A hierarchical tree of speech recognition models, organized by speaker characteristics, is proposed (as shown in the figure below).
Two kinds of features are used for splitting the tree nodes:
1) 1D: the vocal tract length warping factor
2) 4D vector: the vocal tract length, two spectral slope parameters and a model variance scaling factor
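To make the first feature concrete, here is a minimal sketch of what a vocal tract length warping factor does to the frequency axis. It assumes a plain linear warp clipped at the Nyquist frequency, which is only one common form; the paper may well use a different warping function. The names `warp_frequency`, `alpha` and `nyquist_hz` are mine, not the paper's.

```python
def warp_frequency(freq_hz, alpha, nyquist_hz=8000.0):
    """Scale a frequency by the warping factor alpha (assumed linear warp).

    The result is clipped to [0, nyquist_hz] so that the warped axis stays
    inside the analysis bandwidth. All names here are illustrative.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return min(max(freq_hz * alpha, 0.0), nyquist_hz)


# alpha > 1 stretches the axis, alpha < 1 compresses it.
stretched = warp_frequency(1000.0, 1.2)
clipped = warp_frequency(7000.0, 1.2)
```

In a sketch like this, a single scalar per speaker is enough to move the whole spectrum, which is why it can serve as a 1D splitting feature.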
At each internal node, every dimension is represented by an interval of values; only at the leaf nodes does each node correspond to a unique speaker profile vector.
The speaker profile vector is used to estimate a profile-specific transformation that adapts the originally trained model. (However, how the transformation is estimated is not clearly explained.)
Recognition is done by first passing the test speaker's profile down through the tree to find the best model (if I understand correctly).
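My reading of the tree lookup can be sketched as follows. This is an illustration under several assumptions, not the paper's implementation: every node (leaves included) stores one half-open interval per profile dimension, the test speaker's profile is taken as given, and `Node`, `find_model_node` and `model_id` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # half-open [low, high), one per dimension


@dataclass
class Node:
    intervals: List[Interval]
    model_id: Optional[str] = None  # hypothetical label, set on leaves only
    children: List["Node"] = field(default_factory=list)

    def contains(self, profile: Sequence[float]) -> bool:
        """True if every profile dimension falls inside this node's interval."""
        return len(profile) == len(self.intervals) and all(
            low <= x < high for x, (low, high) in zip(profile, self.intervals)
        )


def find_model_node(root: Node, profile: Sequence[float]) -> Node:
    """Descend from the root, following the child whose intervals cover the profile."""
    if not root.contains(profile):
        raise ValueError("profile lies outside the range covered by the tree")
    node = root
    while node.children:
        for child in node.children:
            if child.contains(profile):
                node = child
                break
        else:
            # No child covers the profile: stop at the deepest covering node.
            return node
    return node


# A tiny 1D tree over the warping factor (values chosen for illustration only).
root = Node(
    intervals=[(0.8, 1.3)],
    children=[
        Node(intervals=[(0.8, 1.0)], model_id="leaf_a"),
        Node(intervals=[(1.0, 1.3)], model_id="leaf_b"),
    ],
)
chosen = find_model_node(root, [1.1])
```

With the 4D feature, the same code applies unchanged: each node would simply carry four intervals instead of one.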
Experiments are carried out on recognizing children's connected-digit speech using models originally trained on adults' speech data.
Knowledge of speech production can play an important role in speech recognition by imposing constraints on the structure of trained and adapted models.