Speaker adaptation using generalised low rank approximations of training matrices
A speaker adaptation method based on the generalised low rank approximation of matrices (GLRAM) of training models is described. In the method, each model is represented as a matrix, and a set of such training matrices is decomposed into a set of speaker weights and two basis matrices for the row and column spaces by reducing both the row and column ranks of the training models. As a result, the speaker weight becomes a matrix, whose row and column dimensions can be adjusted independently. In the isolated-word experiment, the proposed method outperformed both eigenvoice and MLLR for adaptation data of about 20 s or longer.
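The GLRAM decomposition described above can be sketched as an alternating eigenvector update: fixing the row basis, the column basis is the set of leading eigenvectors of a summed projection matrix, and vice versa. The following is a minimal illustration assuming NumPy; the function name and iteration count are not from the abstract.

```python
import numpy as np

def glram(As, r1, r2, n_iter=50):
    """Generalised low rank approximation of a set of matrices.

    Each training model A (n x m) is approximated as L @ M @ R.T,
    where L (n x r1) and R (m x r2) are shared orthonormal bases and
    M = L.T @ A @ R is the per-speaker weight *matrix*, whose row and
    column dimensions r1, r2 can be adjusted independently.
    """
    n, m = As[0].shape
    R = np.eye(m, r2)                      # initialise the row-space basis
    for _ in range(n_iter):
        # Update column-space basis L given R: top-r1 eigenvectors
        ML = sum(A @ R @ R.T @ A.T for A in As)
        _, V = np.linalg.eigh(ML)          # eigenvalues in ascending order
        L = V[:, -r1:]
        # Update row-space basis R given L: top-r2 eigenvectors
        MR = sum(A.T @ L @ L.T @ A for A in As)
        _, V = np.linalg.eigh(MR)
        R = V[:, -r2:]
    # Per-speaker weight matrices in the reduced row/column spaces
    Ms = [L.T @ A @ R for A in As]
    return L, R, Ms
```

In adaptation, the bases L and R would be kept fixed and only the small weight matrix M re-estimated from the new speaker's data, which is what keeps the parameter count low.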
Bilinear model for speaker adaptation using tensor analysis
A novel speaker adaptation method based on two-way analysis of training speakers is described. A set of training models is expressed as a tensor and is decomposed into two factors using nonlinear iterative partial least squares, producing a bilinear model. The resulting model has bases of lower dimension and more free parameters than those of eigenvoice, enabling more elaborate modelling for a moderate amount of adaptation data. Results from the isolated-word recognition test show that the proposed model outperforms both eigenvoice and maximum likelihood linear regression (MLLR) for adaptation data longer than 15 s. Moreover, the proposed method can straightforwardly be extended to n-way analysis, e.g. for simultaneous adaptation of speaker, environment, etc.
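The nonlinear iterative partial least squares (NIPALS) procedure mentioned above extracts one score/loading pair at a time from an unfolded data matrix, deflating after each component. A minimal sketch, assuming NumPy, is below; applying it along each mode of the training-model tensor yields the two factors of the bilinear model (the function name and tolerances are illustrative, not from the abstract).

```python
import numpy as np

def nipals(X, n_comp, n_iter=200, tol=1e-10):
    """Extract n_comp principal components of X by NIPALS.

    Returns scores T (rows of X projected onto components) and
    orthonormal loadings P, so that X is approximated by T @ P.T.
    """
    X = X.astype(float).copy()
    scores, loadings = [], []
    for _ in range(n_comp):
        t = X[:, 0].copy()                 # initialise score vector
        for _ in range(n_iter):
            p = X.T @ t / (t @ t)          # regress columns on the score
            p /= np.linalg.norm(p)         # normalise the loading
            t_new = X @ p                  # updated score
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        X -= np.outer(t, p)                # deflate: remove the component
        scores.append(t)
        loadings.append(p)
    return np.array(scores).T, np.array(loadings).T
```

Because components are extracted sequentially, the dimensionality of each factor can be chosen freely, which is what gives the bilinear model more free parameters than the single eigenvoice basis for the same amount of adaptation data.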