Dream & Passion: testing cmusphinx3 alignment

Task: perform forced alignment with the cmusphinx3 tools.

1) Data preparation:

Using the prompts of one WSJ si_et_05 test speaker, 42 wav files were recorded as the test data. Each wav is saved in standard PCM encoding, i.e. the basic MS wav format. The names of those files (without path and suffix) go into a list file "wav_fids.scp", which the cmusphinx community calls a control file.
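The control file can be written by hand, but with 42 recordings a small script is less error-prone. Below is a minimal sketch in Python; the function name write_control_file is made up for this example, and it assumes all recordings sit in one folder with a ".wav" suffix.

```python
from pathlib import Path


def write_control_file(wav_dir, ctl_path):
    """Write one file ID per line: the wav file name without path or suffix."""
    file_ids = sorted(p.stem for p in Path(wav_dir).glob("*.wav"))
    Path(ctl_path).write_text("".join(fid + "\n" for fid in file_ids))
    return file_ids
```

For the layout used in this post, the call would be write_control_file("../wav", "wav_fids.scp").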

Convert the prompts to the format of:

FIRST COMMODITY APPEALED THE EXPULSION AND FINE TO THE C. F. T. C. (441C0201)

The item in parentheses at the end of each line is the file name of the corresponding recording. This file serves as the transcription file for the alignment.
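Before running the aligner it is worth checking that every entry in the control file has a transcription line. The sketch below parses the "TEXT (FILEID)" format shown above; the helper names are hypothetical. It compares IDs case-insensitively, which is an assumption made here because the example ID above is upper case while the file names elsewhere in this post are lower case.

```python
import re

# Matches "WORD WORD ... (FILEID)" with the file ID in trailing parentheses.
_TRANS_RE = re.compile(r"^(?P<text>.*\S)\s+\((?P<fid>[^()\s]+)\)\s*$")


def parse_transcript_line(line):
    """Split one transcription line into (file_id, list_of_words)."""
    m = _TRANS_RE.match(line)
    if m is None:
        raise ValueError("not in 'TEXT (FILEID)' form: %r" % line)
    return m.group("fid"), m.group("text").split()


def missing_transcripts(control_ids, transcript_lines):
    """Return control-file IDs without a transcription line.

    IDs are compared case-insensitively (an assumption of this sketch).
    """
    seen = {
        parse_transcript_line(line)[0].lower()
        for line in transcript_lines
        if line.strip()
    }
    return [fid for fid in control_ids if fid.lower() not in seen]
```

An empty list from missing_transcripts means every recording has a transcription.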

Extract the cepstral features with the sphinx_fe command (located in sphinxbase/src/sphinx_fe):

sphinx_fe -verbose yes -c wav_fids.scp -mswav yes -di "../wav" -ei "wav" -do "../feat" -eo "mfc"

This command uses the default values for most feature extraction parameters; adjust them as your setup requires. After it runs, the folder "../feat" contains one ".mfc" file for each ".wav" file in the folder "../wav".
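A quick way to confirm that the extraction covered every recording is to compare the control file against the output folder. This is a small sketch, not part of the Sphinx tools; missing_features is a hypothetical helper, and its defaults mirror the -do and -eo values used in the command above.

```python
from pathlib import Path


def missing_features(ctl_path, feat_dir="../feat", ext="mfc"):
    """Return the file IDs from the control file that have no feature file."""
    lines = Path(ctl_path).read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]
    return [
        fid for fid in ids
        if not (Path(feat_dir) / (fid + "." + ext)).is_file()
    ]
```

If missing_features("wav_fids.scp") returns an empty list, there is a feature file for each of the 42 recordings.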

To view the content of the ".mfc" feature file, use the command sphinx_cepview (which is also located in sphinxbase/src):

sphinx_cepview -header 1 -describe 1 -d 13 -f ../feat/441c0216.mfc
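The features can also be loaded directly for inspection in a script. The sketch below assumes the usual Sphinx feature layout, which is not described in this post: a 4-byte integer holding the number of 32-bit floats, followed by the floats themselves. Since the byte order of the file is not known in advance, the reader tries both and keeps the one whose header agrees with the file size. The function name read_mfc is made up here, and 13 coefficients per frame matches the -d 13 option above.

```python
import struct
from pathlib import Path


def read_mfc(path, n_ceps=13):
    """Read a Sphinx feature file into a list of frames (lists of floats).

    Assumed layout: int32 count of floats, then that many float32 values.
    """
    data = Path(path).read_bytes()
    if len(data) < 4:
        raise ValueError("file too short to hold a header")
    payload_bytes = len(data) - 4
    # Try little-endian first, then big-endian; keep whichever header
    # matches the actual payload size.
    for order in ("<", ">"):
        (n_floats,) = struct.unpack(order + "i", data[:4])
        if n_floats * 4 == payload_bytes:
            break
    else:
        raise ValueError("header does not match file size in either byte order")
    if n_floats % n_ceps != 0:
        raise ValueError("float count is not a multiple of %d" % n_ceps)
    values = struct.unpack("%s%df" % (order, n_floats), data[4:])
    return [list(values[i:i + n_ceps]) for i in range(0, n_floats, n_ceps)]
```

The number of frames, len(read_mfc("../feat/441c0216.mfc")), should then agree with what sphinx_cepview reports for the same file.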

2) Prepare the dictionary

Since most of the example scripts that come with cmusphinx use cmudict.0.6d, we also use this version here instead of the newest cmudict.0.7a.

First, download the dictionary from https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/, then remove the comments at the beginning and the stress symbols. The result is the dictionary "wsj_all.dic" used for alignment.
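The clean-up can be scripted. The following is a minimal sketch under two assumptions that should be checked against the downloaded file: comment lines start with "##" or ";;;", and stress is marked by trailing digits on the phones (for example AH0, AE1). The function name clean_dict_lines is hypothetical.

```python
import re

# Assumed comment markers; verify them against the downloaded dictionary.
_COMMENT_PREFIXES = ("##", ";;;")


def clean_dict_lines(lines):
    """Yield 'WORD PH PH ...' entries without comments or stress digits."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(_COMMENT_PREFIXES):
            continue
        word, *phones = line.split()
        if not phones:
            continue  # skip malformed entries that have no pronunciation
        # Strip trailing stress digits from each phone, e.g. AH0 -> AH.
        yield word + " " + " ".join(re.sub(r"\d+$", "", p) for p in phones)
```

Writing the yielded lines to "wsj_all.dic", one per line, gives the alignment dictionary. Only the phones are touched, so variant markers such as THE(2) stay intact.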

Also generate the phone list file "wsj_all.phones", which contains the 39 phones used in the dictionary file plus an extra "SIL".
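The phone list can be derived from the cleaned dictionary rather than typed in. A small sketch, assuming each dictionary line has the form "WORD PH PH ..." with stress digits already removed; phone_list is a made-up name.

```python
def phone_list(dict_lines, extra=("SIL",)):
    """Collect the distinct phones used in the dictionary, plus extras."""
    phones = set(extra)
    for line in dict_lines:
        fields = line.split()
        phones.update(fields[1:])  # the first field is the word itself
    return sorted(phones)
```

For the full dictionary the result should have 40 entries, the 39 phones plus "SIL"; any other count suggests that stress digits or comment lines were left in.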

Also create the filler dictionary "wsj_all.filler" with the following contents:

<s> SIL
</s> SIL
<sil> SIL

3) Prepare the model

In this experiment, we use the existing WSJ acoustic model trained by Keith Vertanen (http://www.keithv.com/software/sphinx/us/sphinx_wsj_all_cont_3no_8000_32.zip). Download the archive and extract its folders.

No language model is required for alignment.

4) Do the alignment

The alignment is done with sphinx3_align (from sphinx3/src/programs) using the following configuration:

sphinx3_align \
    -logbase 1.0001 \
    -feat 1s_12c_12d_3p_12dd \
    -mdef model/model_architecture/wsj_all_cont_3no_8000.mdef \
    -senmgau .cont. \
    -mean model/model_parameters/wsj_all_cont_3no_8000_32/means \
    -var model/model_parameters/wsj_all_cont_3no_8000_32/variances \
    -mixw model/model_parameters/wsj_all_cont_3no_8000_32/mixture_weights \
    -tmat model/model_parameters/wsj_all_cont_3no_8000_32/transition_matrices \
    -beam 1e-80 \
    -dict wsj_all.dic \
    -fdict wsj_all.filler \
    -ctl wav_fids.scp \
    -cepdir ../feat \
    -cepext .mfc \
    -insent transcription.txt \
    -outsent alignments/output.txt \
    -wdsegdir segmentations,CTL \
    -agc none \
    -cmn current

Make sure the two folders "alignments" and "segmentations" exist under the current path.

5) The results

In the "output.txt" file under the "alignments" folder, each line has the form:

<s> <sil> FIRST <sil> COMMODITY APPEALED <sil> THE(2) <sil> EXPULSION AND <sil> FINE TO(2) THE C. F. T. C. </s> (441c0201)

Each line gives the aligned word sequence, including the inserted silences, and marks which dictionary pronunciation was chosen for each word (for example THE(2)).
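To see which pronunciation variants the aligner picked across all utterances, the lines of "output.txt" can be parsed with a few lines of Python. This is a sketch only; parse_alignment_line is a hypothetical helper. It drops the filler tokens from the filler dictionary above and, as a convention of this example, reports a word without a "(n)" suffix as variant 1.

```python
import re

# A word optionally followed by a variant number in parentheses, e.g. THE(2).
_VARIANT_RE = re.compile(r"^(?P<word>.+?)(?:\((?P<n>\d+)\))?$")
FILLERS = {"<s>", "</s>", "<sil>"}


def parse_alignment_line(line):
    """Return (file_id, [(word, variant), ...]) for one output.txt line."""
    *tokens, fid = line.split()
    fid = fid.strip("()")
    words = []
    for tok in tokens:
        if tok in FILLERS:
            continue
        m = _VARIANT_RE.match(tok)
        # No "(n)" suffix means the first dictionary entry (variant 1 here).
        words.append((m.group("word"), int(m.group("n") or 1)))
    return fid, words
```

For the example line above, the result holds ("THE", 2) for the first THE and ("THE", 1) for the second.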

Under the "segmentations" folder, there is a ".wdseg" file for each ".mfc" file. For example, the content of "441c0201.wdseg" is:

SFrm  EFrm   SegAScr  Word
   0    16  -1547495  <s>
  17    50   -187467  <sil>
  51    81  -1144739  FIRST
  82    96   -589279  <sil>
  97   161  -3079573  COMMODITY
 162   217  -2497321  APPEALED
 218   231   -878705  <sil>
 232   254   -993106  THE(2)
 255   257   -268085  <sil>
 258   314  -6117691  EXPULSION
 315   371  -2952645  AND
 372   376   -269014  <sil>
 377   409  -1439215  FINE
 410   424   -618393  TO(2)
 425   449  -2016365  THE
 450   481  -1288960  C.
 482   504  -1931603  F.
 505   532  -1083254  T.
 533   576  -1412650  C.
 577   676   -488706  </s>
Total score: -30804266
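The frame numbers are easier to use once converted to seconds. The sketch below parses a ".wdseg" file under two assumptions: the features were extracted at the sphinx_fe default of 100 frames per second (the extraction command above does not change it), and EFrm is inclusive, so a segment ends at (EFrm + 1) / frame rate. The function name parse_wdseg is made up for this example.

```python
FRAME_RATE = 100  # frames per second; sphinx_fe default, assumed unchanged


def parse_wdseg(text, frame_rate=FRAME_RATE):
    """Parse .wdseg content into (segments, total_score)."""
    segments, total = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "SFrm":
            continue  # blank line or column header
        if fields[0] == "Total":
            total = int(fields[-1])
            continue
        start, end, score = int(fields[0]), int(fields[1]), int(fields[2])
        segments.append({
            "word": fields[3],
            "start_s": start / frame_rate,
            "end_s": (end + 1) / frame_rate,  # end frame treated as inclusive
            "score": score,
        })
    return segments, total
```

Under these assumptions FIRST in the example spans 0.51 s to 0.82 s. As a sanity check, the per-segment scores in the listing above add up exactly to the reported total of -30804266.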

Useful References:

http://www.keithv.com/software/sphinx/

http://www.speech.cs.cmu.edu/sphinx/tutorial.html

http://cmusphinx.sourceforge.net/wiki/sphinx4:sphinxthreealigner

Tuesday, May 15, 2012
