Here is one way to solve the problem. It requires two passes of alignment. If there are better ways to do so, let me know!
1) Do word-level alignment using sphinx3_align, as in the previous post.
sphinx3_align \
-logbase 1.0001 \
-feat 1s_12c_12d_3p_12dd \
-mdef model/model_architecture/wsj_all_cont_3no_8000.mdef \
-senmgau .cont. \
-hmm model/model_parameters/wsj_all_cont_3no_8000_32 \
-beam 1e-150 \
-dict ../../lib/dicts/wsj_all.dic \
-fdict ../../lib/dicts/wsj_all.filler \
-ctl ../../lib/flists/wav_fids.scp \
-cepdir ../../data/feat \
-cepext .mfc \
-insent ../../lib/wlabs/441c0200.lsn \
-outsent alignments/output.txt \
-wdsegdir wdseg,CTL \
-phlabdir phnlab,CTL \
-agc none \
-cmn current
2) Convert the phoneme labels to a phoneme transcription file.
3) Do phoneme alignment with sphinx3_align
sphinx3_align \
-logbase 1.0001 \
-feat 1s_12c_12d_3p_12dd \
-mdef model/model_architecture/wsj_all_cont_3no_8000.mdef \
-senmgau .cont. \
-hmm model/model_parameters/wsj_all_cont_3no_8000_32 \
-beam 1e-150 \
-insert_sil 0 \
-dict ../../lib/dicts/phone.dic \
-ctl ../../lib/flists/wav_fids.scp \
-cepdir ../../data/feat \
-cepext .mfc \
-insent trans_phn.txt \
-outsent alignments/output_phn.txt \
-wdsegdir phnseg,CTL \
-agc none \
-cmn current
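Step 2 is the only part without a command, so here is a minimal sketch of it in Python. It assumes you have already read the phone names of one utterance out of the first-pass label file into a list (the label file layout is not covered here), that the transcription file uses the usual Sphinx "words (utterance-id)" line format, and that phone.dic simply maps each phone to itself. The function names, the skip parameter, and the utterance ID are made up for illustration.

```python
def transcript_line(phones, utt_id, skip=()):
    """Build one transcription line: phones separated by spaces, then (utt_id).

    Assumption: the "-insent" file uses the common Sphinx transcript layout.
    `skip` is a hypothetical option for dropping labels such as SIL.
    """
    kept = [p for p in phones if p not in skip]
    return f"{' '.join(kept)} ({utt_id})"


def phone_dict_lines(phones):
    """Build dictionary lines in which every phone is a 'word' pronounced as itself.

    Assumption: this is what phone.dic contains; the post does not show it.
    """
    return [f"{p} {p}" for p in sorted(set(phones))]


if __name__ == "__main__":
    phones = ["SIL", "W", "IY", "SIL"]  # made-up phone sequence
    print(transcript_line(phones, "utt001", skip=("SIL",)))
    print("\n".join(phone_dict_lines(phones)))
```

Whether silence labels should be kept in the transcription depends on your setup (the second pass uses -insert_sil 0), so the sketch leaves that choice to the caller.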
Hi, I managed to perform a one-pass phone alignment by using -phsegdir instead of -phlabdir (or in addition to it) in the first pass of your tutorial.
The output in the segmentation dir is a file with the same name as the original one but with a .phseg extension.
It looks like this:
SFrm EFrm SegAScr Phone
0 2 -50469 SIL
3 9 -55412 W SIL IY b
10 15 -44970 IY W W e
...
The first phone is the one you're aligning, the second and third are the preceding and following phones, respectively, and the last letter indicates whether the phone is at the beginning (b), at the end (e) or inside (i) a word.
I didn't solve it myself, actually. Alejandro, a researcher in my group, passed me the tip.
~Miguel
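For anyone who wants to post-process these files, here is a minimal Python sketch that reads the .phseg layout described above. It relies only on the columns shown in the sample (start frame, end frame, score, phone, then optional left context, right context and word position). The names PhoneSegment and parse_phseg are made up for this example, and anything that does not start with a frame number (the header, the "..." line) is simply skipped.

```python
from typing import List, NamedTuple, Optional


class PhoneSegment(NamedTuple):
    start_frame: int
    end_frame: int
    score: int
    phone: str
    left: Optional[str]      # previous phone, if the line has context columns
    right: Optional[str]     # next phone, if the line has context columns
    position: Optional[str]  # b, e or i, as in the sample; None if absent


def parse_phseg(text: str) -> List[PhoneSegment]:
    """Parse the contents of a .phseg file into a list of segments."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines and anything not starting with a frame number.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        start, end, score = int(fields[0]), int(fields[1]), int(fields[2])
        if len(fields) >= 7:
            left, right, position = fields[4], fields[5], fields[6]
        else:
            left = right = position = None  # e.g. the SIL line has no context
        segments.append(PhoneSegment(start, end, score, fields[3], left, right, position))
    return segments


SAMPLE = """SFrm EFrm SegAScr Phone
0 2 -50469 SIL
3 9 -55412 W SIL IY b
10 15 -44970 IY W W e
"""

if __name__ == "__main__":
    for seg in parse_phseg(SAMPLE):
        n_frames = seg.end_frame - seg.start_frame + 1  # end frame is inclusive
        print(seg.phone, seg.start_frame, seg.end_frame, n_frames)
```

Frames can be turned into seconds by dividing by the frame rate of your front end; 100 frames per second is a common Sphinx setting, but check your own feature extraction configuration.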
Do the cepstrum and the feat (not sure what feat means) need to be computed in advance?