Tuesday, August 9, 2011

HTK lattice format

The lattices generated by HVite have the following general form:
VERSION=1.0
UTTERANCE=testf1.mfc
lmname=wdnet
lmscale=20.00 wdpenalty=-30.00
vocab=dict
N=31 L=56
I=0 t=0.00
I=1 t=0.36
I=2 t=0.75
I=3 t=0.81
... etc
I=30 t=2.48
J=0 S=0 E=1 W=SILENCE v=0 a=-3239.01 l=0.00
J=1 S=1 E=2 W=FOUR v=0 a=-3820.77 l=0.00
... etc
J=55 S=29 E=30 W=SILENCE v=0 a=-246.99 l=-1.20

The first 5 lines comprise a header which records the names of the files used to generate the lattice, along with the settings of the language model scale and penalty factors.

Each node in the lattice represents a point in time measured in seconds, and each arc represents a word spanning the segment of the input that starts at the time of its start node and ends at the time of its end node. For each such span, v gives the number of the pronunciation used, a gives the acoustic score and l gives the language model score.
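To make the node and arc fields concrete, here is a minimal Python sketch of a reader for this format. It is not part of HTK: the names parse_lattice and Arc are made up for illustration, the sample text is a trimmed copy of the listing above (so its N and L counts are adjusted), and the reader assumes each node and each arc is written on a single line of key=value fields.

```python
from dataclasses import dataclass

# Trimmed sample in the layout shown above (assumption: one node or arc per line).
SAMPLE = """\
VERSION=1.0
UTTERANCE=testf1.mfc
lmname=wdnet
lmscale=20.00 wdpenalty=-30.00
vocab=dict
N=3 L=2
I=0 t=0.00
I=1 t=0.36
I=2 t=0.75
J=0 S=0 E=1 W=SILENCE v=0 a=-3239.01 l=0.00
J=1 S=1 E=2 W=FOUR v=0 a=-3820.77 l=0.00
"""


@dataclass
class Arc:
    start: int       # S: index of the start node
    end: int         # E: index of the end node
    word: str        # W: word label
    pron: int        # v: number of the pronunciation used
    acoustic: float  # a: acoustic score
    lm: float        # l: language model score (unscaled)


def parse_lattice(text):
    """Return (header, nodes, arcs) from lattice text (hypothetical helper)."""
    header, nodes, arcs = {}, {}, {}
    for line in text.splitlines():
        # Split the line into key=value fields; lines such as "... etc" yield none.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if not fields:
            continue
        if "I" in fields:    # node definition: index and time in seconds
            nodes[int(fields["I"])] = float(fields["t"])
        elif "J" in fields:  # arc definition
            arcs[int(fields["J"])] = Arc(
                int(fields["S"]), int(fields["E"]), fields["W"],
                int(fields["v"]), float(fields["a"]), float(fields["l"]),
            )
        else:                # anything else is treated as header information
            header.update(fields)
    return header, nodes, arcs


header, nodes, arcs = parse_lattice(SAMPLE)
for index, arc in sorted(arcs.items()):
    # Each arc spans the time between its start node and its end node.
    print(index, arc.word, nodes[arc.start], nodes[arc.end], arc.acoustic, arc.lm)
```

Keeping the nodes in a dictionary keyed by index means the time span of an arc is just a lookup of its S and E values.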

The language model scores in output lattices do not include the scale factors and penalties. These are removed so that the lattice can be used as a constrained network for subsequent recognizer testing.
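Since the stored l values are unscaled, a score that includes the scale factor and penalty has to be put back together from the header settings. The sketch below assumes the usual combination (acoustic score plus the scaled language model score plus one word penalty per arc); the function name combined_score is made up for illustration and the numbers are taken from the last arc in the example lattice.

```python
def combined_score(acoustic, lm, lmscale, wdpenalty):
    """Combine one arc's scores (assumed form: a + lmscale * l + wdpenalty)."""
    return acoustic + lmscale * lm + wdpenalty


# Last arc of the example: a=-246.99, l=-1.20, with lmscale=20.00 and wdpenalty=-30.00.
total = combined_score(-246.99, -1.20, lmscale=20.0, wdpenalty=-30.0)
print(round(total, 2))

# Because the lattice keeps l unscaled, other settings can be tried without re-decoding.
for scale in (10.0, 15.0, 20.0):
    print(scale, round(combined_score(-246.99, -1.20, scale, -30.0), 2))
```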

When using HVite normally, the word level network file is specified using the -w option. When the -w option is included but no file name is given, HVite constructs the name of a lattice file from the name of the test file and reads that in. Hence, a new recognition network is created for each input file and recognition is very fast. This is an efficient way, for example, of experimentally determining optimum values for the language model scale and penalty factors.

 

Posted via email from Troy's posterous
