Thursday, July 29, 2010

[HTK] HVite output information

hvite.pdf (71 KB)

When HVite is used for either recognition or alignment, setting the trace level to 1 (the option "-T 1") makes it print information in the following format to the terminal:

File: /home/li-bo/research/databases/WSJ/mfcc_0_d_a_z/c2l/c2la0102.mfcc
c2l  ==  [482 frames] -72.9054 [Ac=-35140.4 LM=0.0] (Act=72.4)

The first line shows which file is being processed.
The second line first gives the recognized symbol sequence, terminated by "=="; it is followed by the model likelihood information:

[482 frames] : the total number of frames in the utterance;
-72.9054 : the average log likelihood per frame for the whole utterance, i.e. (acoustic log likelihood + language model log likelihood) / total frames; here (-35140.4 + 0.0) / 482 ≈ -72.9054;
[Ac=-35140.4 : the total acoustic log likelihood of the utterance;
LM=0.0] : the total language model log likelihood of the utterance;
(Act=72.4) : the average number of active models.
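As a quick sanity check of the formula above, the following Python sketch parses the summary line and recomputes the per-frame average from the Ac and LM totals. The regular expression and the helper name parse_summary are my own assumptions, inferred only from the example line shown here rather than from the HTK source, so other HTK versions or options may print the line slightly differently.

```python
import re

# Pattern inferred from the example HVite trace line (an assumption, not
# taken from HTK): 'c2l  ==  [482 frames] -72.9054 [Ac=-35140.4 LM=0.0] (Act=72.4)'
SUMMARY_RE = re.compile(
    r"^(?P<words>.*?)\s*==\s*"
    r"\[(?P<frames>\d+) frames\]\s+"
    r"(?P<avg>-?[\d.]+)\s+"
    r"\[Ac=(?P<ac>-?[\d.]+)\s+LM=(?P<lm>-?[\d.]+)\]\s+"
    r"\(Act=(?P<act>[\d.]+)\)"
)


def parse_summary(line):
    """Split an HVite trace-level-1 summary line into its fields (hypothetical helper)."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError("not an HVite summary line: %r" % line)
    return {
        "words": m.group("words").split(),  # recognized symbols before "=="
        "frames": int(m.group("frames")),
        "avg": float(m.group("avg")),        # per-frame average printed by HVite
        "ac": float(m.group("ac")),          # total acoustic log likelihood
        "lm": float(m.group("lm")),          # total language model log likelihood
        "act": float(m.group("act")),        # average number of active models
    }


info = parse_summary("c2l  ==  [482 frames] -72.9054 [Ac=-35140.4 LM=0.0] (Act=72.4)")
# Recompute the per-frame average from the two totals.
recomputed = (info["ac"] + info["lm"]) / info["frames"]
print(round(recomputed, 4))  # prints -72.9054
```

The recomputed value agrees with the printed average to the four decimals HVite shows.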

Scores are also written to the recognition or alignment output MLF file, for example:

0 33900000 c02 -79.211006

In the MLF, the results for each file are written one segment per line, between the line giving the file name and a line containing only ".".
Each line in this example has 4 fields. The number of fields depends on the HVite settings; for example, with "-f" HVite keeps track of the full state alignment, so state-level information is written as well.
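This block structure can be read with a few lines of Python. This is a minimal sketch, assuming a plain MLF as written by HVite (a "#!MLF!#" header, then for each file a quoted file name, its label lines, and a "." terminator); it does not handle other MLF features such as multiple alternatives per file. The helper name read_mlf_blocks and the sample file name are hypothetical.

```python
def read_mlf_blocks(lines):
    """Group MLF lines into {label-file name: [label lines]} (hypothetical helper).

    Assumes each block starts with a quoted file name and ends with a line
    holding only '.'.
    """
    blocks = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if current is None:
            current = line.strip('"')  # a quoted name opens a new block
            blocks[current] = []
        elif line == ".":
            current = None  # a lone '.' closes the current block
        else:
            blocks[current].append(line)
    return blocks


# Hypothetical MLF text; the file name is made up for illustration.
sample = """#!MLF!#
"*/c02.rec"
0 33900000 c02 -79.211006
.
"""
blocks = read_mlf_blocks(sample.splitlines())
print(blocks)  # prints {'*/c02.rec': ['0 33900000 c02 -79.211006']}
```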

In this example:
the first field is the start time of the segment (in HTK's units of 100 ns);
the second field is the end time of the segment (in the same units);
the third field is the recognized symbol;
the last field is the total log likelihood of the segment.
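Because HTK label times are in 100 ns units, a segment's boundaries convert directly to seconds and, given the frame period, to a frame count. The sketch below assumes a 10 ms frame period (100000 in HTK units), a common setting that the post does not state, and handles only the 4-field layout described above; the helper name parse_label_line is hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns
FRAME_SHIFT = 100_000              # assumed 10 ms frame period, in 100 ns units


def parse_label_line(line):
    """Parse 'start end symbol score' from a 4-field MLF label line (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d: %r" % (len(fields), line))
    start, end, symbol, score = fields
    start, end = int(start), int(end)
    return {
        "symbol": symbol,
        "start_s": start / HTK_UNITS_PER_SECOND,   # start time in seconds
        "end_s": end / HTK_UNITS_PER_SECOND,       # end time in seconds
        "frames": (end - start) // FRAME_SHIFT,    # duration under the assumed frame period
        "score": float(score),                     # log likelihood as written in the MLF
    }


seg = parse_label_line("0 33900000 c02 -79.211006")
print(seg["end_s"], seg["frames"])  # prints 3.39 339
```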

The attached file gives another example.

