Step I: Convert the recognition results generated by HTK to a format sclite can read, such as trn.

In trn format, each line holds the transcription of one utterance, followed by its utterance id in parentheses.

e.g.:

I LIKE ICE CREAM (c02abd30)
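HVite can write its recognition output to an HTK Master Label File (MLF) via its -i option, so this step is usually an MLF-to-trn conversion. The shell function below is a minimal sketch, assuming the common MLF layout: a "#!MLF!#" header, one quoted label-file name per utterance (e.g. "*/c02abd30.rec"), one label per line as either "start end word [score]" or just "word", and a line containing only "." closing each utterance. The function name mlf2trn and the set of silence/boundary tokens it drops (SIL, sil, <s>, </s>, !ENTER, !EXIT) are hypothetical; adjust them to your own dictionary. Multi-level labels and N-best lists are not handled.

```shell
# Hypothetical helper: convert an HTK MLF (stdin) to sclite trn (stdout).
# Assumes labels are "start end word [score]" or just "word", and that
# each utterance block ends with a line containing only ".".
mlf2trn() {
  awk -v skip='SIL sil <s> </s> !ENTER !EXIT' '
    BEGIN {
      # Tokens (assumed silence/boundary markers) that are left out of the trn
      n = split(skip, s, " ")
      for (i = 1; i <= n; i++) drop[s[i]] = 1
    }
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    /^#!MLF!#/ { next }                # MLF header line
    /^"/ {                             # start of an utterance, e.g. "*/c02abd30.rec"
      id = $0
      gsub(/"/, "", id)                # drop the quotes
      sub(/.*\//, "", id)              # drop "*/" or any directory prefix
      sub(/\.[^.]*$/, "", id)          # drop the .rec/.lab extension
      words = ""
      next
    }
    /^\.[ \t]*$/ {                     # end of the utterance: emit one trn line
      line = (words == "" ? "" : words " ") "(" id ")"
      print line
      next
    }
    {
      # With start/end times the word is field 3, otherwise field 1
      w = (NF >= 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/) ? $3 : $1
      if (w != "" && !(w in drop)) words = (words == "" ? w : words " " w)
    }'
}
```

For example, "mlf2trn < recout.mlf > w15_bg.trn", where recout.mlf is a hypothetical name for the HVite output file. A reference word MLF with word-only labels can be converted into ref.trn the same way.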

Step II: Generate the alignment using sclite

e.g.:

sclite -F -i wsj -r ref.trn -h w15_bg.trn -o sgml

-F scores word fragments as correct (this is independent of "-d", which makes sclite use GNU diff for the alignment)

-i sets the utterance id type to wsj (Wall Street Journal)

-r sets the reference transcription file to "ref.trn"

-h sets the hypothesis (recognition results) file to "w15_bg.trn"

-o sets the output format to "sgml"

The output is a file named "w15_bg.trn.sgml".

Note: each experimental result file must be aligned to the reference separately, producing one "sgml" file per system.
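Because each system needs its own alignment, the sclite call can be wrapped in a loop. The function below is a sketch that runs the same command as in the example above once per hypothesis file and stops at the first failure; the name align_all is hypothetical, and each run should produce an SGML file named as shown above (e.g. w15_bg.trn.sgml).

```shell
# Hypothetical helper: align every hypothesis trn file against one reference.
# Usage: align_all REF.trn HYP1.trn [HYP2.trn ...]
align_all() {
  ref=$1
  shift
  for hyp in "$@"; do
    # Same options as the single-file example: -F, wsj ids, SGML output
    sclite -F -i wsj -r "$ref" -h "$hyp" -o sgml || return 1
  done
}

# Example: align_all ref.trn w15_bg.trn w15_tg.trn
```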

Step III: Significance Test using sc_stats

e.g.:

cat w15_bg.trn.sgml w15_tg.trn.sgml | sc_stats -p -t mapsswe -v -u -n result_bg_tg

-p reads the alignments from stdin, here the SGML files piped in by cat

-t specifies the test to be mapsswe (the Matched Pairs Sentence Segment Word Error Test)

-v outputs a detailed analysis for each test performed on a pair of hypothesis files

-u unifies the test results into a single comparison matrix instead of creating one comparison matrix per test

-n sets the prefix of the output report file names (here "result_bg_tg")
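The -t mapsswe test compares the two systems segment by segment. As a rough illustration of the statistic, the awk sketch below follows the common matched-pairs formulation: for each segment i, Z_i = N1_i - N2_i is the difference between the numbers of errors the two systems make in that segment, and W = mean(Z) / (sd(Z) / sqrt(n)) over the n segments is treated as approximately standard normal when both systems perform equally well, so |W| > 1.96 points to a significant difference at the 5% level (two-sided). This is not sc_stats's own code: sc_stats chooses the segments itself from the alignments, while the function name mapsswe_z and its input format (one line per segment holding the two error counts) are assumptions made for this example.

```shell
# Hypothetical helper: matched-pairs (MAPSSWE-style) W statistic.
# Input (stdin, assumed format): one line per segment, "errors_sys1 errors_sys2".
mapsswe_z() {
  awk '
    NF >= 2 {
      z = $1 - $2                 # per-segment error difference Z_i
      n++; sum += z; sumsq += z * z
    }
    END {
      if (n < 2) { print "need at least 2 segments" > "/dev/stderr"; exit 1 }
      mu = sum / n                                  # mean difference
      var = (sumsq - n * mu * mu) / (n - 1)         # sample variance of Z_i
      if (var <= 0) { print "zero variance: W undefined" > "/dev/stderr"; exit 1 }
      w = mu / sqrt(var / n)                        # approx. N(0,1) under H0
      printf "n=%d mu=%.4f W=%.4f\n", n, mu, w
    }'
}
```

For example, four segments with error counts (3,1), (2,2), (4,1), (1,0) give W of about 2.32, which would exceed 1.96; with so few segments, though, the normal approximation is not trustworthy.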

Appendix:

sc_stats options:

The command-line options for sc_stats can be broken into four categories:

- Input File Options
- Output Options
- Report Generation Options
- Statistical Test Options

Input File Options:

-p

Alignments are read from 'stdin' as input to sc_stats. The input must be in the "sgml" output format, created either by sclite's '-o sgml' option or piped from another SCTK utility.

Output Options:

-e desc

Description of the ensemble of hyp files.

-O output_dir

Writes all output files into output_dir. Defaults to the hypothesis file's directory.

-n name

Writes all multiple hypothesis file reports to files beginning with 'name'. Using '-' writes to stdout. Default: 'Ensemble'

Report Generation Options:

-g

- The 'range' graph is an ASCII representation of the variability in error rates for a given speaker. The graph is sorted by the mean of the statistic computed for each speaker.

- The 'grange' graph is a gnuplot version of the same data plotted in 'range'. Two sets of files are created. The first set, '*.grange.spk.plt' and '*.grange.spk.dat', contains the gnuplot command files and data files, respectively, for the graph of speaker performance variability across systems. The second set, '*.grange.sys.plt' and '*.grange.sys.dat', contains the gnuplot command files and data files, respectively, for the graph of system performance variability across speakers.

- The 'grange2' graph is similar to the 'grange' graph, except that each system's speaker word error scores are identified by a unique symbol.

-r

prn, sum, rsum, lur, es, res -: Report formats that can be generated.
none -: Produce no output reports (default).

Statistical Test Options:

-t

mcn -: Perform the McNemar Test.
mapsswe -: Perform the Matched Pairs Sentence Segment Word Error Test
sign -: Perform the Sign Test
wilc -: Perform the Wilcoxon Signed Rank Test
anovar -: Perform the Analysis of Variance by Rank Test
std -: Shorthand for the four 'standard' tests: mcn, mapsswe, wilc and sign.

-v

For each test performed on a pair of systems files, output a detailed analysis.

-u

Rather than creating a comparison matrix for each test, unify the statistical test results into a single comparison matrix.
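The statistical test options above can be combined to compare more than two systems in one run. The function below is a sketch that concatenates any number of SGML alignment files and pipes them into sc_stats with the chosen test, a detailed per-pair analysis (-v) and a single unified comparison matrix (-u); the function name sig_test and the third system file w15_ug.trn.sgml in the usage line are hypothetical.

```shell
# Hypothetical helper: significance tests over several sclite SGML alignments.
# Usage: sig_test TEST REPORT_NAME file1.sgml file2.sgml [...]
#   TEST is one of mcn, mapsswe, sign, wilc, anovar, std
sig_test() {
  test_name=$1
  report=$2
  shift 2
  # -p: read the concatenated SGML alignments from stdin
  # -v: detailed analysis per pair; -u: one unified comparison matrix
  cat "$@" | sc_stats -p -t "$test_name" -v -u -n "$report"
}

# Example (w15_ug.trn.sgml is a hypothetical third system):
# sig_test std result_all w15_bg.trn.sgml w15_tg.trn.sgml w15_ug.trn.sgml
```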

-f