Step I: Generate Alignment using sclite
e.g.: sclite -F -i wsj -r ref.trn -h w15_bg.trn -o sgml
- -F scores fragments as correct, as opposed to "-d", which uses "diff" to find the differences;
- -i sets the utterance id type to wsj (Wall Street Journal);
- -r sets the reference transcription file to "ref.trn";
- -h sets the recognized results file to "w15_bg.trn";
- -o sets the output result format to "sgml".
The output is a file named "w15_bg.trn.sgml".
Note: each experimental result file has to be aligned to the reference in this way to generate its own "sgml" format file.
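Because every result file needs its own alignment, the sclite call can be wrapped in a small loop. The following is only a sketch: it assumes sclite is on the PATH and that file names contain no spaces. The function name align_all and the DRY_RUN switch are made up for illustration and are not part of SCTK.

```shell
#!/bin/sh
# Sketch: align several hypothesis files against one reference with sclite.
# align_all and DRY_RUN are hypothetical helpers, not part of SCTK.

align_all() {
    ref=$1
    shift
    for hyp in "$@"; do
        # Same flags as the sclite example in Step I; the alignment for
        # each hypothesis file is expected in "<hyp>.sgml".
        cmd="sclite -F -i wsj -r $ref -h $hyp -o sgml"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"              # dry run: only print the command
        else
            $cmd || return 1         # run it; stop at the first failure
        fi
    done
}

# Dry run (the default here): prints one sclite command per hypothesis file.
align_all ref.trn w15_bg.trn w15_tg.trn
```

Setting DRY_RUN=0 before the call runs the commands instead of printing them.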
Step II: Significance Test using sc_stats
e.g.: cat w15_bg.trn.sgml w15_tg.trn.sgml | sc_stats -p -t mapsswe -v -u -n result_bg_tg
- -p reads the alignments from stdin, here the piped output of cat;
- -t specifies the test, here mapsswe (the Matched Pairs Sentence Segment Word Error Test);
- -v outputs a detailed analysis for each test performed on a pair of hypothesis files;
- -u unifies the test results into a single comparison matrix instead of creating a comparison matrix for each test;
- -n sets the output report file name, here "result_bg_tg".
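The significance test only makes sense once both alignment files exist, so a wrapper can check for them before piping them to sc_stats. This is a minimal sketch assuming sc_stats is on the PATH; compare_pair is a hypothetical helper name, and the flags are exactly those of the example above.

```shell
#!/bin/sh
# Sketch: compare two aligned systems with the MAPSSWE test.
# compare_pair is a hypothetical helper, not part of SCTK.

compare_pair() {
    sgml_a=$1
    sgml_b=$2
    report=$3
    # Both alignments must be readable before they are piped to sc_stats.
    for f in "$sgml_a" "$sgml_b"; do
        if [ ! -r "$f" ]; then
            echo "missing alignment file: $f" >&2
            return 1
        fi
    done
    cat "$sgml_a" "$sgml_b" | sc_stats -p -t mapsswe -v -u -n "$report"
}

# Usage (needs sc_stats and both .sgml files from the alignment step):
# compare_pair w15_bg.trn.sgml w15_tg.trn.sgml result_bg_tg
```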
sc_stats Command Line Options
The command line options for sc_stats can be broken into four categories:
- Input File Options:
- Output Options:
- Report Generation Options:
- Statistical Test Options:
- These options control/define the input to sc_stats. Input must come from stdin and the -p option must be used. (Forcing the user to use the -p option enables future expandability while maintaining backward compatibility.)
- -p
- Alignments are read from 'stdin' as input to sc_stats. The format of the input must be in the "sgml" output format, created either by '-o sgml' or by piped input from another sctk utility.
- -e desc
- Description of the ensemble of hyp files.
- -O output_dir
- Writes all output files into output_dir. Defaults to the hypfile's directory.
- -n name
- Writes all multiple hypothesis file reports to files beginning with 'name'. Using '-' writes to stdout. Default: 'Ensemble'
- -g [ range | grange | grange2 ]
- Generate per speaker range graphs, based on the formula defined by '-f'. The reports are written to files whose root names begin with the value defined by '-n'. Two graphs are produced: one showing speaker performance variability across systems, and one showing system performance variability across speakers.
- The 'range' graphs are an ASCII representation of the variability in error rates for a given speaker. The graph is sorted by the mean of the statistic computed for each speaker.
- The 'grange' graph is a gnuplot version of the same data plotted in 'range'. Two sets of files are created. The first set, called '*.grange.spk.plt' and '*.grange.spk.dat', contains the gnuplot command files and data files respectively for the speaker performance variability across systems graph. The second set, called '*.grange.sys.plt' and '*.grange.sys.dat', contains the gnuplot command files and data files respectively for the system performance variability across speakers graph.
- The 'grange2' graph is similar to the 'grange' graph except that each system's speaker word error scores are identified by a unique symbol.
-r [ sum | rsum | lur | es | res | none ]
- -t [ mcn | mapsswe | sign | wilc | anovar | std4 ]
- mcn -
- Perform the McNemar Test.
- mapsswe -
- Perform the Matched Pairs Sentence Segment Word Error Test
- sign -
- Perform the Sign Test
- wilc -
- Perform the Wilcoxon Signed Rank Test
- anovar -
- Perform the Analysis of Variance by Rank Test
- std4 -
- This is a shorthand notation to do the 'standard' four tests: mcn, mapsswe, wilc and sign.
- -v
- For each test performed on a pair of systems files, output a detailed analysis.
- -u
- Rather than creating a comparison matrix for each test, unify statistical test results into a single comparison matrix.
-f [ E | R | W ]
- Use the identified formula for the sign, wilc and anovar statistical tests. The formulas are:
- E -> Percentage Word Error
- R -> Percentage Words Correctly Recognized
- W -> Percentage Word Accuracy
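To make the three statistics concrete, they can be computed from raw alignment counts. The sketch below uses the conventional definitions (error = substitutions + deletions + insertions over reference words, correct = reference words minus substitutions and deletions, accuracy = 100 minus error). These definitions are an assumption here, not taken from the sc_stats source, and word_stats and the sample counts are made up for illustration.

```shell
#!/bin/sh
# Sketch: the E, R and W statistics from raw alignment counts.
# word_stats is a hypothetical helper; the formulas are the conventional
# definitions and are assumed, not taken from the sc_stats source.

word_stats() {
    # $1 = reference words, $2 = substitutions, $3 = deletions, $4 = insertions
    # ($1 must be greater than zero)
    awk -v n="$1" -v s="$2" -v d="$3" -v i="$4" 'BEGIN {
        e = 100 * (s + d + i) / n      # E: percentage word error
        r = 100 * (n - s - d) / n      # R: percentage words correctly recognized
        w = 100 - e                    # W: percentage word accuracy
        printf "E=%.1f R=%.1f W=%.1f\n", e, r, w
    }'
}

# 200 reference words, 10 substitutions, 6 deletions, 4 insertions
word_stats 200 10 6 4
```

Note that R ignores insertions, so R and W differ whenever the recognizer inserts extra words.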