Wednesday, March 17, 2010

Using Sclite

Step I: Convert the recognition results generated by HTK to a format that sclite accepts, such as trn format.

In trn format, each line holds the transcript of one utterance, followed by the utterance id in parentheses.


 I LIKE ICE CREAM (c02abd30)
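The conversion in Step I can be sketched with a small awk filter. This is only a sketch under assumptions: it expects the HTK results as a master label file (MLF) where each utterance starts with a quoted file name and ends with a lone ".", and the function name mlf_to_trn and the file name rec.mlf are made up for illustration. Sentence boundary markers or silence labels, if the recogniser emits any, would still need to be filtered out.

```shell
# Hypothetical helper: turn an HTK MLF into trn lines "WORDS (utt_id)".
# Reads the files given as arguments, or stdin when there are none.
mlf_to_trn() {
  awk '
    /^#!MLF!#/ { next }            # skip the MLF header line
    /^"/ {                         # quoted label file name starts an utterance
      id = $0
      gsub(/"/, "", id)            # drop the quotes
      sub(/^.*\//, "", id)         # drop the directory part
      sub(/\.[^.]*$/, "", id)      # drop the extension (.rec, .lab)
      words = ""
      next
    }
    /^\.$/ {                       # a lone dot closes the utterance
      sub(/^ /, "", words)
      printf "%s (%s)\n", words, id
      next
    }
    NF > 0 {
      # label lines are either "start end word [score]" or just "word"
      w = ($1 ~ /^[0-9]+$/ && NF >= 3) ? $3 : $1
      words = words " " w
    }
  ' "$@"
}

# Example call (file names are placeholders):
#   mlf_to_trn rec.mlf > w15_bg.trn
```

The utterance id is taken from the label file name, so it matches the id in the reference as long as both were derived from the same file names.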

Step II: Generate Alignment using sclite


sclite -F -i wsj -r ref.trn -h w15_bg.trn -o sgml

-F scores fragments as correct; it cannot be combined with "-d", which uses GNU "diff" for the alignment;

-i sets the utterance id type to be wsj (Wall Street Journal)

-r sets the reference transcription to be file "ref.trn"

-h sets the recognized results file to be "w15_bg.trn"

-o sets the output result format to be "sgml"

The output is a file named "w15_bg.trn.sgml".

Note: each experimental result file has to be aligned to the reference separately, which generates one "sgml" format file per result file.
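With several result files, the alignment step can be wrapped in a loop. A minimal sketch, reusing the exact options from the command above; the function name score_all and the SCLITE variable are my own names, not part of sctk:

```shell
# Path to the sclite binary; override it if sclite is not on the PATH.
SCLITE=${SCLITE:-sclite}

# Hypothetical helper: align every hypothesis file against one reference.
# usage: score_all ref.trn hyp1.trn [hyp2.trn ...]
score_all() {
  ref=$1
  shift
  for hyp in "$@"; do
    # same options as the single command shown above
    "$SCLITE" -F -i wsj -r "$ref" -h "$hyp" -o sgml || return 1
  done
}

# Example call:
#   score_all ref.trn w15_bg.trn w15_tg.trn
```

The loop stops at the first failing alignment, so a missing or malformed result file does not go unnoticed before the significance test.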

Step III: Significance Test using sc_stats


cat w15_bg.trn.sgml w15_tg.trn.sgml | sc_stats -p -t mapsswe -v -u -n result_bg_tg

-p reads the alignments from stdin, here the piped output of cat;

-t specifies the test to be mapsswe (the Matched Pairs Sentence Segment Word Error Test)

-v performs the tests on a pair of hypothesis files

-u unifies the test instead of creating comparison matrix for each test

-n sets the name that the output report files begin with


sc_stats Commandline Options

The commandline options for sc_stats can be broken into four categories:

  1. Input File Options
  2. Output Options
  3. Report Generation Options
  4. Statistical Test Options

Input File Options:
    These options control/define the input to sc_stats. Input must come from stdin and the -p option must be used. (Forcing the user to use the -p option enables future expandability while maintaining backward compatibility.)

    -p
      Alignments are read from 'stdin' as input to sc_stats. The format of the input must be in the "sgml" output format, created either by '-o sgml' or by piped input from another sctk utility.
Output Options:
    -e desc
      Description of the ensemble of hyp files.
    -O output_dir
      Writes all output files into output_dir. Defaults to the hypfile's directory.
    -n name
      Writes all multiple hypothesis file reports to files beginning with 'name'. Using '-' writes to stdout. Default: 'Ensemble'
Report Generation Options:
    -g [ range | grange | grange2 ]
      Generate per speaker range graphs, based on the formula defined by '-f'. The reports are written to files whose root name begins with the value defined by '-n'. There are two graphs produced, one showing speaker performance variability across systems and the second showing system performance variability across speakers.

      - The 'range' graphs are an ASCII representation of the variability in error rates for a given speaker. The graph is sorted by the mean of the statistic computed for each speaker.

      - The 'grange' graph is a gnuplot version of the same data plotted in 'range'. There are two sets of files created. The first set, which is called '*.grange.spk.plt' and '*.grange.spk.dat', contains the gnuplot command files and data files respectively for the speaker performance variability across systems graph. The second set, which is called '*.grange.sys.plt' and '*.grange.sys.dat', contains the gnuplot command files and data files respectively for the system performance variability across speakers graph.

      - The 'grange2' graph is similar to the 'grange' graph except that each system's speaker word error scores are identified by a unique symbol.

    -r [ sum | rsum | lur | es | res | none ]
Statistical Test Options:
    -t [ mcn | mapsswe | sign | wilc | anovar | std4 ]
      mcn -
      Perform the McNemar Test.
      mapsswe -
      Perform the Matched Pairs Sentence Segment Word Error Test
      sign -
      Perform the Sign Test
      wilc -
      Perform the Wilcoxon Signed Rank Test
      anovar -
      Perform the Analysis of Variance by Rank Test
      std4 -
      This is a shorthand notation to do the 'standard' four tests: mcn, mapsswe, wilc and sign.

    -v
      For each test performed on a pair of systems files, output a detailed analysis.
    -u
      Rather than creating a comparison matrix for each test, unify statistical test results into a single comparison matrix.

    -f [ E | R | W ]
      Use the identified formula for the statistical tests: sign, wilcoxon and anovar. The formulas are:
      1. E -> Percentage Word Error
      2. R -> Percentage Words Correctly Recognized
      3. W -> Percentage Word Accuracy
      The default is 'E'.
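To make the three '-f' formulas concrete, here is a small sketch that computes them from raw counts. It assumes the usual definitions in terms of the number of reference words N and the substitution, deletion and insertion counts S, D and I; that sc_stats uses exactly these definitions is my assumption, and the function name word_rates is made up.

```shell
# Hypothetical helper: print the E, R and W percentages.
# usage: word_rates N S D I
word_rates() {
  awk -v n="$1" -v s="$2" -v d="$3" -v i="$4" 'BEGIN {
    e = 100 * (s + d + i) / n       # E: percentage word error
    r = 100 * (n - s - d) / n       # R: percentage words correctly recognized
    w = 100 * (n - s - d - i) / n   # W: percentage word accuracy (100 - E)
    printf "E=%.2f R=%.2f W=%.2f\n", e, r, w
  }'
}

# Example: 100 reference words, 5 substitutions, 3 deletions, 2 insertions
#   word_rates 100 5 3 2
```

Under these definitions R ignores insertions, which is why it can look better than W for a system that inserts many words.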

Posted via email from Troy's posterous
