Wednesday, March 17, 2010

Using Sclite

Step I: Convert the recognition results generated by HTK to a format that sclite accepts, such as the trn format.

In trn format, each line is an utterance with the utterance id in parentheses after it.

e.g:

 I LIKE ICE CREAM (c02abd30)
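A minimal sketch of the conversion, assuming the HTK results are stored in a standard master label file (MLF), where each utterance starts with a quoted file name and ends with a line containing a single period. The function name mlf_to_trn is hypothetical, and tokens such as silence or sentence markers are not filtered out here.

```python
import os


def mlf_to_trn(mlf_lines):
    """Convert the lines of an HTK MLF into trn lines: 'WORD WORD ... (utt_id)'."""
    trn_lines = []
    utt_id = None
    words = []
    for raw in mlf_lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if line.startswith('"'):
            # e.g. "*/c02abd30.rec" -> utterance id c02abd30
            name = line.strip('"')
            utt_id = os.path.splitext(name.split("/")[-1])[0]
            words = []
        elif line == ".":
            # a single period closes the current utterance
            if utt_id is not None:
                trn_lines.append("%s (%s)" % (" ".join(words), utt_id))
            utt_id = None
        elif utt_id is not None:
            fields = line.split()
            # with timing: "start end WORD [score]"; without: "WORD [score]"
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                words.append(fields[2])
            else:
                words.append(fields[0])
    return trn_lines
```

Writing the returned lines to a file, one per line, gives a trn file that can be passed to sclite with -h.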


Step II: Generate the alignment using sclite

e.g:

sclite -F -i wsj -r ref.trn -h w15_bg.trn -o sgml

-F scores word fragments as correct; it cannot be combined with "-d", which uses GNU "diff" to compute the alignment;

-i sets the utterance id type to wsj (Wall Street Journal)

-r sets the reference transcription to be file "ref.trn"

-h sets the recognized results file to be "w15_bg.trn"

-o sets the output result format to be "sgml"

The output is a file named "w15_bg.trn.sgml".

Note: each experimental result file has to be aligned against the reference separately, producing one "sgml" file per result file.
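Since the alignment has to be repeated for every result file, it can be scripted. The sketch below only wraps the sclite command shown above; the function names sclite_command and align_all are hypothetical, and it assumes sclite is on the PATH and writes its output next to the hypothesis file as "<hyp>.sgml", as in the example.

```python
import subprocess


def sclite_command(ref_trn, hyp_trn):
    """Build the same sclite call as in the example above."""
    return ["sclite", "-F", "-i", "wsj", "-r", ref_trn, "-h", hyp_trn, "-o", "sgml"]


def align_all(ref_trn, hyp_trns, run=subprocess.check_call):
    """Align every hypothesis file to the reference; return the expected sgml names."""
    sgml_files = []
    for hyp in hyp_trns:
        run(sclite_command(ref_trn, hyp))  # raises if sclite exits with an error
        sgml_files.append(hyp + ".sgml")
    return sgml_files
```

For example, align_all("ref.trn", ["w15_bg.trn", "w15_tg.trn"]) would run sclite twice and return the two sgml file names used in the next step.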

Step III: Significance Test using sc_stats

e.g:

cat w15_bg.trn.sgml w15_tg.trn.sgml | sc_stats -p -t mapsswe -v -u -n result_bg_tg

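-p reads the alignments from stdin, here the concatenated sgml files piped in by cat;

-t specifies the test to be mapsswe (the Matched Pairs Sentence Segment Word Error Test)

-v outputs a detailed analysis for each test performed on a pair of hypothesis files

-u unifies the test results into a single comparison matrix instead of creating one matrix per test

-n sets the name that the output report files begin with (here "result_bg_tg")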
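The same pipeline can be driven from a script. This is a sketch only: the function names are hypothetical, sc_stats is assumed to be on the PATH, and the options are exactly those of the command above.

```python
import subprocess


def sc_stats_command(report_name):
    """Build the same sc_stats call as in the example above."""
    return ["sc_stats", "-p", "-t", "mapsswe", "-v", "-u", "-n", report_name]


def concat_files(paths):
    """Return the concatenated bytes of the given files (what 'cat' would output)."""
    chunks = []
    for path in paths:
        with open(path, "rb") as handle:
            chunks.append(handle.read())
    return b"".join(chunks)


def run_sc_stats(sgml_files, report_name):
    """Feed the concatenated sgml files to sc_stats on stdin."""
    data = concat_files(sgml_files)
    subprocess.run(sc_stats_command(report_name), input=data, check=True)
```

Calling run_sc_stats(["w15_bg.trn.sgml", "w15_tg.trn.sgml"], "result_bg_tg") mirrors the shell command above.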


Appendix:

sc_stats options:

sc_stats Command-line Options

The command-line options for sc_stats can be broken into four categories:

  1. Input File Options
  2. Output Options
  3. Report Generation Options
  4. Statistical Test Options

Input File Options:

    These options control/define the input to sc_stats. Input must come from stdin and the -p option must be used. (Forcing the user to use the -p option enables future expandability while maintaining backward compatibility.)

    -p

      Alignments are read from 'stdin' as input to sc_stats. The format of the input must be in the "sgml" output format, created either by '-o sgml' or by piped input from another sctk utility.
Output Options:
    -e desc
      Description of the ensemble of hyp files.
    -O output_dir
      Writes all output files into output_dir. Defaults to the hypfile's directory
    -n name
      Writes all multiple hypothesis file reports to files beginning with 'name'. Using '-' writes to stdout. Default: 'Ensemble'
Report Generation Options:
    -g [ range | grange | grange2 ]
      Generate per-speaker range graphs, based on the formula defined by '-f'. The reports are written to files whose root name begins with the value defined by '-n'. Two graphs are produced, one showing speaker performance variability across systems and the second showing system performance variability across speakers.

      - The 'range' graphs are an ASCII representation of the variability in error rates for a given speaker. The graph is sorted by the mean of the statistic computed for each speaker.

      - The 'grange' graph is a gnuplot version of the same data plotted in 'range'. Two sets of files are created. The first set, called '*.grange.spk.plt' and '*.grange.spk.dat', contains the gnuplot command files and data files respectively for the graph of speaker performance variability across systems. The second set, called '*.grange.sys.plt' and '*.grange.sys.dat', contains the gnuplot command files and data files respectively for the graph of system performance variability across speakers.

      - The 'grange2' graph is similar to the 'grange' graph except that each system's per-speaker word error scores are identified by a unique symbol.


    -r [ sum | rsum | lur | es | res | none ]
Statistical Test Options:
    -t [ mcn | mapsswe | sign | wilc | anovar | std4 ]
      mcn -
      Perform the McNemar Test.
      mapsswe -
      Perform the Matched Pairs Sentence Segment Word Error Test
      sign -
      Perform the Sign Test
      wilc -
      Perform the Wilcoxon Signed Rank Test
      anovar -
      Perform the Analysis of Variance by Rank Test
      std4 -
      This is a shorthand notation to do the 'standard' four tests: mcn, mapsswe, wilc and sign.

    -v
      For each test performed on a pair of systems files, output a detailed analysis.

    -u
      Rather than creating a comparison matrix for each test, unify the statistical test results into a single comparison matrix.

    -f [ E | R | W ]
      Use the identified formula for the sign, wilcoxon and anovar statistical tests. The formulas are:
      1. E -> Percentage Word Error
      2. R -> Percentage Words Correctly Recognized
      3. W -> Percentage Word Accuracy
      The default is 'E'.
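
To make the three formulas concrete, here is a small worked sketch using the usual definitions from speech recognition scoring. It is an assumption that sc_stats computes them in exactly this way, and the function name word_scores is hypothetical.

```python
def word_scores(correct, substitutions, deletions, insertions):
    """Return the E, R and W percentages from word-level alignment counts."""
    # Number of reference words: every reference word is either correct,
    # substituted or deleted (insertions are extra hypothesis words).
    n_ref = correct + substitutions + deletions
    if n_ref == 0:
        raise ValueError("reference contains no words")
    error = 100.0 * (substitutions + deletions + insertions) / n_ref
    recognized = 100.0 * correct / n_ref
    accuracy = 100.0 - error
    return {"E": error, "R": recognized, "W": accuracy}


# 100 reference words: 90 correct, 6 substituted, 4 deleted, plus 2 insertions
scores = word_scores(correct=90, substitutions=6, deletions=4, insertions=2)
print(scores)  # E = 12.0, R = 90.0, W = 88.0
```

Under these definitions W is always 100 minus E, while R ignores insertions, so R can look good for a system that inserts many spurious words.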

Posted via email from Troy's posterous
