Wednesday, March 31, 2010

Figure spanning 2 columns in LaTeX

If you're using 2 columns in a LaTeX document, you'll often find that a table or figure is just too wide for a single column. All you do is use the starred environment, figure* (or table* for a table), in place of figure (or table), and that will make the float span the width of the entire page.

Posted via email from Troy's posterous

Tuesday, March 23, 2010

Statistical significance

In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.

The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value. In traditional Fisherian statistical hypothesis testing, the p-value is the probability, conditional on the null hypothesis, of the observed data or more extreme data.


If the obtained p-value is small, then either the null hypothesis is false or an unusual event has occurred.

In the SCTK toolkit, the null hypothesis is:

 There is no performance difference between the two systems.

Thus the p-value is the probability, assuming the two systems do not differ, of the test statistic taking a value at least as extreme as the one actually observed.

So, the smaller the p-value is, the more statistically significant the difference between the two systems is.
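
To make the idea concrete, here is a minimal sketch of an exact two-sided sign test, one of the simpler tests that sc_stats lists. It is not SCTK's implementation; the function name and the win counts are hypothetical, and tied utterances are assumed to have been dropped beforehand.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test p-value for paired comparisons.

    wins_a / wins_b: number of utterances on which system A (resp. B)
    made fewer errors. Ties are assumed to be excluded already.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no evidence either way
    k = min(wins_a, wins_b)
    # Under the null hypothesis each non-tied utterance is a fair coin flip.
    # Sum the probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if system A beats system B on 9 of 10 utterances, sign_test_p_value(9, 1) is about 0.021, which is below the usual 0.05 significance level.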


Speaker Adaptation Method

Liang_ICASSP_2010.pdf (135 KB)

In the attached paper, the authors combine two speaker adaptation methods to improve speech synthesis performance in cross-lingual experiments.

The system is HMM-based, and the two methods are:

1) Decision Tree Marginalization

2) HMM State Mapping



Investigation on Tandem Approach

In this attached paper, the authors investigated different aspects of the Tandem approach in the hybrid system.

The overall system they used:

Different processing combinations:

The best combination is deltas+PCA+normalize.


Tandem Approach for NN/HMM

The Tandem approach to NN/HMM uses a neural network (NN) for feature extraction. The resulting posterior features are then used as input to a conventional HMM system.

The architecture is:

The training procedure:


Details are in the attached paper.
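
The data flow can be illustrated with a toy sketch. It assumes the NN outputs are turned into posteriors with a softmax, that the log of the posteriors is taken, and that each feature dimension is normalized over the utterance; those choices, the function names, and the omission of any PCA step are assumptions made for illustration, not the paper's exact recipe.

```python
import math


def softmax(logits):
    """Turn raw network outputs for one frame into posterior probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tandem_features(frames_logits):
    """Map per-frame NN outputs to normalized log-posterior features.

    frames_logits: list of frames, each a list of per-class NN outputs.
    Returns one feature vector per frame, with every dimension scaled to
    zero mean and unit variance over the utterance.
    """
    log_post = [[math.log(p) for p in softmax(frame)] for frame in frames_logits]
    n_frames = len(log_post)
    n_dims = len(log_post[0])
    features = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        column = [frame[d] for frame in log_post]
        mean = sum(column) / n_frames
        var = sum((x - mean) ** 2 for x in column) / n_frames
        std = math.sqrt(var) or 1.0  # avoid dividing by zero
        for t in range(n_frames):
            features[t][d] = (column[t] - mean) / std
    return features
```

The returned vectors would then take the place of the usual acoustic features when training the HMM system.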

icassp00-nnhmm.pdf (59 KB)


Monday, March 22, 2010

Install JDT on top of CDT

Inside the Eclipse IDE, under Help -> Install New Software:

Add the following Eclipse Galileo Repository:

to 'Available Software Sites'.

Then select the JDT from the Programming Languages category.

The following are some repositories:

Eclipse Galileo Repository -
galileo -
update site -


Friday, March 19, 2010

Phoneme Recognition

In Petrov's paper, the authors reported a phone error rate (PER) of 21.4% on TIMIT.

List of different methods mentioned in their paper:

In Sung's paper, the authors used Hidden Conditional Random Fields for phone recognition on TIMIT and achieved 28.3% PER.

emnlp07a.pdf (326 KB)
asru09.pdf (210 KB)
icassp10.pdf (66 KB)
icassp97.pdf (394 KB)


Wednesday, March 17, 2010

Using Sclite

Step I: Convert the recognition results generated by HTK to an sclite format, such as trn.

In trn format, each line is an utterance with the utterance id in parentheses after it.


 I LIKE ICE CREAM (c02abd30)
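
As a rough illustration of the trn layout only, the following sketch writes and reads one trn line. The helper names are hypothetical, and reading the HTK output itself is left out.

```python
def to_trn_line(words, utt_id):
    """Format one utterance as a trn line: words, then the id in parentheses."""
    return f"{' '.join(words)} ({utt_id})"


def parse_trn_line(line):
    """Split a trn line back into (list of words, utterance id)."""
    line = line.strip()
    open_paren = line.rfind("(")
    if open_paren == -1 or not line.endswith(")"):
        raise ValueError(f"not a trn line: {line!r}")
    return line[:open_paren].split(), line[open_paren + 1:-1]
```

Calling to_trn_line(["I", "LIKE", "ICE", "CREAM"], "c02abd30") gives the example line shown above.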

Step II: Generate the alignment using sclite


sclite -F -i wsj -r ref.trn -h w15_bg.trn -o sgml

-F scores fragments as correct (it cannot be combined with "-d", which uses "diff" for the alignment);

-i sets the utterance id type to be wsj (Wall Street Journal)

-r sets the reference transcription to be file "ref.trn"

-h sets the recognized results file to be "w15_bg.trn"

-o sets the output result format to be "sgml"

The output is a file named "w15_bg.trn.sgml".

Note: each experimental result file has to be aligned to the reference to generate an "sgml" format file.

Step III: Significance Test using sc_stats


cat w15_bg.trn.sgml w15_tg.trn.sgml | sc_stats -p -t mapsswe -v -u -n result_bg_tg

-p reads from stdin, the piped output;

-t specifies the test to be mapsswe (the Matched Pairs Sentence Segment Word Error Test)

-v performs the tests on a pair of hypothesis files

-u unifies the test instead of creating comparison matrix for each test

-n sets the name that the output report files begin with


sc_stats command-line options:

The command-line options for sc_stats can be broken into four categories: input file options, output options, report generation options, and statistical test options.

Input File Options:

    These options control/define the input to sc_stats. Input must come from stdin and the -p option must be used. (Forcing the user to use the -p option enables future expandability while maintaining backward compatibility.)

    -p
      Alignments are read from 'stdin' as input to sc_stats. The format of the input must be in the "sgml" output format, created either by '-o sgml' or by piped input from another sctk utility.
Output Options:
    -e desc
      Description of the ensemble of hyp files.
    -O output_dir
      Writes all output files into output_dir. Defaults to the hypfile's directory
    -n name
      Writes all multiple hypothesis file reports to files beginning with 'name'. Using '-' writes to stdout. Default: 'Ensemble'
Report Generation Options:
    -g [ range | grange | grange2 ]
      Generate per speaker range graphs, based on the formula defined by '-f'. The reports are written to files whose root name begins with the value defined by '-n'. There are two graphs produced, one showing speaker performance variability across systems and the second showing system performance variability across speakers.

      - The 'range' graphs are an ASCII representation of the variability in error rates for a given speaker. The graph is sorted by the mean of the statistic computed for each speaker.

      - The 'grange' graph is a gnuplot version of the same data plotted in 'range'. There are two sets of files created. The first set, which is called '*.grange.spk.plt' and '*.grange.spk.dat', contains the gnuplot command files and data files respectively for the speaker performance variability across systems graph. The second set, which is called '*.grange.sys.plt' and '*.grange.sys.dat', contains the gnuplot command files and data files respectively for the system performance variability across speakers graph.

      - The 'grange2' graph is similar to the 'grange' graph except that each system's speaker word error scores are identified by a unique symbol.

    -r [ sum | rsum | lur | es | res | none ]
Statistical Test Options:
    -t [ mcn | mapsswe | sign | wilc | anovar | std4 ]
      mcn -
      Perform the McNemar Test.
      mapsswe -
      Perform the Matched Pairs Sentence Segment Word Error Test
      sign -
      Perform the Sign Test
      wilc -
      Perform the Wilcoxon Signed Rank Test
      anovar -
      Perform the Analysis of Variance by Rank Test
      std4 -
      This is a shorthand notation to do the 'standard' four tests: mcn, mapsswe, wilc and sign.
    -v
      For each test performed on a pair of systems files, output a detailed analysis.
    -u
      Rather than creating a comparison matrix for each test, unify statistical test results into a single comparison matrix

    -f [ E | R | W ]
      Use the identified formula for statistical tests: sign, wilcoxon and anovar tests. The formulas are:
      1. E -> Percentage Word Error
      2. R -> Percentage Words Correctly Recognized
      3. W -> Percentage Word Accuracy
      By default 'E'
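
The three formulas can be sketched from alignment counts as follows. This assumes the usual definitions (errors are substitutions, deletions and insertions; accuracy is 100 minus the word error); the function and argument names are hypothetical and this is not code from sc_stats.

```python
def word_scores(n_ref, subs, dels, ins):
    """Return the E, R and W percentages for one system.

    n_ref: number of words in the reference transcription
    subs, dels, ins: substitution, deletion and insertion counts
    """
    if n_ref <= 0:
        raise ValueError("reference must contain at least one word")
    correct = n_ref - subs - dels  # reference words recognized correctly
    return {
        "E": 100.0 * (subs + dels + ins) / n_ref,  # percentage word error
        "R": 100.0 * correct / n_ref,              # percentage words correct
        "W": 100.0 * (correct - ins) / n_ref,      # percentage word accuracy
    }
```

With 100 reference words, 5 substitutions, 3 deletions and 2 insertions, this gives E = 10.0, R = 92.0 and W = 90.0.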


Statistically Significant Test - ScLite

Wednesday, March 10, 2010


File permissions

Use the chmod command to set file permissions.

The chmod command uses a three-digit code as an argument.

The three digits of the chmod code set permissions for these groups in this order:

  1. Owner (you)
  2. Group (a group of other users that you set up)
  3. World (anyone else browsing around on the file system)

Each digit of this code sets permissions for one of these groups as follows. Read is 4. Write is 2. Execute is 1.

The sums of these numbers give combinations of these permissions:

  • 0 = no permissions whatsoever; this person cannot read, write, or execute the file
  • 1 = execute only
  • 2 = write only
  • 3 = write and execute (1+2)
  • 4 = read only
  • 5 = read and execute (4+1)
  • 6 = read and write (4+2)
  • 7 = read and write and execute (4+2+1)

Chmod commands on file apple.txt (use wildcards to include more files):

  chmod 700 apple.txt    Only you can read, write to, or execute apple.txt.
  chmod 777 apple.txt    Everybody can read, write to, or execute apple.txt.
  chmod 744 apple.txt    Only you can write to or execute apple.txt; everybody can read it.
  chmod 444 apple.txt    Everybody, including you, can only read apple.txt.
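
The digit arithmetic above can be checked with a small sketch. The helper names are hypothetical, and only plain three-digit codes are handled.

```python
def digit_to_rwx(digit):
    """Expand one octal digit (0-7) into a string such as 'r-x'."""
    if not 0 <= digit <= 7:
        raise ValueError(f"not an octal digit: {digit}")
    # Read is 4, write is 2, execute is 1; test each bit in turn.
    return "".join(
        letter if digit & bit else "-"
        for letter, bit in (("r", 4), ("w", 2), ("x", 1))
    )


def describe_mode(code):
    """Split a three-digit chmod code into owner, group and world strings."""
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected three digits, got {code!r}")
    owner, group, world = (digit_to_rwx(int(ch)) for ch in code)
    return {"owner": owner, "group": group, "world": world}
```

For instance, describe_mode("744") reports rwx for the owner and r-- for both group and world.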

Detecting File Permissions

You can use the ls command with the -l option to show the file permissions set. For example, for apple.txt, I can do this:

$ ls -l apple.txt
-rwxr--r-- 1 december december 81 Feb 12 12:45 apple.txt

The sequence -rwxr--r-- shows the permissions set for the file apple.txt. The first - indicates that apple.txt is a regular file. The next three letters, rwx, show that the owner has read, write, and execute permissions. The next three symbols, r--, show that the group permissions are read only. The final three symbols, r--, show that the world permissions are read only.
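
Going the other way, a permission string from ls -l can be turned back into the three-digit code. The sketch below uses a hypothetical helper and assumes a plain rwx string without special bits such as setuid or sticky; Python's standard stat.filemode is shown only to confirm the mapping.

```python
import stat


def symbolic_to_octal(perm_string):
    """Convert an ls -l string such as '-rwxr--r--' to a code such as '744'."""
    if len(perm_string) != 10:
        raise ValueError(f"expected 10 characters, got {perm_string!r}")
    weights = {"r": 4, "w": 2, "x": 1, "-": 0}
    # Skip the file-type character, then take owner, group and world triples.
    triples = [perm_string[i:i + 3] for i in (1, 4, 7)]
    return "".join(str(sum(weights[ch] for ch in triple)) for triple in triples)


# stat.filemode renders a numeric mode the way ls -l does.
print(stat.filemode(stat.S_IFREG | 0o744))  # -rwxr--r--
print(symbolic_to_octal("-rwxr--r--"))      # 744
```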