Thursday, August 5, 2010

[ASR] A Study on Lattice Rescoring with Knowledge Scores for Automatic Speech Recognition


Fifteen HMM-based articulatory detectors are used to generate log-likelihood ratio (LLR) features, which a later-stage NN uses to predict phone posteriors.

Meanwhile, those LLRs are also used directly to rescore the lattices generated by a standard HMM ASR system, which is shown to yield better performance.

The articulatory knowledge scores are generated by those HMM-based detectors, which perform better than NN-based detectors.
The problem with the NN-based scores is that they are likely to fluctuate.

The work follows the Automatic Speech Attribute Transcription (ASAT) paradigm.

Frame-level LLRs work better than segment-level ones.

The 15 articulatory attributes used in this paper are: fricative, vowel, stop, nasal, approximant, low, mid, high, labial, coronal, dental, velar, retroflex, glottal, and silence.
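To make the rescoring idea above concrete, here is a minimal sketch in Python. It is not the paper's implementation: it works on a short list of competing hypotheses rather than a full lattice, and the `Hypothesis` fields, the weights, and the example scores are all made up for illustration. It only shows how a frame-level knowledge score (the sum of per-frame detector LLRs) can be added to the baseline acoustic and language-model scores to change which hypothesis wins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One competing path (hypothetical structure, for illustration only)."""
    words: List[str]
    acoustic_score: float    # log-likelihood from the baseline HMM system
    lm_score: float          # language-model log-probability
    frame_llrs: List[float]  # per-frame detector LLRs along this path


def knowledge_score(frame_llrs: List[float]) -> float:
    """Frame-level knowledge score: the sum of the per-frame LLRs."""
    return sum(frame_llrs)


def rescore(hyps: List[Hypothesis],
            lm_weight: float = 10.0,
            knowledge_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the best combined log-domain score.

    The weights are assumed values; in practice they would be tuned
    on held-out data.
    """
    def total(h: Hypothesis) -> float:
        return (h.acoustic_score
                + lm_weight * h.lm_score
                + knowledge_weight * knowledge_score(h.frame_llrs))
    return max(hyps, key=total)


# Made-up scores: the baseline slightly prefers b, the detectors prefer a.
a = Hypothesis(["recognize", "speech"], -100.0, -2.0, [0.5, 0.4, 0.6])
b = Hypothesis(["wreck", "a", "nice", "beach"], -99.0, -2.05, [-1.0, -2.0, -1.5])

baseline_best = rescore([a, b], knowledge_weight=0.0)
rescored_best = rescore([a, b])
print(baseline_best.words, rescored_best.words)
```

With the knowledge weight set to zero the baseline choice is kept; with a non-zero weight the detector evidence can overturn it.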

Posted via email from Troy's posterous

[LVCSR] A phonetic feature based lattice rescoring approach to LVCSR

In their previous work, the authors developed a high-performance detector-based phone recognizer. Articulatory information is extracted using a bank of speech feature detectors implemented as MLPs. A final event merger, another MLP, combines the outputs of those detectors to predict phoneme posteriors, which are used as the HMM's emission probabilities for decoding.

In this paper, the detector-based phoneme recognizer is extended to LVCSR. Word lattices are generated with a state-of-the-art HMM-based speech recognizer. The high-quality monophone posteriors generated by the detector-based recognizer are then used to rescore the lattices for second-stage decoding.

Compared with standard MLE- and MMI-trained HMM systems, the rescored lattices yield lower WER on the WSJ0 corpus.

The system structure is illustrated in the figure below:

[Figure: system structure; image missing]

[NN/HMM] Introducing Phonetically Motivated, Heterogeneous Information into Automatic Speech Recognition

Phonetically motivated experts are investigated for multi-stream automatic speech recognition.

The two experts adopted are:
1) {vowel, consonant, nasal, liquid, silence}
2) {voiced, unvoiced, silence}

The basic system is an MLP that predicts phone posteriors. Two settings are used: full-band and multi-band.

The fusion of the original model and the expert system is done by simply multiplying their outputs together.
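A minimal sketch of what such a multiplicative fusion could look like for a single frame is given below. The post does not say how the class-level expert outputs are aligned with the phone posteriors, so the phone-to-class map, the renormalization step, and all the numbers here are assumptions made purely for illustration.

```python
from typing import Dict

# Hypothetical mapping from phones to the classes of the second expert
# {voiced, unvoiced, silence}; a real system would cover the full phone set.
PHONE_TO_VOICING: Dict[str, str] = {
    "b": "voiced",
    "p": "unvoiced",
    "sil": "silence",
}


def fuse(phone_post: Dict[str, float],
         expert_post: Dict[str, float],
         phone_to_class: Dict[str, str]) -> Dict[str, float]:
    """Multiply each phone posterior by the expert posterior of its class,
    then renormalize so the fused values sum to one (an assumed step)."""
    fused = {
        phone: p * expert_post[phone_to_class[phone]]
        for phone, p in phone_post.items()
    }
    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("fused posteriors are all zero")
    return {phone: value / total for phone, value in fused.items()}


# Made-up posteriors for one frame: the MLP slightly prefers "b",
# but the voicing expert is fairly sure the frame is unvoiced.
phone_post = {"b": 0.5, "p": 0.4, "sil": 0.1}
expert_post = {"voiced": 0.2, "unvoiced": 0.7, "silence": 0.1}

fused = fuse(phone_post, expert_post, PHONE_TO_VOICING)
print(max(fused, key=fused.get))  # prints "p"
```

Because the product is taken per frame, an expert that is confident about a broad class can overturn a weak preference of the phone MLP, which is the intended effect of adding the extra stream.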

The two experimental settings are displayed below:

[Figure: the two experimental settings; image missing]


Tuesday, August 3, 2010

Map of the brain's network

The scientists focused on the long-distance network of 383 brain regions and 6,602 long-distance connections that travel through the brain’s white matter. These are like the “interstate highways” between far-flung brain regions, he explained, while short-distance gray matter connections (based on neurons) constitute “local roads” within a brain region and its sub-structures.


Monday, August 2, 2010

Split lossless audio (ape, flac, wv, wav) by cue file in Ubuntu

From: http://aidanjm.wordpress.com/2007/02/15/split-lossless-audio-ape-flac-wv-wav-by-cue-file/

Lossless audio files can be split by cue file using “shnsplit” (part of the “shntool” package). You will also need the “cuebreakpoints” tool (part of the “cuetools” package). To install cuetools and shntool in Ubuntu/Kubuntu, open a terminal window and enter the following:

sudo apt-get install cuetools shntool

You will also need software for your preferred lossless audio format. For Monkey’s Audio you need to install “mac”; see the original post for details. For the FLAC and WavPack formats you need to install “flac” and “wavpack” respectively:

sudo apt-get install flac wavpack

Shnsplit requires a list of break-points with which to split an audio file. Conveniently, cuebreakpoints prints the break-points from a cue or toc file in a format that can be used by shnsplit. You can pipe the output of cuebreakpoints to shnsplit as follows:

cuebreakpoints sample.cue | shnsplit -o flac sample.flac

In this example, a flac file called “sample.flac” is split according to the break-points contained in “sample.cue” and the results are output in the flac format.
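To show what the break-point list is, here is a rough Python sketch that pulls the track start times out of a cue sheet. It is only an illustration, not a replacement for cuebreakpoints: it looks at `INDEX 01` lines only, ignores pregaps, and assumes that a start at the very beginning of the file needs no split point. Cue sheets give times as MM:SS:FF (FF being frames, 75 per second); the sketch prints them as m:ss.ff, which as far as I know is one of the forms shnsplit accepts. The function name and the sample cue text are made up.

```python
import re
from typing import List

# Matches lines such as "    INDEX 01 04:23:45" (minutes:seconds:frames).
INDEX_RE = re.compile(r"^\s*INDEX\s+01\s+(\d+):(\d{2}):(\d{2})\s*$")


def breakpoints(cue_text: str) -> List[str]:
    """Return the track start times of a cue sheet as m:ss.ff strings.

    A start at 00:00:00 is skipped, since nothing needs to be cut there
    (an assumption of this sketch).
    """
    points = []
    for line in cue_text.splitlines():
        match = INDEX_RE.match(line)
        if not match:
            continue
        minutes, seconds, frames = (int(g) for g in match.groups())
        if (minutes, seconds, frames) == (0, 0, 0):
            continue
        points.append(f"{minutes}:{seconds:02d}.{frames:02d}")
    return points


SAMPLE_CUE = """\
FILE "sample.flac" WAVE
  TRACK 01 AUDIO
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 01 04:23:45
  TRACK 03 AUDIO
    INDEX 01 09:10:00
"""

points = breakpoints(SAMPLE_CUE)
print("\n".join(points))
# prints:
# 4:23.45
# 9:10.00
```

Two break-points mean three output files, one per track in the cue sheet.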

The output file format is specified via the “-o” option. If you don’t specify an output format your split files will be in shntool’s default format (i.e., wave files, “wav”).

To split a monkey’s audio file by cue file and output the results in the flac format:

cuebreakpoints sample.cue | shnsplit -o flac sample.ape

Note that a default prefix “split-track” is used to name the output files, so by default they are named split-track01, split-track02, split-track03, and so on. You can specify your own prefix via the “-a” option.

To see all the options for shntool split, type “shntool split -h” or “shnsplit -h”.

Transferring tags

The audio files output by shnsplit do not contain tag data. However, you can use the “cuetag” script (installed as part of the cuetools package) to transfer tag data directly from a cue file to your split audio files. You specify the individual audio files corresponding to the tracks contained in your cue file as follows:

cuetag sample.cue split-track01.flac split-track02.flac split-track03.flac split-track04.flac

This will transfer the tag data contained in “sample.cue” to the flac audio tracks “split-track01.flac” “split-track02.flac” “split-track03.flac” and “split-track04.flac”.

The above command could be streamlined as:

cuetag sample.cue split-track*.flac

Cuetag works with flac, ogg and mp3 files. The cuetag script is not currently able to handle file names containing spaces.

Note: If you are running flac version 1.1.4 or higher, you may need to make two small changes to the cuetag script before it works correctly with flac files. Open the cuetag script (on Ubuntu installations it is located at /usr/bin/cuetag) in a text editor and: 1) search for the text “remove-vc-all” and replace it with “remove-all-tags”; 2) search for the text “import-vc-from” and replace it with “import-tags-from”.


Sunday, August 1, 2010

From movies

Life is complicated and we do our best!

Instead of finding the one, find the one you love and make him/her the perfect one of yours.

