Wednesday, September 22, 2010
Tuesday, September 21, 2010
New Start - Picper
I just had a good idea for illustrating information: Picper.
The idea is to present research information in a pictorial way.
From now on, I will try my best to illustrate the research covered in a paper with a single picture.
To mark the start, the blog site will be updated soon.
I will explore my dream and passion in this new way of information sharing, which belongs more to the field of infographics.
I love graphics, and believe that a graph is worth more than thousands of words.
Maybe it seems like going back to the prehistoric age, before written language. However, if we can express ourselves and transmit information without written language, why not do that?
Although I'm not an artist, and never learnt how to draw apart from the art classes in primary school, I like visual things and I believe in them.
Let's just try to make it useful and make it big!!!
What's more, never forget one thing - "Dream & Passion"!
Monday, September 20, 2010
[Speech] MFCC feature processing
The MFCC feature extraction process is briefly illustrated in the following figure:
1) The time-domain signal is first split into overlapping windows, and a Hamming window is applied to each of them;
2) A short-time Fourier transform is applied to each window to obtain its spectrum;
3) Mel filters are applied to each spectrum;
4) Cepstral analysis (a logarithm followed by a discrete cosine transform) then yields the cepstral coefficients, which are the Mel Frequency Cepstral Coefficients (MFCC).
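The four steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code behind the figure: the frame length (400 samples), frame step (160 samples), FFT size (512), number of filters (26) and number of coefficients (13) are assumed values, and the function names are made up for this example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1 at center
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, frame_step=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # 1) Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2) Short-time Fourier transform: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # 3) Apply the mel filters, then take the logarithm.
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # 4) Cepstral analysis: DCT-II of the log mel energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_mel @ basis.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
features = mfcc(tone)  # array of shape (98, 13): 98 frames, 13 coefficients each
```

The DCT is written out by hand here so the sketch depends only on NumPy; it is unnormalized, so the values differ by a scale factor from toolkits that use an orthonormal DCT.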
Thursday, September 16, 2010
[Speech] Spectrogram of a sentence in TIMIT generated by VoiceBox
VoiceBox is a collection of Matlab files for speech processing.
I have just found that it is a really good set of tools for speech analysis.
Two spectrograms are generated using VoiceBox.
Tuesday, September 14, 2010
[Misc] Change the ruler units from inches to centimeters
By default, the ruler in PowerPoint displays measurements in inches. If you want to view and work with centimeters, you must configure Microsoft Windows. You cannot change the measurement units directly in PowerPoint.
- In Windows click Start, and then click Control Panel.
- Double-click Regional Options or Regional and Language Options.
The options that appear are different for each version of Windows:
- If you are working in Windows XP, click the Regional Options tab, and then click Customize. On the Numbers tab, in the Measurement system list, click Metric.
- If you are not using Windows XP, look for a Numbers option that includes a Measurements setting, and then change the value to Metric.
- Apply the changes, and then start PowerPoint.
The rulers will now display measurements in the system that you have chosen.
TIP To change back from centimeters to inches, select U.S. from the Measurement System drop-down list.
[Paper] Analysis of MLP based hierarchical phoneme posterior probability estimator
1. Two MLPs are arranged in a Tandem fashion for phone recognition. The first MLP nonlinearly converts the acoustic features into posterior features.
Acoustic features are known to exhibit a high degree of nonlinguistic variabilities such as speaker and environmental (e.g. noise, channel) characteristics. The first MLP classifier can be interpreted as a discriminatively trained nonlinear transformation from the acoustic feature space to the posterior feature space. It has been shown that a well trained (large population of speakers, and different conditions) MLP classifier can achieve invariance to speaker as well as environment characteristics. Moreover, it has also been shown that the effect of co-articulation is less severe on the posterior features when compared to the acoustic features.
2. The behavior of the second MLP is analyzed using Volterra series.
3. Benefits of MLP based acoustic modeling:
a) It obviates the need for strong assumptions about the statistics of the features and the parametric form of their density function, which makes feature combination easy;
b) MLPs have been shown to be invariant to speaker characteristics and environment-specific information such as noise, when trained on large amounts of data;
c) The outputs of the MLP are probabilities with useful properties;
d) MLPs can be trained efficiently and scale to large amounts of data.
4. Posterior features have less nonlinguistic variability, have a sparse representation, and are linearly separable.
5. MLP acoustic modeling could be improved in the following ways:
a) using richer acoustic information;
b) increasing the capacity of the MLP, although this approach is often limited by the amount of training data;
c) using a finer representation of the output classes, such as sub-phoneme states.
6. Normalization of the posterior features obviates the effect of the unigram phonetic class priors learned by the first MLP. The priors are, however, learned again by the second MLP classifier.
7. The MLPs are trained using the Quicknet package. The phoneme n-gram models are trained using the SRILM toolkit, and phoneme recognition is performed using the weighted finite state transducer based Juicer decoder.
8. A potential application of the MLP based hierarchical system is task adaptation. At the first stage of the hierarchical system, a well trained MLP available off-the-shelf could be used. The second MLP is trained on the posterior features estimated for the target task (adaptation data). It has already been observed that the second MLP in the hierarchy requires fewer parameters and can be trained using less data.
[Speech] Bionic Speech Recognition
PhysOrg.com (09/09/10) A new speech enhancement system developed at the University Campus' Laboratory of Signal Processing in Tunis, Tunisia, could help ensure that voice signals are as clear as possible before they are processed by a computer and acted upon. The researchers used a bionic wavelet transform and a recurrent neural network to reduce the noise from a recorded or sampled voice signal. The approach is designed to address additive or white noise, the random background hiss of a sound recording, which can have the most impact on speech recognition. Tests against several types of noises and a noisy speech database showed an increase in the signal to noise ratio from 5 dB to 12 dB. The researchers say that voice signals need to be clear for speech recognition systems because they could impact the profitability of a financial deal, the safety of a vehicle, or the maneuverability of aircraft. They say their approach also could be used for mobile phone conversations or secret recordings of speech for security and law enforcement purposes.
http://www.physorg.com/news203258431.html
Monday, September 13, 2010
[Linux] Convert binary HTK format HMM model file (MMF) to ASCII format
When training HMM models with HTK, the "-B" option decides whether the resulting model is stored in binary format.
What if we get a binary model from someone else and want to manipulate it?
If the operations we need are supported by HTK's HHEd tool, we just write a suitable edit script for HHEd and run it.
What if they are not supported?
The easiest way is to convert the model to ASCII format and then do whatever you like.
To convert the binary MMF model to ASCII format, we still need the HHEd tool.
Simply create an empty file as the edit script, then use HHEd to load the binary model, apply the edits (nothing is actually done), and save the model out in ASCII format!
The command is simple:
HHEd -H <the original binary MMF file> -M <output folder for the ASCII model> <the empty editing script file> <the model list>
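If the conversion has to be repeated for several models, the same recipe can be wrapped in a small Python helper. This is only a sketch: the function name and the file names in the usage lines are hypothetical, and it assumes that HHEd is on the PATH and that no HTK configuration file forces binary output.

```python
from pathlib import Path

def build_hhed_command(binary_mmf, output_dir, model_list, script_path="empty.hed"):
    """Create the empty edit script and return the HHEd argument list."""
    Path(script_path).write_text("")                     # empty edit script: no edits
    Path(output_dir).mkdir(parents=True, exist_ok=True)  # folder for the ASCII model
    return ["HHEd", "-H", str(binary_mmf), "-M", str(output_dir),
            str(script_path), str(model_list)]

# Usage (hypothetical file names); uncomment to actually run HHEd:
# import subprocess
# subprocess.run(build_hhed_command("models.mmf", "ascii_out", "hmmlist"), check=True)
```

Returning the argument list instead of running the tool directly makes it easy to print and check the command before executing it.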
Saturday, September 11, 2010
The first 5 lessons from Yale Open Course - Game Theory
Lesson 1:
You should never play a strictly dominated strategy.
Lesson 2:
Rational play by rational players can lead to bad outcomes.
Lesson 3:
To figure out what actions you should choose in a game, a good first step is to figure out what your payoffs are (what you care about) and what the other players' payoffs are.
Lesson 4:
If you do not have a dominant strategy, put yourself in your opponents' shoes to try to predict what they will do. For example, in their shoes, you would not choose a dominated strategy.
Lesson 5:
Yale students are evil.
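Lessons 1 and 2 can be checked on a small example. The sketch below uses a prisoner's-dilemma style game with made-up payoff numbers (they are not taken from the lectures) and a hypothetical helper that tests whether one strategy is strictly dominated by another.

```python
COOPERATE, DEFECT = 0, 1

# Payoffs of the row player: ROW_PAYOFFS[my_action][opponent_action].
# The numbers are illustrative assumptions. The game is symmetric,
# so the opponent faces the same table from their own point of view.
ROW_PAYOFFS = [
    [3, 0],  # I cooperate: opponent cooperates / opponent defects
    [5, 1],  # I defect:    opponent cooperates / opponent defects
]

def strictly_dominated(payoffs, candidate, other):
    """True if `other` pays strictly more than `candidate` against every opponent action."""
    return all(payoffs[other][col] > payoffs[candidate][col]
               for col in range(len(payoffs[candidate])))

# Lesson 1: cooperating is strictly dominated by defecting, so never play it.
cooperate_is_dominated = strictly_dominated(ROW_PAYOFFS, COOPERATE, DEFECT)

# Lesson 2: if both rational players defect, each gets 1,
# which is worse than the 3 each would get if both cooperated.
both_defect = ROW_PAYOFFS[DEFECT][DEFECT]
both_cooperate = ROW_PAYOFFS[COOPERATE][COOPERATE]
rational_outcome_is_worse = both_defect < both_cooperate
```

With these numbers both flags are True: defecting is the rational choice for each player, yet the resulting outcome is worse for both.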
Thursday, September 2, 2010
Turing Lecture: Embracing Uncertainty. Chris Bishop
The video can also be found at:
Probabilistic Graphic modeling