In spite of significant progress in automatic speech recognition over the years, robustness still appears to be a stumbling block. Current commercial products are quite sensitive to changes in recording device, to acoustic clutter in the form of additional speech signals, and so on. The goal of replicating human performance in a machine remains far from sight.
Tuesday, November 30, 2010
[Feature] SIFT
[Feature] Speech Recognition with localized time-frequency pattern detectors
Monday, November 29, 2010
[Speech] Phonetic cues
While automatic speech recognition systems have steadily improved and are now in widespread use, their accuracy continues to lag behind human performance, particularly in adverse conditions.
Thursday, November 25, 2010
[Speech] Spectrogram
From: http://www-3.unipv.it/cibra/edu_spectrogram_uk.html
Analyzing sounds requires an acoustic receiver (a microphone, a hydrophone or a vibration transducer) and an analyzer suited to the frequencies of the signals to be measured. Optionally, a recorder can store the sounds permanently for later analysis or playback.
A spectrograph transforms sounds into images to make "visible", and thus measurable and comparable, sound features the human ear can't perceive. Spectrograms (also called sonograms or sonagrams) may show infrasounds, like those emitted by some large whales or by elephants, as well as ultrasounds, like those emitted by echolocating dolphins and bats, and also by insects and small rodents.
Spectrograms may reveal features, such as fast frequency or amplitude modulations, that we can't hear even though they lie within our hearing frequency limits (30 Hz - 16 kHz). Spectrograms are widely used to show the features of animal voices, of the human voice and also of machinery noise.
A real-time spectrograph continuously displays the results of the analysis of incoming sounds with a very small, often imperceptible, delay. This kind of instrumentation is very useful in field research because it makes it possible to continuously monitor the sounds received by the sensors, to immediately evaluate their features, and to classify the received signals. A spectrograph can be a dedicated instrument or a normal computer equipped with suitable hardware for receiving and digitizing sounds and with software to analyze sounds and convert them into a graphical representation.
Normally, a spectrogram represents time on the x axis, frequency on the y axis, and the amplitude of the signals by a scale of grays or a scale of colours. In some applications, in particular those related to military uses, the x and y axes are swapped.
The quality and features of a spectrogram are controlled by a set of parameters. A default set can be used for generic display, but some parameters can be changed to optimize the display of specific features of the signals.
Also, by modifying the colour scale it is possible to optimize the display of the amplitude range of interest.
Tuesday, November 23, 2010
[News] ACMTech Nov.23
CNet (11/18/10) Marguerite Reardon
AT&T says it has devised technologies to boost the accuracy of speech and language recognition technology as well as broaden voice activation to other modes of communication. AT&T's Watson technology platform is a cloud-based system of services that identifies words as well as interprets meaning and contexts to make results more accurate. AT&T recently demonstrated various technologies such as the iRemote, an application that transforms smartphones into voice-activated TV remotes that let users speak natural sentences asking to search for specific programs, actors, or genres. Most voice-activated remotes respond to prerecorded commands, but the iRemote not only recognizes words, but also employs other language precepts such as syntax and semantics to interpret and comprehend the request's meaning. AT&T also is working on voice technology that mimics natural voices through its AT&T Natural Voices technology, which builds on text-to-speech technology to enable any message to be spoken in various languages, including English, French, Italian, German, or Spanish when text is processed via the AT&T cloud-based service. The technology accesses a database of recorded sounds that, when combined by algorithms, generate spoken phrases.
http://news.cnet.com/8301-30686_3-20023189-266.html
McGill University (11/17/10)
McGill University linguistics researcher Michael Wagner is studying how English and French speakers use acoustic cues to stress new information over old information. Finding evidence of a systematic difference in how the two languages use these cues could aid computer programmers in their effort to produce more realistic-sounding speech. Wagner is working with Harvard University's Katherine McCurdy to gain a better understanding of how people decide where to put emphasis. They recently published research that examined the use of identical rhymes in poetry in each language. The study found that even when repeated words sound the same but differ in meaning, the repeated information should be acoustically reduced; otherwise it will sound odd. "Voice synthesis has become quite impressive in terms of the pronunciation of individual words," Wagner says. "But when a computer 'speaks,' whole sentences still sound artificial because of the complicated way we put emphasis on parts of them, depending on context and what we want to get across." Wagner is now working on a model that better predicts where emphasis should fall in a sentence given the context of discourse.
http://www.eurekalert.org/pub_releases/2010-11/mu-wiw111710.php
Monday, November 22, 2010
Enabling Terminal's directory and file color highlighting in Mac
By default, Mac OS X’s Terminal application uses the Bash shell (Bourne Again SHell) but doesn’t have directory and file color highlighting enabled to indicate resource types and permissions settings.
Enabling directory and file color highlighting requires that you open (or create) ~/.bash_profile in your favourite text editor, add these contents:
export CLICOLOR=1
export LSCOLORS=ExFxCxDxBxegedabagacad
… save the file and open a new Terminal window (shell session). Any variant of the “ls” command:
ls
ls -l
ls -la
ls -lah
… will then display its output in color.
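The manual edit above can also be scripted. The following is a minimal sketch, not an official tool: the function name enable_ls_colors is hypothetical, and it assumes that a line starting with "export CLICOLOR=" means the profile has already been configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption, not a system command):
# append the two exports to a profile file unless CLICOLOR is already set there.
enable_ls_colors() {
  local profile="${1:-$HOME/.bash_profile}"

  # Create the file if it does not exist yet.
  touch "$profile"

  # Do nothing if an earlier run (or the user) already added CLICOLOR.
  if grep -q '^export CLICOLOR=' "$profile"; then
    echo "CLICOLOR already set in $profile"
    return 0
  fi

  # If the file does not end with a newline, add one so the new
  # lines are not glued onto the last existing line.
  if [ -n "$(tail -c 1 "$profile")" ]; then
    echo >> "$profile"
  fi

  {
    echo 'export CLICOLOR=1'
    echo 'export LSCOLORS=ExFxCxDxBxegedabagacad'
  } >> "$profile"
  echo "Added color settings to $profile"
}

# Usage (uncomment to apply to your own profile):
# enable_ls_colors "$HOME/.bash_profile"
```

Running the function twice leaves only one copy of each line, so it is safe to re-run. As with the manual edit, a new Terminal window is needed before the settings take effect.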
More details on the LSCOLORS variable can be found by looking at the man page for “ls”:
man ls
LSCOLORS needs 11 pairs of letters, each pair giving a foreground and then a background color, for these resource types in this order:
- directory
- symbolic link
- socket
- pipe
- executable
- block special
- character special
- executable with setuid bit set
- executable with setgid bit set
- directory writable to others, with sticky bit
- directory writable to others, without sticky bit
The possible letters to use are:
- a: black
- b: red
- c: green
- d: brown
- e: blue
- f: magenta
- g: cyan
- h: light grey
- A: bold black, usually shows up as dark grey
- B: bold red
- C: bold green
- D: bold brown, usually shows up as yellow
- E: bold blue
- F: bold magenta
- G: bold cyan
- H: bold light grey; looks like bright white
- x: default foreground or background
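To see how the letters and the 11 positions fit together, here is a small sketch that decodes an LSCOLORS value into readable pairs. The function names lscolors_name and describe_lscolors are hypothetical helpers and not part of ls. The strict 22-character check is an assumption of this sketch rather than documented ls behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ls): decode an LSCOLORS string.

# Map one LSCOLORS letter to a colour name, following the table in "man ls".
lscolors_name() {
  case "$1" in
    a) echo "black" ;;
    b) echo "red" ;;
    c) echo "green" ;;
    d) echo "brown" ;;
    e) echo "blue" ;;
    f) echo "magenta" ;;
    g) echo "cyan" ;;
    h) echo "light grey" ;;
    A) echo "bold black" ;;
    B) echo "bold red" ;;
    C) echo "bold green" ;;
    D) echo "bold brown" ;;
    E) echo "bold blue" ;;
    F) echo "bold magenta" ;;
    G) echo "bold cyan" ;;
    H) echo "bold light grey" ;;
    x) echo "default" ;;
    *) echo "unknown" ;;
  esac
}

# Print one "type: foreground on background" line per resource type.
describe_lscolors() {
  local value="$1"
  local kinds=(
    "directory"
    "symbolic link"
    "socket"
    "pipe"
    "executable"
    "block special"
    "character special"
    "executable with setuid bit set"
    "executable with setgid bit set"
    "directory writable to others, with sticky bit"
    "directory writable to others, without sticky bit"
  )

  # Assumption of this sketch: insist on exactly 11 pairs (22 letters).
  if [ "${#value}" -ne 22 ]; then
    echo "expected 22 characters, got ${#value}" >&2
    return 1
  fi

  local i fg bg
  for i in "${!kinds[@]}"; do
    fg="${value:$((i * 2)):1}"      # first letter of the pair: foreground
    bg="${value:$((i * 2 + 1)):1}"  # second letter of the pair: background
    printf '%s: %s on %s\n' "${kinds[$i]}" \
      "$(lscolors_name "$fg")" "$(lscolors_name "$bg")"
  done
}

describe_lscolors "ExFxCxDxBxegedabagacad"
```

For the value used earlier in this post, the first line printed is "directory: bold blue on default" and the sixth is "block special: blue on cyan".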
By referencing these values, the …