Construction of robust systems for
speech-to-speech translation to facilitate cross-lingual oral
communication has been the dream of speech and natural language
researchers for decades. It is technically extremely difficult because
of the need to integrate a set of complex technologies – Automatic
Speech Recognition (ASR), Natural Language Understanding (NLU), Machine
Translation (MT), Natural Language Generation (NLG), and Text-to-Speech
Synthesis (TTS) – that are far from mature on an individual basis, much
less when cascaded together. Blindly integrating ASR, MT and TTS
components does not provide acceptable results because typical machine
translation technologies, primarily oriented towards well-formed
written text, are not adequate for processing conversational speech
rife with imperfect syntax and speech recognition errors. Initial work
in this area in the 1990s, for example, by researchers at CMU and
Japan’s ATR labs, resulted in systems severely limited to a small
vocabulary or otherwise constrained in the variety of expressions
supported. Currently, the only commercially available speech translation
technology is Phraselator, a simple unidirectional translation device
that is customized for military use. It searches a fixed set of
English sentences and plays back the corresponding voice recordings in
foreign languages; it cannot handle bidirectional speech.
Resources:
IBM Lab: Speech-to-Speech Translation
http://domino.watson.ibm.com/comm/research.nsf/pages/r.uit.innovation.html
http://www.google.com/goog411/
http://googlesystem.blogspot.com/2008/10/machine-translation-and-speech.html
TC-STAR
http://www.tc-star.org/
CMU-LTI
http://www.lti.cs.cmu.edu/Research/cmt-projects.html
http://domino.watson.ibm.com/comm/research.nsf/pages/r.uit.innovation.html/$FILE/speech_to_speech.mpg
Books:
Incremental speech translation
http://books.google.com.sg/books?id=QEr6dTamixQC&printsec=frontcover&dq=speech+translation&ei=QbonSuPPNY2GkQTb3KjaCg#PPA1,M1
Verbmobil: Foundations of Speech-to-Speech Translation
By Wolfgang Wahlster
http://books.google.com.sg/books?id=RiT0aAzeudkC&printsec=frontcover
Speech-to-speech translation
http://books.google.com.sg/books?id=T0diAAAAMAAJ&q=speech+translation&dq=speech+translation&ei=QbonSuPPNY2GkQTb3KjaCg&pgis=1
Machine Translation
By Conrad Sabourin, Laurent Bourbeau
http://books.google.com.sg/books?id=IsqLGQAACAAJ&dq=speech+translation&ei=QbonSuPPNY2GkQTb3KjaCg
KI 2006
By Christian Freksa, Michael Kohlhase, Kerstin Schill
One of the main lessons learned from all the research during the past three decades is that the problems of natural language understanding can only be cracked by the combined muscle of deep and shallow processing approaches. This means that corpus-based and probabilistic methods must be integrated with logic-based and linguistically inspired approaches to achieve true progress on this AI-complete problem.