This paper presents a method that uses speech recognition with linguistic constraints to detect mispronunciations.
Compared with a standard ASR system, which consists of an acoustic model, a lexicon, and a language model, the mispronunciation detection system modifies only the lexicon, extending it with the phoneme confusions that learners are likely to produce.
These phoneme confusions are derived from cross-language phonological comparisons made by human experts.
The recognizer can therefore output these confusable variants, and any recognized variant that deviates from the canonical pronunciation is interpreted as a mispronunciation.
In fact, forced alignment can be used instead of free recognition for mispronunciation detection: in a learning scenario the text is known in advance, since speakers are asked to read the given sentences aloud.
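The lexicon extension described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the confusion table entries and phone labels are invented examples, and a real system would compile the variants into the recognizer's decoding network rather than enumerate them.

```python
from itertools import product

# Hypothetical confusion table: canonical phone -> phones an L2 learner
# may substitute for it (illustrative entries, not taken from the paper).
CONFUSIONS = {
    "th": ["th", "s", "f"],   # /th/ is often realized as /s/ or /f/
    "v":  ["v", "w"],
}

def extended_pronunciations(canonical):
    """Expand a canonical phone sequence into all confusable variants.

    Each phone is replaced by itself plus its known confusions; the
    Cartesian product yields every pronunciation the extended lexicon
    accepts during recognition or forced alignment.
    """
    options = [CONFUSIONS.get(p, [p]) for p in canonical]
    return [list(v) for v in product(*options)]

variants = extended_pronunciations(["th", "ih", "s"])
# The recognizer then picks the variant that best matches the audio;
# choosing any non-canonical variant is flagged as a mispronunciation.
```

With the text known, alignment only has to choose among these variants per word, which is far more constrained than free phone recognition.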
The evaluation measures used in the paper are:
1) correctness: the percentage of phones detected correctly, counting substitutions and deletions as errors;
2) accuracy: like correctness, but additionally penalizing insertions;
3) agreement of the system's detection results with human judgments.
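Correctness and accuracy here follow the standard ASR phone-scoring definitions, Corr = (N - S - D)/N and Acc = (N - S - D - I)/N, where N is the number of reference phones and S, D, I are substitutions, deletions, and insertions from a Levenshtein alignment. A sketch, assuming these definitions (the helper names and phone labels are illustrative):

```python
def align_counts(ref, hyp):
    """Levenshtein-align two phone sequences.

    Returns (substitutions, deletions, insertions) for hyp against ref.
    """
    # dp[i][j] holds (edit_cost, subs, dels, ins) for ref[:i] vs hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)            # all deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)            # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            c, s, d, ins = dp[i - 1][j - 1]
            cand = [(c + sub, s + sub, d, ins)]     # match / substitution
            c, s, d, ins = dp[i - 1][j]
            cand.append((c + 1, s, d + 1, ins))     # deletion
            c, s, d, ins = dp[i][j - 1]
            cand.append((c + 1, s, d, ins + 1))     # insertion
            dp[i][j] = min(cand)                    # cheapest alignment
    _, S, D, I = dp[len(ref)][len(hyp)]
    return S, D, I

def phone_metrics(ref, hyp):
    """Correctness = (N - S - D) / N; Accuracy = (N - S - D - I) / N."""
    S, D, I = align_counts(ref, hyp)
    N = len(ref)
    return (N - S - D) / N, (N - S - D - I) / N

corr, acc = phone_metrics(["th", "ih", "s"], ["s", "ih", "s"])
# one substitution out of three phones: corr == acc == 2/3
```

Accuracy can go negative when insertions outnumber correct phones, which is why it is the stricter of the two measures.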