Utterance verification is the process of automatically rejecting incorrectly recognised utterances while accepting as many correct results as possible. To this end, the probability of an error is often estimated by a one-dimensional confidence measure. In this paper we take a closer look at incorrect classifications. We argue that errors stem from a number of different causes and that this observation should be reflected in the design of the utterance verifier. We therefore developed measures to detect either out-of-vocabulary (OOV) word errors or in-vocabulary substitution errors. To this end, we compute confidence measures based on the distance between the likelihood of the first-best output and the likelihoods of two alternative hypotheses: one corresponding to the second-best output, the other to the most likely free phone string. The paper reports on experiments with spoken Dutch city names for a directory assistance application. The results show that a 10% reduction in Confidence Error Rate can be achieved by using a classification and regression tree instead of a linear combination of the cues with a threshold value.
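The sketch below illustrates the general idea of the two distance-based cues and their combination by a classification and regression tree. It is a minimal illustration, not the paper's exact formulation: the per-frame normalisation, the cue definitions, the toy data, and all names (`confidence_cues`, `ll_phone_loop`, the tree depth) are assumptions, and scikit-learn's `DecisionTreeClassifier` merely stands in for a generic CART.

```python
# Minimal sketch (illustrative assumptions throughout): two confidence cues
# derived from decoder log-likelihoods, combined by a CART rather than a
# thresholded linear combination.
from sklearn.tree import DecisionTreeClassifier

def confidence_cues(ll_best, ll_second, ll_phone_loop, n_frames):
    """Return the two cues for one utterance.

    ll_best       -- log-likelihood of the first-best word hypothesis
    ll_second     -- log-likelihood of the second-best word hypothesis
    ll_phone_loop -- log-likelihood of the most likely free phone string
    n_frames      -- utterance length, used here to normalise the distances
    """
    # A large gap between the free phone string and the best word
    # hypothesis hints that no vocabulary word fits well (OOV error).
    oov_cue = (ll_phone_loop - ll_best) / n_frames
    # A small gap between first- and second-best hypotheses hints at an
    # easily confusable, in-vocabulary substitution error.
    substitution_cue = (ll_best - ll_second) / n_frames
    return [oov_cue, substitution_cue]

# Toy development data: cue vectors with correctness labels (1 = the
# recogniser's first-best output was correct). Purely illustrative values.
X = [[-0.10, 0.80], [-0.05, 0.60], [0.40, 0.10], [0.30, 0.05]]
y = [1, 1, 0, 0]

cart = DecisionTreeClassifier(max_depth=2)  # depth is an arbitrary choice
cart.fit(X, y)

# Accept or reject a new utterance from its three log-likelihoods and length.
accept = cart.predict([confidence_cues(-1200.0, -1210.0, -1195.0, 300)])[0] == 1
print("accept" if accept else "reject")
```

In this setting the CART can carve out separate decision regions for the two error causes, which a single linear combination with one threshold cannot; this is the intuition behind the reported Confidence Error Rate reduction.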