Note: This paper was written for P747, "Seminar in Speech Perception and Recognition," a class taught by David Pisoni at Indiana University, Spring Semester, 2001. It was revised and subsequently published in the IULC Working Papers Online (Volume 1, No. 5), where it is available in full text in PDF format. Below are the introduction and the bibliography.

Sonority Contours in Speech Recognition: An Examination of the Hoosier Mental Lexicon

by Sean McLennan

2002




1. Introduction

Different phonemes inherently contribute different amounts of energy to the acoustic signal relative to one another. This measure, often referred to as sonority, is difficult to tie to a single absolute acoustic correlate, but it allows us to rank phonemes relative to one another on a sonority scale (Ladefoged, 1982). Empirically, the sonority scale carries a great deal of descriptive power. For example, it is often invoked in discussions of syllable structure and phonotactic constraints: the nucleus of a syllable is always a peak of sonority, typically surrounded by phonemes of decreasing sonority. The sonority scale has also been useful in characterizing statistical and implicational universals in language. For example, crosslinguistically, the least marked syllable structure is CV; no language excludes it (Kager, 1999:95). However, within the inventory of possible CV syllables, some phoneme combinations are preferred over others, namely those that maximize the slope of the change in sonority from C to V. Thus /pa/ is less marked than /ma/ because /p/ is much less sonorous than /m/ and so produces a steeper sonority contour. These markedness relationships have proven relevant in descriptions of a wide variety of data, ranging from historical linguistics to acquisition (O'Grady & Dobrovolsky, 1992).
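To make the contour idea concrete, here is a minimal Python sketch. The numeric scale and the class labels are illustrative assumptions, not values from the literature; only the relative ranking of the classes matters.

    # Illustrative sonority values (higher = more sonorous). The exact
    # numbers are assumptions; only their relative order is meaningful.
    SONORITY = {
        "stop": 1, "fricative": 2, "nasal": 3,
        "liquid": 4, "glide": 5, "vowel": 6,
    }

    def contour(classes):
        """Map a sequence of phoneme classes to its sonority contour."""
        return [SONORITY[c] for c in classes]

    def onset_rise(classes):
        """Rise in sonority from the first segment to the syllable peak."""
        values = contour(classes)
        return max(values) - values[0]

    # /pa/ rises more steeply to its vowel than /ma/, so it is less marked.
    print(onset_rise(["stop", "vowel"]))   # 5
    print(onset_rise(["nasal", "vowel"]))  # 3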

Since sonority seems so ubiquitous in the construction of words, it stands to reason that it may also play a significant role in speech recognition. In fact, in an informal experiment based on the "telephone game," in which subjects pass along a message by whispering it to a neighbor, it was found that although a wide variety of transmission errors occurred (including elision, epenthesis, and substitution), every phonological change constituted an "improvement" in syllable structure, where "improvement" is defined as a move toward the universally unmarked syllable structure and sonority contour (Kelly et al., 2000). In other words, under degraded stimulus conditions, there is a preference for words with preferred sonority contours.

The notion that word structure plays a role in word recognition is not new. Landauer and Streeter (1973) compared rare and frequent words and found that they differed significantly in their phoneme and letter distributions, mean word length, and similarity neighborhoods (a "neighborhood" here being the set of words that differ from a given word by a single character). Eukel (1980) likewise showed a correlation between phonotactic constraints and subjective frequency judgments. Perhaps most significantly, Shipman and Zue (1982) showed that broad phonemic-class patterns are sufficient to uniquely isolate a significant number of the words in a 20,000-word lexicon (for example, the pattern [cons] [cons] [l] [vowel] [nasal] [stop] uniquely identifies "splint" in English). They also showed that this number increases monotonically with the size of the lexicon. Although Shipman and Zue make no explicit reference to sonority (their categories were based on Zue's extensive experience reading spectrograms), the categories correspond almost perfectly to the divisions relevant to the sonority scale.
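As a toy illustration of the Shipman and Zue calculation, the following Python sketch counts how many words in a small, hand-built lexicon are uniquely identified by their broad-class pattern. The five words and their class transcriptions are hypothetical stand-ins for a real phonemic dictionary.

    from collections import Counter

    # Hand-assigned broad-class transcriptions for a toy lexicon; a real
    # replication would derive these from a 20,000-word phonemic dictionary.
    LEXICON = {
        "splint": ("cons", "cons", "liquid", "vowel", "nasal", "stop"),
        "pat":    ("cons", "vowel", "stop"),
        "bat":    ("cons", "vowel", "stop"),
        "man":    ("nasal", "vowel", "nasal"),
        "clamp":  ("cons", "liquid", "vowel", "nasal", "stop"),
    }

    # Count how many words share each broad-class pattern.
    pattern_counts = Counter(LEXICON.values())

    # A word is uniquely identified if no other word shares its pattern.
    unique = [w for w, p in LEXICON.items() if pattern_counts[p] == 1]
    print(unique)  # ['splint', 'man', 'clamp']; 'pat' and 'bat' collide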

Shipman and Zue focused only on the relationship between phonemic patterns and lexicon size; no in-depth study has been made of the structural properties of the lexicon independent of word frequency, or specifically with reference to sonority. What follows is such an examination, which can be considered an extension of the findings of Shipman and Zue (1982). It uses the Hoosier Mental Lexicon (Nusbaum et al., 1984) as the basis for a statistical analysis of the sonority contours of approximately 20,000 words and their relationship to frequency and familiarity.
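The general shape of such an analysis can be sketched in a few lines of Python. The data triples below are invented placeholders, not values from the Hoosier Mental Lexicon, and the actual study works over roughly 20,000 entries rather than four.

    import statistics

    # Hypothetical (word, onset sonority rise, log frequency) triples;
    # the real analysis draws these values from the Hoosier Mental Lexicon.
    data = [
        ("pa-like", 5, 3.2),
        ("ta-like", 5, 3.0),
        ("ma-like", 3, 2.1),
        ("la-like", 2, 1.8),
    ]

    rises = [rise for _, rise, _ in data]
    freqs = [freq for _, _, freq in data]

    # Pearson correlation between contour steepness and word frequency.
    r = statistics.correlation(rises, freqs)
    print(f"r = {r:.2f}")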


References:

Eukel, B. 1980. A phonotactic basis for word frequency effects: Implications for automatic speech recognition. Journal of the Acoustical Society of America. 68, S33.

Kager, R. 1999. Optimality Theory. Cambridge University Press: Cambridge, UK.

Kelly, A., M. Gonzalez-Marquez, and S. McLennan. 2000. Chain whisper dynamics in native and non-native groups: A tripartite analysis of communication dynamics. Proceedings of Complex Systems Summer School 2000. The Santa Fe Institute: Santa Fe, NM.

Kenstowicz, M. 1994. Phonology in Generative Grammar. Blackwell: Cambridge, MA.

Ladefoged, P. 1982. A Course in Phonetics. Harcourt Brace Jovanovich: Chicago, IL.

Landauer, T.K. and L.A. Streeter. 1973. Structural differences between common and rare words: Failure of equivalence assumptions for theories of word recognition. Journal of Verbal Learning and Verbal Behavior. 12, 119-131.

Luce, P.A. & D.B. Pisoni. 1998. Recognizing spoken words: The Neighborhood Activation Model. Ear & Hearing. 19, 1-36.

Marslen-Wilson, W. 1987. Functional parallelism in spoken word recognition. Cognition. 25, 71-102.

Nusbaum, H.C., D.B. Pisoni, and C.K. Davis. 1984. Sizing up the Hoosier Mental Lexicon: Measuring the familiarity of 20,000 words. Research on Speech Perception Progress Report No. 10, 357-376.

O’Grady, W. and M. Dobrovolsky. 1992. Contemporary Linguistic Analysis: An Introduction. Copp Clark Pitman Ltd.: Toronto, ON.

Shipman, D.W. & V.W. Zue. 1982. Properties of large lexicons: Implications for advanced isolated word recognition systems. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing. Paris, France. 1-4.