It is well established that although adult listeners perceive spoken words as having clear boundaries, the acoustic signal rarely contains reliable correlates of those boundaries. While adults can exploit a well-developed lexicon to parse the speech stream, this is not a viable mechanism for infants, who have yet to learn a single word.
Recent attention has turned to “rhythm” as a possible mechanism for gaining access to the structure of speech. A language's rhythmic type correlates with the strategies its speakers use to segment words and syllables (cf. Cutler, 1997). However, just as our perception of discreteness in speech is deceptive, so too is our perception of rhythm. Rhythm implies an underlying isochrony, which empirical studies have failed to find reliably in natural speech.
The evidence seems to point towards the conclusion that the traditional rhythm categories are relevant, but that the reality underlying our perception of rhythm is more complex than simple isochrony. Rhythm remains implicated in a wide variety of cognitive functions and is a compelling candidate for a linguistic bootstrap into speech segmentation.
My current research attempts to build a bridge between two areas of rhythm research via a computational model: the proposal of Ramus et al. (1999) about which signal correlates underlie rhythm, and the impact of rhythm class on the segmentation of words from the speech stream (Cutler, 1997). The primary question is: “Can a simple learning mechanism (an adaptive oscillator model) that responds to Ramus et al.’s factors (%V, the percentage of the signal that is vocalic; ΔV, the variability in the duration of vocalic intervals; and ΔC, the variability in the duration of consonantal intervals) produce behaviour that is consistent with observed differences in segmentation behaviour?”
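To make the three input factors concrete, the following is a minimal sketch of how %V, ΔV and ΔC might be computed from an utterance. It assumes the speech has already been segmented and labelled as vocalic ("V") or consonantal ("C") with durations in seconds; consecutive segments of the same type are merged into a single interval, and ΔV and ΔC are taken as standard deviations of interval durations (the use of the population standard deviation here is an assumption). The function names are hypothetical, and the sketch only computes the input measures; it is not the adaptive oscillator model itself.

```python
from itertools import groupby
from statistics import pstdev

def merge_intervals(segments):
    """Merge consecutive segments of the same type ('V' or 'C') into single
    vocalic or consonantal intervals, summing their durations."""
    return [(label, sum(d for _, d in group))
            for label, group in groupby(segments, key=lambda s: s[0])]

def ramus_measures(segments):
    """Return (%V, deltaV, deltaC) for a list of (label, duration) segments.

    Hypothetical helper: %V is the proportion of total duration that is
    vocalic (as a percentage); deltaV and deltaC are the population standard
    deviations of vocalic and consonantal interval durations.
    """
    intervals = merge_intervals(segments)
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    percent_v = 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    return percent_v, pstdev(vocalic), pstdev(consonantal)

# Toy example with invented durations; the two adjacent "C" segments
# are merged into one consonantal interval before the measures are taken.
segments = [("C", 0.08), ("V", 0.12), ("C", 0.05),
            ("C", 0.04), ("V", 0.10), ("C", 0.07)]
percent_v, delta_v, delta_c = ramus_measures(segments)
```

Merging same-type neighbours before measuring follows Ramus et al.'s definition of vocalic and consonantal intervals as stretches of consecutive vowels or consonants; a model that responds to these factors would receive one such triple (or a running estimate of it) per stretch of input.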
Preliminary results mimic the patterns that Ramus and colleagues observed, and ongoing work is promising. If ultimately successful, this model would support the hypothesis that these factors underlie the human perception of rhythm, and would offer a plausible explanation for why they affect how humans parse the speech signal, a question that has not previously been addressed. It could also shed light on an open question about the nature of linguistic rhythm: is it categorical or continuous?
Cutler, A. (1997). The syllable's role in the segmentation of stress languages. Language and Cognitive Processes, 12(5/6):839-845.
Ramus, F., Nespor, M., and Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73:265-292.