TB5 Lecture 1; Recognising Speech

  • Created by: mint75
  • Created on: 21-05-15 15:45

Recognising speech

1) The segmentation problem for speech

  • There are many aspects of speech that complicate recognition. Speech is transient (whilst written words aren't), speechsoundsruntogether (whilst written words are separated by spaces), and there is a large amount of auditory variability in the input, such as accent, voice quality, background noise, speech rate etc.
    • Despite all this, humans can still accurately segment speech. How?
  • Segmentation (in speech) is defined as knowing where one word/phoneme ends and the next begins.
  • However, although it might be assumed that a 'gap' (a drop in amplitude) in the input marks a boundary, this is not always the case. A gap is only one cue amongst many, because a drop in amplitude does not always signal a boundary (e.g. sp-oken: the amplitude drops after sp-, but this is not a word boundary). Conversely, boundaries can occur with no change in amplitude.
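
The unreliability of the 'gap' cue can be illustrated with a toy sketch. The amplitude values, the threshold and the function name below are all invented for illustration (they are not measurements from real speech); the point is only that a detector relying on quiet stretches alone finds a false boundary inside "spoken" and misses the real boundary before the next word.

```python
def gap_boundaries(amplitudes, threshold=0.1):
    """Return the indices at which amplitude drops below the threshold.

    A naive listener model: every quiet stretch is treated as a boundary.
    """
    boundaries = []
    for i in range(1, len(amplitudes)):
        # A "gap" starts where a loud frame is followed by a quiet one.
        if amplitudes[i] < threshold <= amplitudes[i - 1]:
            boundaries.append(i)
    return boundaries


# Hypothetical amplitude envelope for the phrase "spoken words" (made-up values).
# Index 1 is the quiet stretch after "sp-"; index 5 is where "words" begins,
# and the envelope does not dip there at all.
envelope = [0.6, 0.05, 0.9, 0.5, 0.7, 0.6, 0.8, 0.4]
true_boundary = 5

guessed = gap_boundaries(envelope)
print(guessed)                   # the gap-based guesses
print(true_boundary in guessed)  # was the real boundary found?
```

With these invented numbers the only boundary the detector proposes is the within-word gap, which is why amplitude can only be one cue amongst several.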

So what is the solution to this? As well as context (discussed in section 2), another proposed solution is the use of rhythm, known as metrical segmentation.

Metrical segmentation

  • Proposed by Cutler & Butterfield (1992), this theory states that listeners use the rhythm of the language to locate word boundaries.
  • English, for example, is quite regular in the spacing of its strong (stressed) syllables. The listener can therefore use these regularities: strong syllables often mark word onsets.
    • Cutler & Butterfield (1992): used low-volume sentences to test whether participants of different native languages used metrical segmentation.
      • Found that, in their perception errors, English listeners tended to assume strong syllables were at the onset of words.
      • Listeners of other languages, which use different rhythms, made different rhythmic mistakes.
        • This effect is also apparent in some #misheardlyrics!!! (Blank Space by Taylor Swift)
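
The strong-syllable heuristic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name, the (text, is_strong) representation and the example syllables are all hypothetical, and stress is simply hand-labelled rather than detected from audio.

```python
def metrical_segment(syllables):
    """Group syllables into words, starting a new word at every strong syllable.

    syllables: list of (text, is_strong) pairs, with stress labelled by hand.
    """
    words = []
    for text, is_strong in syllables:
        if is_strong or not words:
            # A strong syllable (or the very first syllable) opens a new word.
            words.append([text])
        else:
            # A weak syllable is attached to the word currently being built.
            words[-1].append(text)
    return ["".join(word) for word in words]


# Strong-weak words are segmented correctly by the heuristic.
print(metrical_segment([("speak", True), ("er", False),
                        ("lis", True), ("ten", False)]))

# A weak-strong word such as "about" is split before its strong syllable,
# the kind of boundary error the heuristic predicts.
print(metrical_segment([("a", False), ("bout", True)]))
```

The second call shows why the errors are informative: a listener relying on this rule should insert boundaries before strong syllables, which is the pattern reported for English listeners.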

2) How context can help in identifying speech sound

  • If an ambiguous phoneme is present, can the context of speech help identify it?
  • This raises the question of interactivity vs. autonomy.
    • Process 1, phoneme recognition, feeds into process 2, word recognition, as differences in phonemes change what the unit is (which phonemes should be in this word?). But does 2…

