Several lines of research in our lab are aimed at investigating the neural basis of speech perception. We primarily use the event-related brain potential (ERP) technique, as it provides excellent temporal resolution and allows us to ask questions about acoustic cue encoding and the time-course of spoken word recognition. Other work uses the fast optical imaging technique to localize responses to specific areas of the brain. Find out more about our cognitive neuroscience projects below.

Measuring acoustic cue encoding

A classic question in speech perception concerns whether listeners are sensitive to the continuous acoustic features in the speech signal independently of phonological information. Recent work has shown that listeners can perceive within-category acoustic differences at the level of lexical representations. However, these responses also show effects of phonological categories. Thus, it is unclear whether there is an earlier stage of processing that is not influenced by category information. Behavioral responses are filtered through phonological categories, making it difficult to answer this question.

The event-related brain potential (ERP) technique is well-suited to studying this problem. ERPs are a measure of brain activity that can be obtained using non-invasive electrodes attached to the head. Electrical activity produced by the brain can be detected at the scalp and recorded by these electrodes in real-time. Because of its temporal precision, we can examine speech processing during spoken word recognition and identify components associated with different processes. We conducted an ERP experiment designed to examine the effects of changes in a continuous acoustic cue with respect to perceptual encoding (using the auditory N1; ca. 150 ms post-stimulus) and categorization (using the P3; ca. 450 ms).
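Components such as the N1 and P3 are extracted by averaging many stimulus-locked segments (epochs) of the EEG, so that activity unrelated to the stimulus tends to cancel out. A minimal sketch of that averaging step is below; the function name, array shapes, and sampling rate are illustrative assumptions, not our actual recording pipeline:

```python
import numpy as np

def average_erp(eeg, onsets, sfreq, tmin=-0.1, tmax=0.5):
    """Average stimulus-locked EEG epochs into an ERP waveform.

    eeg    : 1-D array, continuous signal from one electrode
    onsets : sample indices of stimulus onsets
    sfreq  : sampling rate in Hz
    tmin   : epoch start relative to onset, in seconds (assumed < 0)
    tmax   : epoch end relative to onset, in seconds
    """
    start = int(tmin * sfreq)   # negative offset before onset
    stop = int(tmax * sfreq)
    # Cut one epoch per stimulus and stack into (n_trials, n_samples).
    epochs = np.stack([eeg[o + start:o + stop] for o in onsets])
    # Baseline-correct each trial using its pre-stimulus interval,
    # then average across trials to obtain the ERP.
    epochs -= epochs[:, :-start].mean(axis=1, keepdims=True)
    return epochs.mean(axis=0)
```

Component amplitudes (e.g., the N1 around 150 ms) are then typically quantified as the mean of this averaged waveform within a time window.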

Subjects were presented with a series of sounds that varied from one phonetic category to another. For example, one of our continua varies in voice onset time (VOT) from the word dart to the word tart in nine 5 ms steps, ranging from 0 ms (a good dart) to 40 ms (a good tart).
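Nine evenly spaced steps spanning 0-40 ms imply a 5 ms spacing; the step values of such a continuum can be written out directly:

```python
# Nine evenly spaced VOT steps spanning the dart-tart continuum (0-40 ms).
vot_steps = [step * 5 for step in range(9)]  # 0, 5, 10, ..., 40 ms
```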

In this experiment, subjects were presented with stimuli like these and asked whether the word they heard matched a target word. We found that the amplitude of the auditory N1 (Fig. 1A and 1B) varied linearly with changes in VOT and was influenced neither by the phonological category the subject was monitoring for nor by how they categorized the stimuli.

Fig. 1. ERP data for speech sounds varying in voice onset time (VOT). (A) ERP waveforms as a function of VOT during the time range of the N100. (B) Mean N100 amplitude as a function of VOT, showing a linear effect across the VOT continuum consistent with encoding of continuous acoustic cues. (C) ERP waveforms as a function of distance from category endpoints. (D) Mean P300 amplitude as a function of distance from category endpoints. Unlike the N100, the P300 is affected both by listeners' phonological categories and by graded acoustic cues.

P3 amplitude also varied with VOT (Fig. 1C and 1D), but depended on which category the subject was monitoring for. This suggests that perception is continuous with respect to changes in the speech signal and that the effects of categories observed in behavioral responses are the result of later-occurring processes that use phonological information.
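A linear effect like the one in Fig. 1B can be tested with an ordinary least-squares fit of mean component amplitude on VOT. The sketch below uses made-up N1 amplitudes purely for illustration; they are not our recorded data:

```python
import numpy as np

vot = np.arange(0, 41, 5)  # the nine VOT steps, in ms
# Hypothetical mean N1 amplitudes (in microvolts) at each step --
# illustrative values only, not measured data.
n1 = np.array([-4.0, -4.3, -4.5, -4.9, -5.1, -5.4, -5.6, -6.0, -6.2])

# Least-squares line: a negative slope means the N1 grows more
# negative (larger) as VOT increases.
slope, intercept = np.polyfit(vot, n1, 1)
# Correlation quantifies how well a straight line fits the amplitudes.
r = np.corrcoef(vot, n1)[0, 1]
```

A reliably linear fit across all nine steps, with no interaction with the monitored category, is the pattern consistent with continuous cue encoding at the N1.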

These results may also have practical implications, as this methodology provides a unique window into perceptual processing of speech. Many models of language and reading impairment are centered on the notion that normal language users ignore small differences in sounds, and that impairment results from a failure to do so. These results challenge that premise: perceiving fine-grained detail may actually be necessary for successful language use.

We are now looking at responses to acoustic cues for other phonological contrasts (fricatives and vowels), as well as how these responses are affected by context. In addition, we are using fast optical imaging to more directly measure effects in auditory cortex, allowing us to further distinguish between continuous and categorical models.


Frequency specificity of subcortical and cortical responses

The auditory brainstem response (ABR) is an evoked potential generated at very early stages of auditory processing, within the first 5-20 milliseconds after a sound is presented. The ABR is recorded from scalp electrodes, much as we record other electrophysiological responses using the ERP technique. ABR recordings are used in clinical settings to help diagnose hearing loss in infants and other populations, but basic research has not comprehensively explained how the ABR and later-occurring ERP components (e.g., the auditory N100) each change in response to differences in the frequency and intensity of sounds. A more nuanced understanding of how these electrophysiological responses vary with frequency would not only inform basic research, but could also provide more sensitive diagnostic tools.

We are currently looking at frequency-specific differences in both subcortical (ABR) and cortical (auditory N100) responses. This will allow for a better understanding of how the brain represents fine-grained changes in stimulus frequency, and it will help us understand what information is captured by different types of responses. By measuring both ABR and N100 responses to the same sounds, we can better understand the degree to which frequency information is encoded and maintained throughout early perceptual processing.

Localizing activity using fast optical imaging

Fig. 2. Participant using the fast optical imaging system.

A limitation of the ERP technique is that the responses we measure do not provide clear information about where in the brain they were generated (spatial information), though they provide very good information about when they were generated (temporal information). As a result, it is difficult to say with certainty that a given result reflects differences in the representation of speech sounds in the part of the brain that performs the initial processing of auditory stimuli. Knowing where a brain response is generated would allow us to create a map that tells us which populations of neurons respond most strongly to different sounds, which, in turn, would help us to distinguish between continuous and categorical models.

Recently, researchers at Illinois (Profs. Monica Fabiani and Gabriele Gratton) have developed a non-invasive imaging method, the event-related optical signal (EROS), that provides both high temporal and high spatial resolution. This technique uses infrared light to detect changes in the optical properties of cortical neurons when they are active. The figure above shows a participant wearing the array of infrared sources and detectors used in the experiment. Because this technique measures neuronal activity directly (rather than hemodynamic responses, as fMRI does), it provides good temporal resolution. And because the light travelling between a given source and detector samples a restricted region of cortex (whereas electrical fields are spatially smeared by the time they reach the scalp in ERP measures), it also provides good spatial resolution. The EROS technique has been used previously to study other aspects of language processing (Tse et al., 2007). We are currently using this approach to examine how speech sounds are represented in auditory cortex.