Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control


ABSTRACT

Proper speech production requires auditory speech feedback control. Models of speech production associate this function with the right cerebral hemisphere, while the left hemisphere


is proposed to host speech motor programs. However, previous studies have investigated only spectral perturbations of the auditory speech feedback. Since auditory perception is known to be


lateralized, with right-lateralized analysis of spectral features and left-lateralized processing of temporal features, it is unclear whether the observed right-lateralization of auditory


speech feedback processing reflects a preference for speech feedback control or for spectral processing in general. Here we use a behavioral speech adaptation experiment with dichotically


presented altered auditory feedback and an analogous fMRI experiment with binaurally presented altered feedback to confirm a right hemisphere preference for spectral feedback control and to


reveal a left hemisphere preference for temporal feedback control during speaking. These results indicate that auditory feedback control involves both hemispheres with differential


contributions along the spectro-temporal axis.

INTRODUCTION

In human verbal communication, the aim of speaking is to create an auditory percept that can easily be decoded by


a listener’s brain. Speech production models propose that this is achieved in two ways1,2,3,4. Auditory, but also somatosensory feedback of an utterance is analyzed to detect and correct


mismatches between produced and intended speech. However, speech production cannot rely entirely on such slow mechanisms, because natural speech is faster than feedback control could


explain5. Feedforward control based on internal representations of auditory-motor associations allows speaking rates that correspond to the observed speed of human communication.


Nevertheless, auditory feedback contains valuable information to preserve stable and intelligible articulation in a variety of settings6 and is used to acquire and maintain auditory-motor


associations, sometimes called speech motor programs. Perturbations of the auditory speech feedback typically induce changes in articulation that compensate for the disturbance. If


perturbations are experienced repeatedly and in a predictable manner, the new action-perception association is learned and speech motor programs are updated7,8. It is believed that


feedforward and feedback control are specialized functions of the two cerebral hemispheres. The highly influential DIVA speech production model proposes that the left hemisphere is


specialized in feedforward specifications of motor outputs while the right hemisphere processes auditory speech feedback to refine motor output based on external sensory information1. While


numerous imaging studies on auditory feedback control report, indeed, right-lateralization of auditory feedback processing9,10,11,12,13,14, other imaging studies propose speaking-related


auditory-motor processing in the left hemisphere2,5,15,16. To date, there is no consensus on the specific contributions of the two cerebral hemispheres to auditory speech feedback control.


One important limitation of past imaging studies is the fact that only auditory speech feedback control based on spectral vowel features like fundamental frequency or formant structure has


been investigated7,17,18. Temporal perturbations that prolong or compress speech locally change the length of phonemes and their transitions. Speech feedback control based on these temporal


speech features during speaking, however, has only been described on a behavioral level8,19. In auditory perception, spectral processing has been linked with right-lateralized computations


in the auditory cortex while the left hemisphere is thought to preferentially process temporal features of auditory stimuli20,21. Sensory processing is an essential part of feedback control.


In consequence, it is unclear whether the observed right-lateralization in previous studies results from the exclusive use of a spectral feedback perturbation or whether it reflects


right-lateralization of auditory speech feedback control in general1, independent of the type of acoustic perturbation. We investigated whether feedback control based on spectral and


temporal speech features during speech production follows the proposed spectral/temporal distinction in functional lateralization of auditory perception. This would predict that both


hemispheres contribute to auditory speech feedback control with a preference for auditory-motor processing of spectral speech features in the right and for auditory-motor processing of


temporal speech features in the left hemisphere. Healthy participants read out loud monosyllabic pseudowords while hearing their own voice (altered or unaltered in real time) via headphones.


By using both vowels and consonants as perturbed speech items, we excluded the possibility that the observed lateralization results from potential hemispheric differences in processing


vowels and consonants22. Spectral perturbations changed characteristic vowel or consonant frequencies in the acoustic domain during the entire duration of the phoneme. Vowels’ (/ɪ/) first


formant (F1) or consonants’ (/∫/) center of gravity (COG) were shifted upwards by up to 20% of production. Temporal perturbations changed the length of vowels or consonants in the acoustic


domain23 and increased phoneme duration by about 50 ms. In a first speech production experiment, feedback perturbations were introduced stepwise over 40 trials (ramp phase) before the amount


of perturbation was kept constant at 20% relative to production (hold phase) for another 20 trials and finally removed (after effect phase). We presented altered auditory feedback in a


dichotic manner, meaning that one ear was stimulated with altered auditory feedback while the other ear simultaneously received the original, unaltered input. Dichotic stimulation biases


processing in the auditory cortex to the input of the contralateral ear24. We predicted that compensation in response to the perturbed feedback would be greater and the produced speech


output would be closer to the auditory target if altered auditory feedback was presented to the ear contralateral to the hemisphere which preferentially analyzes the perturbed speech


feature. We thus expected a left ear (right hemisphere) advantage for compensating spectral auditory feedback alterations and a right ear (left hemisphere) advantage for compensating


temporal feedback alterations. While dichotic stimulation can answer questions on hemispheric specialization, it represents an unnatural experimental condition. To investigate functional


lateralization of speech feedback control when identical auditory feedback is perceived with both ears, as during natural speaking, we further investigated speaking while listening


binaurally to altered or normal auditory feedback during fMRI. In addition, speech adaptation was investigated by contrasting resting-state functional connectivity after and before learning


new auditory-motor associations. We hypothesized that regional activity and functional connectivity profiles indicate a right hemispheric preference for spectral feedback control and a left


hemisphere preference for temporal feedback control of speech. Our data demonstrate the hypothesized functional lateralization and identify lateralized feedback control-related activations


in frontal and temporal cortices. Following temporal adaptations, auditory-motor learning increases fronto-temporal interactions in the left hemisphere while spectral adaptations increase


fronto-temporal resting-state connectivity in the right hemisphere.

RESULTS

Absolute values of produced spectral and temporal speech features in consonants and vowels were rendered


comparable by normalizing them to averaged preperturbation values during baseline (see Methods). Participants expectedly changed articulation to compensate for the perturbations. We first


checked whether perturbations efficiently induced compensation by investigating production changes in the binaural condition. Compared to preperturbation values, participants lowered F1 of


the vowel or the COG of the fricative in response to spectral feedback perturbations (estimate = −0.027, SE = 0.012, _t_(35) = −2.22, _p_ = 0.03) and shortened the vowel/fricative in


response to temporal feedback perturbations increasing phoneme durations (estimate = −0.029, SE = 0.012, _t_(35) = −2.37, _p_ = 0.024) with considerable interindividual variability (Fig. 


1a). This effect was also observed if data of the spectral and temporal group were analyzed in two separate models (spectral: estimate = −0.023, SE = 0.011, _t_(16) = −2.03, _p_ = 0.05;


temporal: estimate = −0.033, SE = 0.013, _t_(16) = −2.47, _p_ = 0.026). Further, production changes in response to binaurally presented perturbations were greater compared to production


changes in the control condition with binaurally presented unaltered feedback (_F_(3, 31)_PertvsControl = 6.18, _p_ = 0.019) with subthreshold _t_ values for the separate estimates (estimate_spectral = 0.018, SE = 0.01, _t_(31) = 1.75, _p_ = 0.09 and estimate_temporal = 0.017, SE = 0.01, _t_(31) = 1.63, _p_ = 0.11; Fig. 1a). Degree of compensation was largely independent


of the target phoneme of the perturbation (_F_(1, 31)_VowelvsConsonantXPertvsControl = 0.006, _p_ = 0.97; _F_(1, 31)_VowelvsConsonantXSpectralvsTemporalXPertvsControl = 0.175, _p_ = 0.68).


Expectedly, the amount of compensation did not differ between spectral and temporal perturbations when altered auditory feedback was presented to both ears (_F_(1, 31)_SpectralvsTemporalXPertvsControl = 0.003, _p_ = 0.96). Participants also changed their speech production to compensate for the perturbations in response to dichotically presented altered


auditory feedback, the main conditions of interest in this experiment (_F_(3, 34)_Block = 6.144, _p_ = 0.002, Fig. 1b, c). Importantly, the degree of compensation in the two dichotic


conditions showed an interaction between ear and type of feedback perturbation (_F_(1, 31)_SpectralvsTemporalXLeftvsRight = 8.47, _p_ = 0.007, Figs. 1 and 2), indicating different hemispheric


preferences for auditory-motor processing of spectral and temporal speech features. Planned comparisons revealed that compensation of spectral feedback perturbations displayed a left


ear/right hemisphere advantage (estimate = 0.0139, SE = 0.008, _t_(38) = 1.72, _p_ = 0.046, orange in Fig. 2). In contrast, compensation of temporal feedback perturbations was greater when


the right ear/left hemisphere was presented with the prolonged phoneme (estimate = 0.0141, SE = 0.008, _t_(39) = 1.71, _p_ = 0.047, blue in Fig. 2). There was a marginal trend that


compensation was overall larger for dichotically presented perturbations applied to vowel compared to consonant acoustics (_F_(1, 31)_VowelvsConsonant = 3.18, _p_ = 0.08, estimate_vowel = 0.039, SE = 0.012, _t_(36) = 3.33, _p_ = 0.002; estimate_fricative = 0.013, SE = 0.012, _t_(36) = 1.03, _p_ = 0.31). In isolation, this finding could potentially indicate that vowel rather


than consonant perturbations induced compensatory responses in the dichotic conditions. However, the target phoneme of the perturbation (vowel or consonant) did not significantly influence


the lateralization effect (_F_(1, 31)_VowelvsConsonantXLeftvsRight = 1.41, _p_ = 0.244; _F_(1, 31)_VowelvsConsonantXLeftvsRightXSpectralvsTemporal = 2.35, _p_ = 0.14). This behavioral


experiment employing dichotic auditory feedback perturbation indicates differential hemispheric preferences for auditory-motor processing of spectral and temporal features during speaking.


However, these findings do not reveal brain regions that contribute to lateralized processing during speaking with binaural auditory feedback. We performed a sparse sampling fMRI study with


22 participants who were exposed to the identical, yet binaural, spectral auditory feedback perturbations and 22 other participants who experienced the same binaural temporal auditory


feedback perturbations as in the behavioral study to answer this question. Participants in the fMRI study also changed their speech production upon perturbation in relation to


preperturbation values (Fig. 3a) with marginally significant compensation for spectral (estimate = 0.0112, SE = 0.006, _t_(50) = 1.7, _p_ = 0.09) and significant compensation for temporal perturbations (estimate = 0.03, SE = 0.006, _t_(50) = 4.63, _p_ < 0.001). When comparing speaking with altered auditory feedback to the control condition instead of preperturbation values,


compensation was only significant for the temporal perturbation group (_F_(1, 42)_SpectralvsTemporalXPertvsControl = 22.02, _p_ < 0.001; estimate_spectral = 0.003, SE = 0.005, _t_(42) = 0.52, _p_ = 0.6; estimate_temporal = 0.036, SE = 0.005, _t_(42) = 7.11, _p_ < 0.001). This resulted from carry-over effects from the experimental to the control condition in the spectral


perturbation group (see Fig. 3a). Whether spectral or temporal perturbations targeted the vowel or consonant did not significantly influence the amount of compensation (_F_(1, 43)_VowelvsFricative = 0.007, _p_ = 0.93). Participants significantly changed their speech production in response to perturbations of vowel (estimate_vowel = 0.02, SE = 0.01, _t_(42) = 1.99, _p_ = 0.05) and fricative acoustics (estimate_fricative = 0.021, SE = 0.005, _t_(42) = 4.03, _p_ < 0.001). A conjunction analysis of speaking with binaurally altered auditory feedback of


the vowel and consonant compared to normal speaking in the same run revealed activity associated with auditory-motor processing of spectral or temporal speech features (Fig. 3b and Table 1).


Based on prior imaging studies on auditory feedback control, the search space for this analysis was restricted to auditory, ventral premotor and inferior frontal regions25 in both


hemispheres (see Methods for details). Spectral control (orange in Fig. 3b) was associated with increased activity in a cluster of voxels in the right inferior frontal gyrus (IFG) that


extended into the frontal operculum and in clusters in bilateral secondary auditory cortex (superior and middle temporal gyrus (STG and MTG), and superior temporal sulcus (STS)). Processing


of temporal speech features during auditory feedback control (blue in Fig. 3b) was associated with increased activity in left IFG, left STG and STS, and right MTG. To test whether activity


associated with spectral and temporal auditory-motor control was indeed lateralized, we calculated weighted bootstrapped lateralization indices (LI) in the pre-defined auditory and frontal


ROIs. This represents a threshold-free estimate of lateralization that considers the extent and size of lateralization26. Speaking with spectrally altered auditory feedback compared to


normal speaking showed strongly right-lateralized activity in frontal (LI = 0.46) and in auditory cortices (LI = 0.51, Fig. 3c). Speaking with temporally altered auditory feedback compared


with unperturbed speaking, on the other hand, was strongly left-lateralized in frontal (LI = −0.835) and weakly left-lateralized in auditory cortex (LI = −0.116, Fig. 3c). When testing


lateralization voxel-wise by flipping contrast maps and subtracting these from the original images27, left-lateralized activity was observed in anterior portions of auditory cortex and the


pars orbitalis of the inferior frontal gyrus when compensating for temporal perturbations. In contrast, spectral feedback control was associated with right-lateralized activity in anterior


and posterior auditory cortex and pars triangularis of the inferior frontal gyrus (Table 2, Fig. 3d). We further examined whether individual task-related activity in spectral and temporal


feedback control areas (6 mm spheres centered on functional peak activations reported in Table 1) was related to the individual degree of compensation to spectral and temporal feedback


perturbations, averaged over vowel and consonant perturbations. There was a consistent trend (illustrated in Supplementary Fig. 1) that stronger activity in left temporal feedback control


areas was associated with greater compensation of temporal feedback perturbations in anterior STS (_r_ = 0.403, _p_ = 0.063), IFG pars triangularis (_r_ = 0.41, _p_ = 0.059) and IFG pars


orbitalis (_r_ = 0.317, _p_ = 0.151). Temporal feedback control activity in right anterior STS was not related to the amount of compensation to temporal feedback perturbations (_r_ =


0.096, _p_ = 0.669). Similar associations between spectral feedback control activity and acoustic measures during spectral feedback perturbations were nonsignificant (_r_RIFGtri = 0.14, _p_ 


= 0.53; _r_RpSTS = 0.051, _p_ = 0.821; _r_LpSTS = 0.097, _p_ = 0.6674), probably due to overall lower variability in acoustic measures during spectral perturbations (SD_spectral = 0.02, SD_temporal = 0.046). We assessed whether adapting to auditory feedback perturbations led to learning-related changes in resting-state functional connectivity by contrasting resting-state


fMRI data after and before the speech adaptation run. Learning-related plasticity should depend on the degree of compensation. Therefore, we tested whether functional resting-state


connectivity between feedback control-related seeds (6 mm spheres centered on functional peak activations reported in Table 1) and the rest of the ipsilateral hemisphere was modulated by


individual production changes. Indeed, the more participants adapted speech production to spectral perturbations, the more they increased fronto-temporal connectivity in the right and


temporo-parietal coupling in the left hemisphere (orange in Fig. 4, Table 3). In contrast, acquiring new auditory-motor associations following temporal perturbations of the auditory feedback


was associated with increased fronto-temporal coupling only in the left hemisphere. In addition to the increased coupling with the left inferior frontal gyrus, the left auditory association


cortex connected more strongly with the left supplementary motor area (SMA) following temporal perturbations (blue in Fig. 4, Table 3).

DISCUSSION

Both the behavioral and the neuroimaging


study provide evidence for bihemispheric, yet asymmetric, auditory feedback control of speaking. The previously reported right-lateralization of auditory feedback control9,10,11,12,13,14 resulted from spectral


perturbations. Temporal speech features, in contrast, were more strongly processed by the left hemisphere. Learning-related neuroplasticity increased fronto-temporal interactions in the


right hemisphere following spectral perturbations and in the left hemisphere following temporal perturbations. Our findings suggest a modification of prevailing speech production models.


Previous theoretical models either propose a single auditory feedback controller in the right hemisphere1 or do not specify the contributions of the two cerebral hemispheres3. Previous


studies have already demonstrated involvement of both hemispheres during speaking9,14,28,29,30. We specify here that fronto-temporal cortices in both hemispheres serve speech feedback


control. We propose a refined theoretical speech production model based on the DIVA model, because it constitutes the only neurocomputational model that takes contributions of both


hemispheres into account1. The DIVA model proposes a feedback control system in the bilateral superior temporal cortex that compares external auditory feedback with sensory predictions from


a left-lateralized feedforward system in the frontal lobe. A potential mismatch signal is transferred to the right IFG, which is thought to constitute the only region that translates


deviations from sensory predictions into corrective motor commands that in turn are fed into the left-lateralized feedforward system. Our data indicate that the right auditory association


cortex monitors preferentially spectral speech features while the left auditory association cortex preferentially monitors temporal speech features (see Fig. 5). This functional


lateralization builds on a hemispheric specialization in auditory processing20,21. Mismatches are forwarded to the respective ventral inferior frontal gyri, which amplifies functional


lateralization in the frontal cortices. Our data suggest that in the left hemisphere, the ventral IFG adapts primarily temporal properties of motor commands, which results in updated motor


timing or adapted velocities of articulator actions. In the right hemisphere, the ventral IFG likely corrects the articulatory targets of the motor command, which affects position of the


articulators and in turn results in spectral adaptations of the acoustic speech signal. This specifies the external auditory feedback control system in such a way that it consists of two


parallel loops in the two hemispheres. Functional lateralization does not indicate a dichotomy but rather a slight shift in the equilibrium between functional homologues. It is thus likely


that information in both hemispheres is integrated on multiple levels via interhemispheric interactions29,31. Because lateralization studies on different aspects of somatosensory feedback


processing are lacking, we did not specify the contributions of the two cerebral hemispheres to somatosensory feedback control in our model. The few imaging studies on somatosensory feedback


processing during speaking suggest a comparable functional lateralization as in the auditory domain. Somatosensory feedback processing during articulation is associated with


left-lateralized activity in the supramarginal gyri16,32. Perturbations of somatosensory feedback increase activity in the bilateral supramarginal gyri and in the right ventral IFG33,


possibly because the studied perturbation required adapting the position of the articulators more than their velocities. The main new feature of the external auditory feedback loop, the parallel


processing of auditory information in both hemispheres, can also be incorporated into external feedback loops of state feedback control models of speech production. However, in contrast to


the DIVA model, those models propose an additional internal feedback loop that estimates sensory consequences of articulation even in the absence of overt motor behavior and external


feedback3,15. Learning new auditory-motor associations should also affect internal representations of actions and their sensory consequences. The observed changes in resting-state functional


connectivity point to plasticity associated with the updating of internal models. Updating temporal properties of auditory-motor associations increased interactions between the ventral IFG


and the auditory association cortex only in the left hemisphere. In contrast, new auditory-motor associations in response to spectral perturbations increased coupling between the right


auditory association cortex and the right ventral IFG. This suggests that both hemispheres update their auditory-motor speech representations. The observation of both a left and a right


internal fronto-temporal loop appears to contradict the proposal of a single left-lateralized internal auditory-motor interface at the left temporo-parietal junction (TPJ)2,15.


However, our resting-state data confirmed that the left TPJ was associated with auditory-motor learning. This suggests that left TPJ plays a special role in relaying top-down predictions


from the left-lateralized feedforward to the bihemispheric feedback control system. The fact that left TPJ changed its functional connectivity primarily when learning new auditory-motor


associations following spectral perturbations was surprising, since a role of the right TPJ in internally representing spectral speech features could have been envisaged. However, the right


TPJ did not play a role in this context. A greater involvement of left TPJ in internally representing spectral compared to temporal speech features raises the question of whether temporal adaptations also involve another region linking the feedforward to the feedback system. The left auditory association cortex connected more strongly with the left SMA following temporal


perturbations. The SMA has been implicated in action timing and in processing temporal aspects of sensory input34,35,36,37,38. We thus propose the SMA as an additional motor-to-auditory


interface that internally translates the temporal structure of articulator actions into expected auditory consequences. In Fig. 5, this would translate into two parallel inputs into the


feedback control system, one preferentially predicting temporal speech features via the left SMA and one preferentially predicting spectral speech features via left TPJ. Because this


proposal needs to be backed up by additional empirical evidence, we have not yet implemented it in our model. Why should the brain process spectral and temporal aspects of auditory input


differently in the two hemispheres? Parallel processing of complex sensory information allows for rapid and efficient responses but requires processing chains to be separated to a certain degree.


Such separation has been proposed to result from differential filtering of the input in sensory association cortices20,21,39,40. A right hemisphere preference for spectrally tuned and a


left hemisphere preference for temporally tuned auditory receptive fields have been reported earlier41,42 but were only recently shown for speech stimuli21,43. Differential sensitivity of the


two cerebral hemispheres to temporal and spectral modulation rates in acoustic signals has been proposed to result in auditory representations with high temporal and low spectral resolution


in the left hemisphere compared to auditory representations with low temporal and high spectral resolution in the right. Accordingly, the right auditory association cortex also represents temporal information, although with relatively low temporal resolution, during speech perception44 or during nonspeech auditory-motor control38. In our experiments, the temporal perturbation


increased phoneme duration by about 50 ms. Controlling such temporal changes requires high temporal resolution and thus a sensitivity for high temporal modulation rates. Detecting spectral


changes of about 20% relative to production, on the other hand, requires analyzing spectral modulation rates with sufficient resolution. Consequently, auditory-motor control based on


spectral speech features lateralizes to the right. Temporal speech features in the studied range represent phonemic contrasts, such as short versus long vowels, single versus geminate


consonants, or voiced versus voiceless consonants45 and thus code linguistic information. Also, temporal stretching and shrinking of segments mark prosodic boundaries46 and stress, as well


as accent45. Spectral speech features represent linguistic information concerning vowel identity, sonorants, and place of articulation of consonants47. In addition, spectral features code


a speaker’s gender, age, size, dominance, and turn-taking behavior48,49,50. We observed considerable interindividual variability in the degree of compensation that was only partially explained


by regional brain activity or functional connectivity. As in other speech perturbation studies7,51, individuals differ in the degree and sometimes even direction of compensation and more


work is needed to understand the sources of such interindividual variability. One could argue that the observed lateralization in speaking-related brain activity associated with spectral and


temporal perturbations represents a trivial consequence of bottom-up processing of speech with different temporal and spectral characteristics21. However, functional asymmetries during


speaking were not only observed in auditory but also in inferior frontal regions that have been associated with compensation of perturbed auditory feedback9,25. More importantly, the


differential involvement of the left and right hemisphere in temporal and spectral auditory feedback control also manifested in traces of auditory-motor learning in the resting-state data.


We thus interpret the observed functional lateralization during speaking as evidence of active auditory-motor processing serving speech control in both sensory and nonsensory cortices. In


contrast, connectivity between cerebral speech production regions and the cerebellum52 was not increased by auditory-motor learning. Of note, feedback control is only one component of


producing fluent and intelligible speech. Other functions during speech production, such as feedforward control or syntactic and semantic processing, may follow other principles of


hemispheric specialization53. A far greater role of the left compared to the right hemisphere in feedforward control of speech has been well established54. In the context of auditory speech


feedback control, however, the right hemisphere seems to be as important as the left hemisphere to improve articulation by means of sensory information. Our results indicate that both


hemispheres are involved in the processing of auditory speech feedback to control articulation, contrary to the view of a general right hemisphere preference for feedback control during


speech production. We identified one factor that determines the degree to which both hemispheres contribute to feedback control of speaking. The present study highlights that compensating


and learning new spectral auditory-motor associations recruits primarily the right hemisphere while compensating and learning new temporal auditory-motor associations recruits especially the


left hemisphere during speech production.

METHODS

PARTICIPANTS

Forty healthy volunteers (20 female) participated in the behavioral experiment and 44 healthy volunteers (27 female) in the


fMRI study. Participants were adult right-handed native speakers of German (handedness score55 behavioral study mean = 93, SD = 9.8, fMRI study mean = 90, SD = 11.5) and reported normal


speech and hearing. All participants gave their written informed consent before participation. Four participants had to be excluded from the behavioral experiment because real-time tracking


of formants (_n_ = 1) or of vowel and fricative portions (_n_ = 2) did not consistently work, or because task instructions were not followed (_n_ = 1, singing instead of speaking). The study was


approved by the ethics committee of the Medical Faculty of Goethe-University Frankfurt (DFGKE 1514/2-1) and was in accordance with the Declaration of Helsinki.

BEHAVIORAL EXPERIMENT

Sixteen


experimental manipulations were studied in a mixed within- and between-subject design to reduce the number of conditions per participant, which is important because parallel implicit learning


of new auditory-motor associations has so far only been reported for up to three different perturbations56. Participants were evenly divided across four experimental groups that differed


with respect to the acoustic property that was altered throughout the experiment. Participants either experienced spectral or temporal perturbations of the vowel or the consonant in their


auditory speech feedback. Each participant experienced four different conditions (binaural unaltered feedback, binaural altered feedback and two dichotic conditions). Participants read words


out loud, speaking into a microphone (AT 2010, Audio-Technica) placed 10 cm in front of them and were told that they heard their utterances via headphones (DT770 PRO, beyerdynamics). The


level of auditory feedback provided by the headphones was amplified (~+15 dB relative to the level at the microphone) to reduce the influence of unaltered bone conducted auditory


feedback7,18,19. Altered auditory feedback was either presented to both ears (binaurally), to only the right ear or to only the left ear while the other ear received the unaltered auditory


feedback (dichotic conditions). In the latter two conditions, auditory processing is biased to the input of the contralateral ear24. To ensure that changes in speech production were related


to auditory speech feedback perturbations and not just a side-effect of word repetition, the original, unaltered feedback was presented to both ears in an additional control condition. To


enable the acquisition of several distinct auditory-motor transformations in parallel, each of the four feedback presentation modes was associated with a different, predictable word


context56. The real-time feedback alteration always targeted the same part of the monosyllabic CVC-pseudowords (either the vowel or the second consonant) while the other phonemes implicitly


distinguished between conditions. Within participants, the allocation of word context to feedback condition was consistent. Over participants, the association between word context and


feedback condition was counterbalanced. Experimental stimuli were chosen such that they (1) facilitated online consonant and vowel tracking, (2) could be altered in the spectral and temporal


domain, (3) provided context information to learn multiple auditory-motor transformations in parallel, and (4) introduced as little acoustic variability between syllables as possible. To


facilitate the algorithmic distinction (which relied on the presence/absence of voicing) between vowels and consonants in CVC-pseudowords, the second consonant was voiceless. Further,


contrary to most studies, which target the vowel /e/ for spectral feedback perturbations, we chose the high vowel /ɪ/ to reduce the likelihood of glottalization57. This was of importance,


since glottalization interferes with the temporal feedback perturbation of the vowel. Spectral and temporal perturbations of the consonant always targeted the voiceless fricative /∫/, which


can be altered to be perceived as /s/. The preceding vowel was chosen in such a way that global shifts in the fricative spectrum due to lip rounding were kept at a minimum, i.e., it was not


a rounded vowel. This was of importance, since such coarticulatory shifts in the frequency spectrum would have biased perception of /∫/ into the direction of /s/58. In consequence,


CVC-pseudowords for perturbations of the vowels were [bɪʃ], [bɪf], [bɪҫ], and [bɪs]. CVC-pseudowords for perturbations of the consonants were [bɪʃ], [bεʃ], [baʃ] and [bœʃ]. The behavioral


feedback alteration experiment consisted of five ten-minute blocks. In every block, each word-condition pair was presented 20 times, resulting in a total of 400 trials. The experiment started


with a baseline block in which auditory feedback was presented unaltered in all four conditions. Afterwards, feedback alterations were introduced gradually in steps of 5% relative to speech


production over 40 trials (ramp_early/ramp_late). Auditory feedback in the control condition remained unaltered throughout the whole experiment. The perturbation was kept at a maximum (20%


relative to production) for another 20 trials per feedback presentation mode (hold) until feedback was returned to normal (no alteration) in all conditions for another 20 trials per


condition (after effect).
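For illustration, the per-condition trial schedule can be written out in a short Python sketch (we assume, hypothetically, an equal number of trials per 5% ramp step):

```python
import numpy as np

# Per-condition trial schedule of the behavioral experiment: 20 baseline
# trials, a 40-trial ramp to 20% in 5% steps (here assumed to be 10 trials
# per step), 20 hold trials at 20%, and 20 after-effect trials at 0%.
baseline = np.zeros(20)
ramp = np.repeat([0.05, 0.10, 0.15, 0.20], 10)   # ramp_early / ramp_late
hold = np.full(20, 0.20)
after_effect = np.zeros(20)
schedule = np.concatenate([baseline, ramp, hold, after_effect])  # 100 trials
```

Following the adaptation task, participants were asked whether they noticed something special during speaking. Five participants noticed that sometimes auditory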


feedback seemed to differ between the right and the left ear. Another four participants noticed that the words sometimes sounded different. No participant identified the type of auditory


speech feedback perturbation or noticed a change in the way he/she spoke. Data analysis and statistics are reported below.

FMRI EXPERIMENT

Twenty-two participants experienced a temporal


perturbation of their auditory speech feedback; the other 22 participants were studied during a spectral perturbation of their auditory speech feedback. The experiment started with normal


speaking without auditory speech feedback perturbations to ensure that differences between resting-state measurements (see below) were not driven by prolonged exposure to scanner noise or


adapting to hear the own voice via headphones (preperturbation baseline). In addition, this run’s behavioral data were used as baseline values and served for normalization of behavioral data


during the feedback perturbation run. Resting-state scans (7 min each) were acquired to assess changes in functional connectivity due to the learning of new auditory-motor speech


associations in the spectral or temporal domain. The resting-state scans were acquired before (preadaptation) and after speaking with altered speech feedback (postadaptation). Participants


were instructed to have their eyes open during the measurement and to fixate a white cross on black background in the middle of a screen. In between the resting-state scans participants


performed the feedback perturbation run. Participants read out loud three different, visually presented CVC-pseudowords ([bɪʃ], [dɪʃ], and [gɪʃ]) while they heard their own voice (altered or


unaltered) mixed with white noise through headphones8,17,18. Each pseudoword was associated with one of three experimental conditions (no perturbation, vowel perturbation, consonant


perturbation). This allowed us to delineate feedback control processes during speech production from other speech production processes and to generalize results over vowels and consonants. The


rationale for choosing these three syllables was the same as for the behavioral experiment. Yet, the same vowel and fricative were used in all syllables to reduce the acoustic variability of the


speech token even further. Only the initial plosive served as contextual cue to learn multiple sensory-motor associations in parallel56. Participants’ speech was recorded with an


MR-compatible microphone (FOMRI-IIITM noise cancelling microphone, Optoacoustics) and fed back via OptoActive™ active noise cancelling headphones (Optoacoustics). The level of auditory


feedback provided by the headphones was amplified to reduce the influence of unaltered bone conducted auditory feedback resulting in 90 dB headphone output. In contrast to the behavioral


study, auditory feedback perturbations were kept constant at 20% relative to production throughout the whole fMRI run59,60. The three CVC-pseudowords were presented in randomized order, 30


times each. The presentation of syllables was interspersed (one quarter of trials) with a nonspeech condition in which participants remained silent, saw the letter


string “yyyy”, and heard white noise9,25. To allow participants to speak in relative silence and to reduce movement artifacts we used an event-related sparse sampling technique9,25. Each


trial started with the 2 s long acquisition of one functional image. Image acquisition was followed by a pause of 0.5–1.5 s after which the CVC pseudoword or the nonspeech stimulus was


visually presented for 2 s. After another pause of 2.5–3.5 s, the next image was acquired, resulting in a total trial length of 8 s.
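The trial timing can be illustrated with a minimal Python sketch (function name and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def trial_timing(total=8.0, acq=2.0, stim=2.0):
    """One sparse-sampling trial: 2 s image acquisition, a jittered pause,
    2 s stimulus presentation, and a second pause filling the trial to 8 s."""
    pause1 = rng.uniform(0.5, 1.5)          # jittered post-acquisition delay
    pause2 = total - acq - pause1 - stim    # 2.5-3.5 s by construction
    return acq, pause1, stim, pause2
```

The jittered acquisition delay accounted for variability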


in the timing of the BOLD response depending on participant and brain region61. The jitter was chosen in such a way as to sample BOLD responses around their estimated peaks 4–7 s after speech


onset. Participants were instructed to speak with normal conversational loudness and duration. If participants did not speak loudly enough for vowel onsets to be detected in their utterance, a


prompt to speak louder was displayed. Participants were naïve to the fact that their auditory feedback would be altered throughout the experiment. Before the actual start of the experiment


(but already inside the scanner), participants were trained to familiarize themselves with the experimental setup, particularly the way their own (unaltered) voice sounded via headphones.

DATA ACQUISITION

Microphone input and headphone output were digitally sampled at 48 kHz and recorded at 16 kHz. Scanning was performed using a Siemens (Erlangen, Germany) Trio 3 Tesla magnetic


resonance scanner with a commercial eight-channel coil. High-resolution T1-weighted anatomical scans (TR = 1.9 s; TE = 3.04 ms; flip angle = 9°; 192 slices per slab; 1 mm³ isotropic voxel


size) were obtained to improve spatial normalization of functional images onto the Montreal Neurological Institute (MNI) brain template. Functional images were obtained with a gradient-echo


T2*-weighted transverse echoplanar image (EPI) sequence (Task-fMRI (122 volumes; TR = 2 s; TE = 30 ms; silent gap = 6 s; flip angle = 90°; 32 axial slices; 3 mm³ isotropic voxel size),


Resting-State (178 volumes; TR = 2 s; TE = 30 ms; flip angle = 90°; 30 axial slices; 3 mm³ isotropic voxel size)).

AUDITORY FEEDBACK PERTURBATIONS

All real-time tracking and perturbing was


performed using the Matlab Mex-based digital signal processing software package Audapter23. To enable presentation of altered and unaltered auditory feedback in parallel in the two dichotic


conditions, an additional temporary buffer was introduced into Audapter to duplicate the incoming audio signal before it was downsampled and altered. While a duplicate of the incoming signal


was processed to introduce a feedback alteration, the original signal was held in a temporary buffer and transferred unaltered to the output. To compensate for the delay of the signal


processing algorithm, the unaltered, original signal was delayed the same amount as the altered signal. The altered output was transferred to one output channel while the unaltered but also


delayed signal was transferred to the other one.
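Audapter itself is a Matlab Mex package; the following Python sketch only illustrates the buffering and channel-routing scheme (names and frame handling are hypothetical):

```python
import numpy as np

def dichotic_frame(raw_frame, altered_frame, delay_buf, altered_left):
    """Route one audio frame dichotically: the perturbed signal to one ear and
    a copy of the raw input, delayed by the same number of samples as the
    perturbation algorithm, to the other ear."""
    delay_buf.extend(raw_frame)  # hold raw samples back by the algorithm delay
    unaltered = [delay_buf.pop(0) for _ in range(len(altered_frame))]
    left, right = (altered_frame, unaltered) if altered_left else (unaltered, altered_frame)
    return np.column_stack([left, right])  # stereo output frame

# delay_buf is prefilled with as many zeros as the processing delay, e.g.,
# [0.0] * int(0.011 * 48000) for the 11 ms formant-shift delay at 48 kHz.
```

The online status tracking function of Audapter was used to restrict feedback perturbations to either the vowel or the fricative in the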


syllable. Vowel onsets were tracked by an empirically defined root-mean-squared intensity threshold. Fricative onset was defined as the time point when the ratio of spectral intensity in


high vs. low frequency bands crossed an empirically defined threshold for more than 0.02 s.
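A simplified sketch of such segment tracking is given below (thresholds and the band split are hypothetical placeholders; Audapter's actual implementation differs in detail):

```python
import numpy as np

def track_onsets(frames, fs, rms_thresh, ratio_thresh, split_hz=4000, min_dur=0.02):
    """Flag a vowel onset when frame RMS crosses an empirical intensity
    threshold, and a fricative onset when the high/low-band intensity ratio
    stays above an empirical threshold for more than min_dur seconds."""
    hop = len(frames[0]) / fs           # frame duration in seconds
    vowel_on, fric_on, above = None, None, 0.0
    for i, frame in enumerate(frames):
        if vowel_on is None and np.sqrt(np.mean(frame ** 2)) > rms_thresh:
            vowel_on = i * hop
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), 1 / fs)
        hi_lo = mag[freqs >= split_hz].sum() / (mag[freqs < split_hz].sum() + 1e-12)
        above = above + hop if hi_lo > ratio_thresh else 0.0
        if fric_on is None and above > min_dur:
            fric_on = i * hop - above   # onset = start of the supra-threshold run
    return vowel_on, fric_on
```

The vowel /ɪ/ was perturbed spectrally by increasing its F1 up to 20% relative to production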


(fixed formant perturbation method in Audapter). F1 is perceptually relevant for distinguishing vowel sounds from each other and correlates inversely with tongue height and positively with mouth openness


during articulation47. Perceptually, the spectral vowel alteration of /ɪ/ resulted in an acoustic signal closer to the vowel /e/. The formant shift procedure introduced an 11 ms


delay. The consonant /∫/ was perturbed spectrally by increasing its spectral centroid (the amplitude-weighted mean frequency of the speech spectrum) in the acoustic output by up to 20% relative to production. The spectral centroid is an important characteristic for distinguishing the two fricatives /s/ and /∫/ at a perceptual level and correlates with the place of articulation47, i.e., it is higher for the alveolar than for the postalveolar place of articulation. Thus, the spectral alteration of the /∫/ sound resulted in an acoustic signal whose spectral centroid was closer to the fricative /s/. The Audapter algorithm for changing the spectral centroid shifts the whole frequency spectrum and thus also its amplitude-weighted mean. The spectral


centroid alteration introduced a 24 ms delay between input and output (see Supplementary Fig. 2 for an illustration of the spectral perturbations). Temporal perturbations increased the length of either the vowel or the fricative in the acoustic output by up to 20% relative to production. Vowel and


consonant length was increased by time warping in the frequency domain. The time warping event was configured such that time dilation spanned the whole vowel/consonant. The average


vowel/consonant length was estimated based on vowel/consonant productions in the training phase. The rate of the catch-up period was set to 2, resulting in a natural-sounding acoustic output.
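One way to realize such a warp with a catch-up period is to let a read pointer consume the input more slowly during the perturbed segment and faster afterwards; a minimal sketch in terms of frame indices (not Audapter's actual frequency-domain implementation):

```python
import numpy as np

def warp_read_positions(n_out, seg_start, seg_end, stretch=1.2, catchup_rate=2.0):
    """For each output frame, return the input frame position to read: the
    segment [seg_start, seg_end) is consumed 'stretch' times more slowly
    (lengthening it acoustically), and the accumulated lag is then removed
    at 'catchup_rate' until input and output are realigned."""
    pos, out = 0.0, np.zeros(n_out)
    for i in range(n_out):
        out[i] = pos
        if seg_start <= pos < seg_end:
            pos += 1.0 / stretch                     # dilate the segment
        else:
            pos += min(catchup_rate, (i + 1) - pos)  # catch up, capped at real time
    return out
```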


In total, the time-warp perturbation introduced a 24 ms delay between input and output (see Supplementary Fig. 2 for an illustration of the temporal perturbations).

BEHAVIORAL DATA ANALYSIS


Vowel- and consonant boundaries were marked manually according to the recording’s speech waveform and a broadband spectrogram (window size 5 ms) in PRAAT62. Onsets and offsets of vowels and


fricatives were labelled by an in-house annotator. CVC-productions of participants who experienced a temporal feedback perturbation were additionally labelled by an external annotator


who was blinded with respect to the experimental procedure and speech feedback perturbation. This was important to ensure that changes in length estimates were not unconsciously influenced


by knowledge and expectations about experimental alterations. Analyses on length estimates were therefore only performed on the data labelled by the external annotator. Inter-rater agreement


with the in-house annotator was good (ICC(2, 1)_consonant = 0.83 [0.79–0.859]; ICC(2, 1)_vowel = 0.869 [0.847–0.886]). F1 estimates of the vowel, COG estimates of the consonant, and relative


vowel and consonant lengths for each CVC-production were extracted in PRAAT. The average F1 value of an utterance was calculated in a time window of 40–80% of the vowel duration63 using the


Burg algorithm. The maximum formant frequency was set to 5500 Hz for female speakers and to 5000 Hz for male speakers. The spectral centroid was calculated in a time window of 40–80% of


the fricative duration. The signal was high-pass filtered at 1000 Hz before spectral centroid calculation64. The spectral centroid was calculated as the weighted mean of a frequency


spectrum obtained by a Fast Fourier transform. Before spectral centroid calculation, the spectrum was cepstrally smoothed with 500 Hz to reduce the influence of spectral outliers on spectral


centroid estimates65.
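A rough Python approximation of this estimate (filter order is our choice; the 500 Hz cepstral smoothing step is omitted for brevity):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def spectral_centroid(x, fs):
    """COG of a fricative: high-pass the signal at 1 kHz, window the 40-80%
    portion, and return the amplitude-weighted mean frequency of its spectrum."""
    sos = butter(4, 1000.0, btype="highpass", fs=fs, output="sos")
    seg = sosfiltfilt(sos, x)
    seg = seg[int(0.4 * len(seg)):int(0.8 * len(seg))]  # 40-80% analysis window
    mag = np.abs(np.fft.rfft(seg * np.hanning(len(seg))))
    freqs = np.fft.rfftfreq(len(seg), 1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)            # weighted mean frequency
```

To assess vowel and fricative length changes, we calculated relative segment lengths to account for different speaking rates between trials. The relative vowel length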


was calculated by subtracting vowel onset from consonant onset and dividing this duration by the whole-word length. The relative consonant length was calculated by dividing the consonant


length by word duration. All trials in which the perturbed speech parameter deviated more than ±2 standard deviations from its mean within a block were discarded. F1, COG and relative length


estimates were rendered comparable across alterations, stimuli, and speakers by a normalization procedure that divided each produced speech feature by its average production during


preperturbation baseline. This results in comparable values of relative production changes that were used for statistics, while the raw values are in different units (Hz and ms).
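These preprocessing steps can be summarized in a short sketch (column names are hypothetical placeholders for the annotated trial data):

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per trial with segment boundaries (s) and raw F1/COG values.
    Computes relative segment lengths, discards ±2 SD outliers per block, and
    normalizes each feature to its mean during the preperturbation baseline."""
    word_len = df["word_offset"] - df["word_onset"]
    df["rel_vowel"] = (df["cons_onset"] - df["vowel_onset"]) / word_len
    df["rel_cons"] = (df["cons_offset"] - df["cons_onset"]) / word_len
    for feat in ["F1", "COG", "rel_vowel", "rel_cons"]:
        z = df.groupby("block")[feat].transform(lambda v: (v - v.mean()) / v.std())
        df.loc[z.abs() > 2, feat] = np.nan      # discard outlier trials
        base = df.loc[df["block"] == "baseline", feat].mean()
        df[feat + "_norm"] = df[feat] / base    # 1.0 = baseline production
    return df
```

Linear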


mixed effects models (LMM) were used to test whether participants changed their produced speech features in response to spectral and temporal feedback perturbations. We modelled binaural and


dichotic conditions separately. The first LMM on binaural data served to check whether the spectral and temporal perturbations induced compensatory responses. Specifically, we tested


whether participants changed articulation over the course of the experiment, whether the type and/or the target phoneme of the feedback perturbation modulated compensation, and whether


compensation was greater compared to a control condition with normal auditory speech feedback. To this end we entered block (ramp early/ramp late/hold/after effect phase), feedback


alteration (altered/unaltered), type (spectral/temporal), and target of the feedback alteration (vowel/consonant) as fixed effects into the model and allowed by-subject random slopes for the


effect of block and feedback alteration.
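The published models were fitted with _afex_ in R (see below); purely to illustrate the model structure, an equivalent specification might look as follows in Python's statsmodels (data frame and column names hypothetical):

```python
import statsmodels.formula.api as smf

# One row per trial; block, alteration, type, and target are categorical,
# feature_norm is the normalized produced speech feature.
model = smf.mixedlm(
    "feature_norm ~ block * alteration * type * target",  # fixed effects
    data=trials,
    groups=trials["subject"],
    re_formula="~block + alteration",  # by-subject random slopes
)
fit = model.fit()
print(fit.summary())
```

Due to the normalization procedure, compensation in relation to preperturbation values was assessed by comparing marginal estimated model means of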


perturbed productions against 1, separately for the spectral and temporal groups, using two-sided, paired _t_-tests. We additionally ensured that the effects were equally observed when data


were modelled separately for the spectral and temporal groups (two separate models with identical factors, see above). The data from the dichotic conditions were investigated in another LMM


that tested whether the produced speech features in response to dichotically presented spectral and temporal auditory feedback perturbations depended on which ear received the perturbed


auditory feedback, the central research question in this experiment. In this model, block (ramp early/ramp late/hold/after effect phase), ear (left/right), type (spectral/temporal), and


target of the auditory feedback alteration (vowel/consonant) were entered as fixed effects. We allowed by-subject random slopes for the effect of ear and block. The binaural control


condition was not entered into this model because functional lateralization was assessed by contrasting data from the dichotic conditions directly with each other. Significant type × ear


interactions were followed up by planned comparisons averaging over all blocks following the baseline. Planned comparisons tested whether production changes in response to spectral or


temporal feedback alteration differed significantly between the two dichotic conditions with spectral alterations showing a left ear/right hemisphere advantage and temporal alterations a


right ear/left hemisphere advantage (one-sided, paired _t_-tests on marginal estimated means). The behavioral data of the fMRI study were analyzed with an LMM analogous to the first LMM in


the behavioral experiment, which allowed testing whether produced speech features with altered auditory feedback differed significantly from baseline productions and from a control condition


in the same experimental run. The model contained type of feedback alteration (spectral/temporal), feedback alteration (altered/unaltered), and the target of the feedback alteration


(vowel/consonant) as fixed effects and allowed by-subject random slopes for the feedback alteration and target of the feedback alteration. Comparisons with baseline productions were again


investigated by testing whether marginal estimated means for production changes in response to spectral and temporal perturbations differed significantly from 1 (two-sided, paired _t_-test).


_P_-values were obtained with Satterthwaite’s degrees of freedom method. Linear mixed effects models, planned comparisons, and post-hoc tests on estimated marginal means were performed with the


_afex_66 package (version 0.20-2) in R version 3.5.3.

IMAGING DATA ANALYSIS

Image processing and statistical analysis were performed using SPM 1267. All results were visualized using


MRIcron68. Imaging data are available at https://identifiers.org/neurovault.collection:7569. The spatial preprocessing pipeline used standard SPM 12 parameters complemented by additional


steps to account for possible motion due to speaking. The pipeline encompassed the following steps: (1) Realignment of functional images using rigid body transformation, (2) coregistration


of subject’s individual structural scans with the mean functional image of the realignment step, (3) smoothing of images with an isotropic 4 mm full-width at half-maximum Gaussian kernel to


prepare images for additional motion adjustment with ArtRepair69, (4) motion adjustment of functional images with ArtRepair to reduce interpolation errors from the realignment step, (5)


normalization of functional images to a symmetric brain template via parameters from segmentation of structural scans, and (6) another smoothing of images with an isotropic 7 mm full-width


at half-maximum Gaussian kernel. The symmetric brain template was created by averaging the standard Montreal Neurological Institute (MNI) brain template within the Talairach and Tournoux


reference frame with its R/L flipped version29. The preprocessed functional images were analyzed within the framework of general linear models (GLM) adapted for nonspherical distributed


error terms. The GLM contained three regressors of interest, modelling the three auditory feedback conditions (no perturbation, vowel perturbation, consonant perturbation). Due to the


additional motion adjustment step during preprocessing, movement-related effects were not additionally modelled69. Condition-specific regressors were obtained by convolving the onset and


duration of conditions (modelled by boxcar functions) with the canonical hemodynamic response function. To account for the use of a sparse sampling protocol, we adjusted microtime resolution


and onset (SPM.T = 64, SPM.T0 = 8 s). The model was high-pass filtered with a cutoff at 128 s to remove low frequency drifts. An autoregressive model AR(1) was used to account for serial


autocorrelations in the time series. After model estimation, two contrasts were specified testing the effect of speaking with altered auditory feedback (vowel or consonant perturbation)


against normal speaking without perturbation in each individual (first level). The resulting contrast images were subjected to a second-level random-effects analysis to infer brain activation


at the population level. Data from the spectral and temporal perturbation groups were analyzed separately in two repeated-measures ANOVAs. To exclude the possibility that any feedback


control-related effects were due to differences in the processing of vowels or consonants we only investigated effects that were consistent across feedback perturbations of the vowel and


consonant. Thus, we investigated conjunctions across both contrasts testing the global null hypothesis70 to identify spectral/temporal feedback control regions. The global null hypothesis


reveals all those brain areas that are consistently activated throughout both conditions and jointly significant. Given the common practice to increase statistical power in sparse sampling


fMRI experiments via region-of-interest (ROI) analyses9,14,71, we investigated activity differences between conditions within a small-volume-restricted search space that spanned


literature-based ROIs for auditory feedback control. An auditory and a frontal ROI were defined a priori based on 10 mm spheres centered on previously reported functional activation maxima


for speaking with altered auditory feedback compared to normal speaking in a random effects whole brain analysis25 (Supplementary Table 1 and illustrated as lighter cortex in Fig. 3).


Because this study focuses on the contribution of both hemispheres to feedback control, we included homotopic regions of the reported activation maxima into the ROIs. Family-wise error


correction was performed at _p_ < 0.05 at the cluster level with a cluster-defining threshold of _p_ < 0.001, small volume corrected in the aforementioned auditory and frontal ROIs.


Activation coordinates are given in MNI space. Lateralization was first assessed using weighted bootstrapped lateralization indices (LI) with the LI toolbox in SPM26. LIs were calculated on


the SPMs representing activity associated with spectral and temporal feedback control in the auditory and ventral frontal ROI. Weighted LIs with a negative sign indicate left-lateralized


activity, while weighted LIs with a positive sign indicate right-lateralized activity.
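In its simplest, unweighted form such an index reduces to a signed ratio of suprathreshold activity in the two hemispheres (the LI toolbox additionally applies thresholding and bootstrapped weighting); a sketch assuming the first voxel axis runs from left to right:

```python
import numpy as np

def laterality_index(tmap):
    """Signed LI over positive t-values; negative = left-lateralized,
    matching the sign convention used here."""
    mid = tmap.shape[0] // 2
    left = np.clip(tmap[:mid], 0, None).sum()
    right = np.clip(tmap[mid:], 0, None).sum()
    return (right - left) / (right + left + 1e-12)
```

While weighted bootstrapped LIs provide a robust and threshold-free method to assess lateralization26,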


they lack spatial sensitivity. We thus also calculated voxel-wise laterality maps by flipping feedback-related first-level contrast images (speaking with altered feedback > normal


speaking) along the interhemispheric fissure and subtracting these flipped mirror images from the original (unflipped) contrast images.
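A minimal nibabel sketch of this flip-and-subtract step, assuming images normalized to the symmetric template with the left-right axis as the first voxel dimension (file names hypothetical):

```python
import nibabel as nib

img = nib.load("con_altered_vs_normal.nii")  # first-level contrast image
data = img.get_fdata()
laterality = data - data[::-1, :, :]         # original minus L/R-mirrored copy
nib.save(nib.Nifti1Image(laterality, img.affine, img.header), "laterality.nii")
```

The obtained spectral and temporal laterality maps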


were subjected to two additional repeated-measures ANOVAs. Again, we tested the conjunction over vowel and consonant contrasts to assess which voxels showed higher activity in one


hemisphere compared to the other during spectral or temporal feedback control. Family-wise error correction was performed at _p_ < 0.05 at the cluster level with a cluster defining


threshold of _p_ < 0.001, small volume corrected in the auditory and frontal ROIs. Activation coordinates are given in MNI space. The relationship between participants’ individual degree
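The flip-and-subtract step itself is simple. A minimal sketch, assuming the contrast images sit in a (roughly) symmetric MNI space with the left-right axis along the first voxel dimension (file names are placeholders):

```matlab
% Voxel-wise laterality map: original contrast minus its left-right mirror.
V = spm_vol('con_feedback.nii');           % altered feedback > normal speaking
C = spm_read_vols(V);

Cflip = flipdim(C, 1);                     % mirror across the interhemispheric
                                           % fissure (flipdim for Matlab R2012b)
Lmap = C - Cflip;                          % >0 where a voxel exceeds its mirror

Vout = V;
Vout.fname = 'laterality_map.nii';
spm_write_vol(Vout, Lmap);
```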


The relationship between participants' individual degree of compensation for spectral and temporal feedback perturbations and their individual activity in spectral and temporal feedback control areas was assessed using Pearson's correlations. In analogy to the aforementioned fMRI analyses, we did not dissociate vowel and consonant effects but correlated averaged vowel and consonant productions with averaged activity during vowel and consonant perturbations. As in the fMRI compensation contrast, the degree of compensation was defined as the difference between speech features during perturbation and speech features during the control condition of the same run. Feedback control regions were defined post hoc according to functional peak activations in the contrast of speaking with spectrally or temporally altered auditory feedback versus normal speaking. They consisted of 6 mm spheres centered on local peak activation maxima for spectral (bilateral posterior STS and right IFG triangularis) or temporal (bilateral anterior STS, left IFG triangularis and orbitalis) feedback control (Fig. 3b, Table 1). Correlations were tested at _p_ < 0.05, uncorrected for multiple comparisons.
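As a sketch of this analysis (all variable names hypothetical; each vector holds one value per participant):

```matlab
% Correlate individual degree of compensation with individual ROI activity.
% Compensation is the perturbed-minus-control feature difference, averaged
% over vowel and consonant; beta is the mean contrast estimate in a 6 mm
% feedback-control sphere, likewise averaged over both perturbations.
comp = mean([compVowel, compConsonant], 2);
beta = mean([betaVowel, betaConsonant], 2);

[r, p] = corr(comp, beta, 'type', 'Pearson');   % tested at p < 0.05, uncorrected
fprintf('Pearson r = %.2f, p = %.3f\n', r, p);
```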


Functional connectivity at rest was analyzed with the Conn toolbox72. Images were spatially preprocessed with the same pipeline described above. In addition, the time series were denoised to reduce the impact of physiological noise and motion on the results. Physiological noise was removed with the anatomical component-based noise correction method (aCompCor), using 16 orthogonal time courses from subject-specific WM and CSF ROIs72. Furthermore, subject-specific motion parameters and their first derivatives (scan-to-scan motion), task effects, and subject-specific time points identified as outliers (scan-to-scan global signal change > 9 and movement of more than 2 mm) were regressed out. To isolate low-frequency fluctuations, the resting-state data were band-pass filtered (0.008–0.09 Hz)72.
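Conn performs these steps internally; the sketch below only illustrates the underlying logic of confound regression followed by band-pass filtering. Y is a time-by-voxel matrix and every regressor name is hypothetical.

```matlab
% Sketch of the denoising logic (Conn handles this internally).
TR = 2;                                          % placeholder repetition time (s)
acomp = [wmComponents, csfComponents];           % 16 aCompCor time courses (WM + CSF)
conf = [acomp, motion, [zeros(1, 6); diff(motion)], taskRegressors, outlierSpikes];
conf = [conf, ones(size(conf, 1), 1)];           % constant term

Yclean = Y - conf * (conf \ Y);                  % regress out all confounds

% Band-pass filter 0.008-0.09 Hz to isolate low-frequency fluctuations.
nT = size(Yclean, 1);
f = (0:nT-1)' / (nT * TR);                       % frequency of each FFT bin
keep = (f >= 0.008 & f <= 0.09) | ...            % positive-frequency band
       (f >= 1/TR - 0.09 & f <= 1/TR - 0.008);   % mirrored negative band
Yfilt = real(ifft(fft(Yclean) .* repmat(keep, 1, size(Yclean, 2))));
```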


For each participant and each resting-state run (pre- and postadaptation), seed-to-voxel connectivity maps were generated by calculating bivariate correlations between the average seed time series and the whole brain. The seeds were the same 6 mm spheres, centered on local peak activations of the spectral and temporal feedback control contrasts, that served for the correlation analyses with degree of compensation. The second-level GLM contained two regressors representing changes in connectivity between the resting-state runs (one for the spectral group and one for the temporal group) and four parametric regressors representing the subject-specific amount of compensation for spectral or temporal perturbations of the vowel and the consonant, separately. The parametric regressors were included to identify connectivity changes between the resting-state runs that were associated with motor learning of the new spectral and temporal auditory-motor associations acquired through feedback control. Connectivity changes between resting-state runs at the average level of compensation were captured by the first two regressors and were not analyzed further. With this model, we assessed motor learning-related connectivity changes by means of conjunction analyses over the vowel and consonant regressors (e.g., the post/preadaptation difference that correlates with F1 compensation ∩ the post/preadaptation difference that correlates with COG compensation). SPMs were thresholded at _p_ < 0.05, FWE-corrected at the cluster level with a cluster-defining threshold of _p_ < 0.001.
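A sketch of the seed-to-voxel step and of how such a second-level design separates mean connectivity change from compensation-related change (all variable names hypothetical):

```matlab
% Seed-to-voxel connectivity for one run: correlate the mean seed time
% series with every voxel and Fisher z-transform. Yfilt is the denoised
% time-by-voxel matrix; seedIdx indexes the voxels of one 6 mm sphere.
seedTs = mean(Yfilt(:, seedIdx), 2);       % average seed time series
r = corr(seedTs, Yfilt);                   % 1-by-voxel correlation map
z = atanh(r);                              % Fisher z for the second level

% Second-level logic: per-subject post-minus-pre z-maps enter a GLM with
% a group-mean column plus mean-centered compensation scores, so the
% parametric effects capture learning-related connectivity changes.
dz = zPost - zPre;                         % subjects-by-voxels change maps
X = [ones(nSubj, 1), ...
     compVowel - mean(compVowel), ...
     compCons  - mean(compCons)];          % hypothetical design matrix
```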


REPORTING SUMMARY Further information on research design is available in the Nature Research Reporting Summary linked to this article. DATA AVAILABILITY The unthresholded statistical parametric maps that support the findings of this study have been deposited at https://neurovault.org under https://identifiers.org/neurovault.collection:7569. The source data underlying Figs. 1, 2, 3a, 3c and Supplementary Fig. 1 are provided as a Source Data file. Source data are provided with this paper. CODE AVAILABILITY All analyses were performed using Matlab R2012b and R version 3.5.2,


with standard functions and toolboxes (see Methods). All code is available upon request. Source data are provided with this paper.

REFERENCES
1. Tourville, J. A. & Guenther, F. H. The DIVA model: a neural theory of speech acquisition and production. _Lang. Cogn. Process._ 26, 952–981 (2011).
2. Hickok, G. The cortical organization of speech processing: feedback control and predictive coding in the context of a dual-stream model. _J. Commun. Disord._ 45, 393–402 (2012).
3. Houde, J. F. & Nagarajan, S. S. Speech production as state feedback control. _Front. Hum. Neurosci._ 5, 82 (2011).
4. Guenther, F. H. & Hickok, G. in _Neurobiology of Language_ pp. 725–740 (Elsevier, 2016).
5. Hickok, G. Computational neuroanatomy of speech production. _Nat. Rev. Neurosci._ 13, 135–145 (2012).
6. Perkell, J., Matthies, M., Lane, H. & Guenther, F. H. Speech motor control: acoustic goals, saturation effects, auditory feedback and internal models. _Speech Commun._ 22, 227–250 (1997).
7. Villacorta, V. M., Perkell, J. S. & Guenther, F. H. Sensorimotor adaptation to feedback perturbations of vowel acoustics and its relation to perception. _J. Acoust. Soc. Am._ 122, 2306–2319 (2007).
8. Mitsuya, T., MacDonald, E. N. & Munhall, K. G. Temporal control and compensation for perturbed voicing feedback. _J. Acoust. Soc. Am._ 135, 2986–2994 (2014).
9. Tourville, J. A., Reilly, K. J. & Guenther, F. H. Neural mechanisms underlying auditory feedback control of speech. _NeuroImage_ 39, 1429–1443 (2008).
10. Flagmeier, S. G. et al. The neural changes in connectivity of the voice network during voice pitch perturbation. _Brain Lang._ 132, 7–13 (2014).
11. Toyomura, A. et al. Neural correlates of auditory feedback control in human. _Neuroscience_ 146, 499–503 (2007).
12. Behroozmand, R. & Sangtian, S. Neural bases of sensorimotor adaptation in the vocal motor system. _Exp. Brain Res._ 236, 1881–1895 (2018).
13. Kort, N., Nagarajan, S. S. & Houde, J. F. A right-lateralized cortical network drives error correction to voice pitch feedback perturbation. _J. Acoust. Soc. Am._ 134, 4234 (2013).
14. Behroozmand, R. et al. Sensory-motor networks involved in speech production and motor control: an fMRI study. _NeuroImage_ 109, 418–428 (2015).
15. Hickok, G., Houde, J. & Rong, F. Sensorimotor integration in speech processing: computational basis and neural organization. _Neuron_ 69, 407–422 (2011).
16. Kell, C. A. et al. Phonetic detail and lateralization of reading-related inner speech and of auditory and somatosensory feedback processing during overt reading. _Hum. Brain Mapp._ 38, 493–508 (2017).
17. Jones, J. A. & Munhall, K. G. Remapping auditory-motor representations in voice production. _Curr. Biol._ 15, 1768–1772 (2005).
18. Shiller, D. M., Sato, M., Gracco, V. L. & Baum, S. R. Perceptual recalibration of speech sounds following speech motor learning. _J. Acoust. Soc. Am._ 125, 1103–1113 (2009).
19. Cai, S., Ghosh, S. S., Guenther, F. H. & Perkell, J. S. Focal manipulations of formant trajectories reveal a role of auditory feedback in the online control of both within-syllable and between-syllable speech timing. _J. Neurosci._ 31, 16483–16490 (2011).
20. Zatorre, R. J., Belin, P. & Penhune, V. B. Structure and function of auditory cortex: music and speech. _Trends Cogn. Sci._ 6, 37–46 (2002).
21. Flinker, A., Doyle, W. K., Mehta, A. D., Devinsky, O. & Poeppel, D. Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries. _Nat. Hum. Behav._ 3, 393–405 (2019).
22. Cutting, J. E. Two left-hemisphere mechanisms in speech perception. _Percept. Psychophys._ 16, 601–612 (1974).
23. Tourville, J. A., Cai, S. & Guenther, F. H. Exploring auditory-motor interactions in normal and disordered speech. _J. Acoust. Soc. Am._ 133, 3564 (2013).
24. Kimura, D. Functional asymmetry of the brain in dichotic listening. _Cortex_ 3, 163–178 (1967).
25. Niziolek, C. A. & Guenther, F. H. Vowel category boundaries enhance cortical and behavioral responses to speech feedback alterations. _J. Neurosci._ 33, 12090–12098 (2013).
26. Wilke, M. & Schmithorst, V. J. A combined bootstrap/histogram analysis approach for computing a lateralization index from neuroimaging data. _NeuroImage_ 33, 522–530 (2006).
27. Seghier, M. L., Kherif, F., Josse, G. & Price, C. J. Regional and hemispheric determinants of language laterality: implications for preoperative fMRI. _Hum. Brain Mapp._ 32, 1602–1614 (2011).
28. Kell, C. A., Morillon, B., Kouneiher, F. & Giraud, A.-L. Lateralization of speech production starts in sensory cortices: a possible sensory origin of cerebral left dominance for speech. _Cereb. Cortex_ 21, 932–937 (2011).
29. Keller, C. & Kell, C. A. Asymmetric intra- and interhemispheric interactions during covert and overt sentence reading. _Neuropsychologia_ 93, 448–465 (2016).
30. Cogan, G. B. et al. Sensory-motor transformations for speech occur bilaterally. _Nature_ 507, 94–98 (2014).
31. Stephan, K. E., Fink, G. R. & Marshall, J. C. Mechanisms of hemispheric specialization: insights from analyses of connectivity. _Neuropsychologia_ 45, 209–228 (2007).
32. Agnew, Z. K., McGettigan, C., Banks, B. & Scott, S. K. Articulatory movements modulate auditory responses to speech. _NeuroImage_ 73, 191–199 (2013).
33. Golfinopoulos, E. et al. fMRI investigation of unexpected somatosensory feedback perturbation during speech. _NeuroImage_ 55, 1324–1338 (2011).
34. Gompf, F., Pflug, A., Laufs, H. & Kell, C. A. Non-linear relationship between BOLD activation and amplitude of beta oscillations in the supplementary motor area during rhythmic finger tapping and internal timing. _Front. Hum. Neurosci._ 11, 582 (2017).
35. Cadena-Valencia, J., García-Garibay, O., Merchant, H., Jazayeri, M. & de Lafuente, V. Entrainment and maintenance of an internal metronome in supplementary motor area. _eLife_ https://doi.org/10.7554/eLife.38983 (2018).
36. Pecenka, N., Engel, A. & Keller, P. E. Neural correlates of auditory temporal predictions during sensorimotor synchronization. _Front. Hum. Neurosci._ 7, 380 (2013).
37. Teghil, A. et al. Neural substrates of internally-based and externally-cued timing: an activation likelihood estimation (ALE) meta-analysis of fMRI studies. _Neurosci. Biobehav. Rev._ 96, 197–209 (2019).
38. Pflug, A., Gompf, F., Muthuraman, M., Groppa, S. & Kell, C. A. Differential contributions of the two human cerebral hemispheres to action timing. _eLife_ https://doi.org/10.7554/eLife.48404 (2019).
39. Poeppel, D. The analysis of speech in different temporal integration windows: cerebral lateralization as 'asymmetric sampling in time'. _Speech Commun._ 41, 245–255 (2003).
40. Ivry, R. & Robertson, L. C. _The Two Sides of Perception_ (MIT Press, Cambridge, 1998).
41. Schönwiesner, M. & Zatorre, R. J. Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. _Proc. Natl Acad. Sci. USA_ 106, 14611–14616 (2009).
42. Norman-Haignere, S., Kanwisher, N. G. & McDermott, J. H. Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. _Neuron_ 88, 1281–1296 (2015).
43. Albouy, P., Benjamin, L., Morillon, B. & Zatorre, R. J. Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. _Science_ 367, 1043–1047 (2020).
44. Keitel, A., Gross, J. & Kayser, C. Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features. _PLoS Biol._ 16, e2004473 (2018).
45. Fletcher, J. in _The Handbook of Phonetic Sciences_ (eds Hardcastle, W. J., Laver, J. & Gibbon, F. E.) Vol. 2, pp. 523–602 (Wiley-Blackwell, Chichester, 2010).
46. Byrd, D., Krivokapić, J. & Lee, S. How far, how long: on the temporal scope of prosodic boundary effects. _J. Acoust. Soc. Am._ 120, 1589–1599 (2006).
47. Stevens, K. N. _Acoustic Phonetics_ (MIT Press, 2000).
48. Pisanski, K. & Bryant, G. A. in _The Oxford Handbook of Voice Studies_ (eds Eidsheim, N. S. & Meizel, K.) pp. 269–306 (Oxford University Press, New York, 2019).
49. Weirich, M. & Simpson, A. P. Gender identity is indexed and perceived in speech. _PLoS ONE_ 13, e0209226 (2018).
50. Dilley, L. C., Wieland, E. A., Gamache, J. L., McAuley, J. D. & Redford, M. A. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech. _J. Speech Lang. Hear. Res._ 56, 159–177 (2013).
51. Klein, E., Brunner, J. & Hoole, P. in _Speech Production and Perception_ (eds Fuchs, S., Cleland, J. & Rochet-Capellan, A.) (Peter Lang, New York, 2019).
52. Parrell, B., Agnew, Z., Nagarajan, S., Houde, J. & Ivry, R. B. Impaired feedforward control and enhanced feedback control of speech in patients with cerebellar degeneration. _J. Neurosci._ 37, 9249–9258 (2017).
53. Scott, S. K. & McGettigan, C. Do temporal processes underlie left hemisphere dominance in speech perception? _Brain Lang._ 127, 36–45 (2013).
54. Hugdahl, K. Lateralization of cognitive processes in the brain. _Acta Psychol._ 105, 211–235 (2000).
55. Oldfield, R. C. The assessment and analysis of handedness: the Edinburgh inventory. _Neuropsychologia_ 9, 97–113 (1971).
56. Rochet-Capellan, A. & Ostry, D. J. Simultaneous acquisition of multiple auditory-motor transformations in speech. _J. Neurosci._ 31, 2657–2662 (2011).
57. Pompino-Marschall, B. & Żygis, M. in _Papers from the Linguistics Laboratory_ (eds Weirich, M. & Jannedy, S.) Vol. 52, pp. 1–17 (ZAS, Berlin, 2010).
58. Mann, V. & Soli, S. D. Perceptual order and the effect of vocalic context on fricative perception. _Percept. Psychophys._ 49, 399–411 (1991).
59. Franken, M. K., Acheson, D. J., McQueen, J. M., Hagoort, P. & Eisner, F. Consistency influences altered auditory feedback processing. _Q. J. Exp. Psychol._ 72, 2371–2379 (2019).
60. Ogane, R. & Honda, M. Speech compensation for time-scale-modified auditory feedback. _J. Speech Lang. Hear. Res._ 57, 616–625 (2014).
61. Peelle, J. E. Methodological challenges and solutions in auditory functional magnetic resonance imaging. _Front. Neurosci._ 8, 253 (2014).
62. Boersma, P. Praat, a system for doing phonetics by computer. _Glot Int._ 5, 341–345 (2001).
63. Mitsuya, T., MacDonald, E. N., Munhall, K. G. & Purcell, D. W. Formant compensation for auditory feedback with English vowels. _J. Acoust. Soc. Am._ 138, 413–424 (2015).
64. Fuchs, S., Toda, M. & Żygis, M. _Turbulent Sounds: An Interdisciplinary Guide_ (Mouton de Gruyter, Berlin, 2010).
65. Breithaupt, C., Gerkmann, T. & Martin, R. Cepstral smoothing of spectral filter gains for speech enhancement without musical noise. _IEEE Signal Process. Lett._ 14, 1036–1039 (2007).
66. Singmann, H., Bolker, B., Westfall, J. & Aust, F. _afex: Analysis of Factorial Experiments_. R package (2018).
67. Wellcome Trust Centre for Neuroimaging. _SPM 12_ (Wellcome Trust Centre for Neuroimaging, London, 2012).
68. Rorden, C., Karnath, H.-O. & Bonilha, L. Improving lesion-symptom mapping. _J. Cogn. Neurosci._ 19, 1081–1088 (2007).
69. Mazaika, P., Hoeft, F., Glover, G. H. & Reiss, A. L. Methods and software for fMRI analysis for clinical subjects. _NeuroImage_ 47, S58 (2009).
70. Friston, K. J., Penny, W. D. & Glaser, D. E. Conjunction revisited. _NeuroImage_ 25, 661–667 (2005).
71. Kleber, B., Zeitouni, A. G., Friberg, A. & Zatorre, R. J. Experience-dependent modulation of feedback integration during singing: role of the right anterior insula. _J. Neurosci._ 33, 6070–6080 (2013).
72. Whitfield-Gabrieli, S. & Nieto-Castanon, A. Conn: a functional connectivity toolbox for correlated and anticorrelated brain networks. _Brain Connect._ 2, 125–141 (2012).

ACKNOWLEDGEMENTS We thank Olivia Maky for technical help. This study was funded by


the German Research Foundation with an Emmy Noether Grant to C.A.K. (KE 1514/2-1). AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * Cognitive Neuroscience Group, Brain Imaging Center and Department of Neurology, Goethe University, Schleusenweg 2-16, 60528 Frankfurt, Germany: Mareike Floegel & Christian A. Kell * Leibniz-Centre General Linguistics (ZAS), Schuetzenstr. 18, 10117 Berlin, Germany: Susanne Fuchs CONTRIBUTIONS M.F.,


C.A.K., and S.F. designed the experiments. M.F. collected and analyzed the data. S.F. analyzed data. M.F. and C.A.K. prepared the manuscript. S.F. edited the manuscript. C.A.K. supervised


the project and acquired funding. CORRESPONDING AUTHOR Correspondence to Christian A. Kell. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL


INFORMATION PEER REVIEW INFORMATION _Nature Communications_ thanks Anne Keitel, Caroline Niziolek and the other, anonymous, reviewer(s) for their contribution to the peer review of this


work. Peer reviewer reports are available. PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


SUPPLEMENTARY INFORMATION Supplementary Information, a Peer Review File, a Reporting Summary, and Source Data accompany this article. RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative


Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the


original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in


the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended


use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


http://creativecommons.org/licenses/by/4.0/. ABOUT THIS ARTICLE CITE THIS ARTICLE Floegel, M., Fuchs, S. & Kell, C.A. Differential contributions of the two


cerebral hemispheres to temporal and spectral speech feedback control. _Nat. Commun._ 11, 2839 (2020). https://doi.org/10.1038/s41467-020-16743-2 * Received: 30 September


2019 * Accepted: 21 May 2020 * Published: 05 June 2020 * DOI: https://doi.org/10.1038/s41467-020-16743-2