Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences

Yi Xu¹, Charles R Larson, Jay J Bauer, Timothy C Hain

PMID: 15376682
PMCID: PMC1224717
DOI: 10.1121/1.1763952

Yi Xu et al. J Acoust Soc Am. 2004 Aug;116(2):1168-78.

Affiliation

¹ Haskins Laboratories, New Haven, Connecticut 06511, USA. xu@haskins.yale.edu

Abstract

Recent research has found that, while speaking, subjects react to perturbations in the pitch of voice auditory feedback by changing their voice fundamental frequency (F0) to compensate for the perceived pitch shift. The long response latencies (150-200 ms) suggest that these responses may be too slow to assist in on-line control of the local pitch contour patterns associated with lexical tones on a syllable-to-syllable basis. In the present study, we introduced pitch-shifted auditory feedback to native speakers of Mandarin Chinese while they produced disyllabic sequences /ma ma/ with different tonal combinations at a natural speaking rate. Voice F0 response latencies (100-150 ms) to the pitch perturbations were shorter than syllable durations reported elsewhere. Response magnitudes increased from 50 cents during static tone production to 85 cents during dynamic tone production. Response latencies and peak times decreased in phrases involving a dynamic change in F0. The larger response magnitudes and the shorter latencies and peak times in tasks requiring accurate, dynamic control of F0 indicate that this automatic system for regulation of voice F0 may be task-dependent. These findings suggest that auditory feedback may be used to help regulate voice F0 during production of bi-tonal Mandarin phrases.
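The response magnitudes above are reported in cents, a logarithmic pitch unit with 1200 cents per octave, so a magnitude in cents corresponds to a different change in Hz depending on the speaker's baseline F0. The following is a minimal sketch of that conversion. The 120 Hz baseline is a hypothetical value chosen only for illustration and does not come from the study.

```python
import math


def hz_to_cents(f_hz, ref_hz):
    """Pitch distance of f_hz from ref_hz in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def cents_to_hz(cents, ref_hz):
    """Frequency that lies `cents` above (or below, if negative) ref_hz."""
    return ref_hz * 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    baseline_hz = 120.0  # hypothetical baseline F0, not a value from the study
    for magnitude in (50, 85):  # response magnitudes quoted in the abstract
        shifted = cents_to_hz(magnitude, baseline_hz)
        print(f"{magnitude} cents above {baseline_hz:.0f} Hz is {shifted:.2f} Hz")
```

Because the scale is logarithmic, the same number of cents corresponds to a larger change in Hz for a speaker with a higher baseline F0.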

Figures

**FIG. 1**
(a) Mean F₀ contours of Mandarin syllable /ma/ spoken with four lexical tones: High (H), Rising (R), Low (L), and Falling (F). The syllables mean “mother,” “hemp,” “horse,” or “to scold,” respectively. Data averaged over 48 repetitions by eight male speakers (Xu, 1997). (b) Mean F₀ curves of the Mandarin tone sequences HxRHH, where x varies across H, R, L, and F (which changes the meaning of the first word from “catty” to “cat-fan,” “cat-rice,” or “cat-honey”). The vertical grids mark the syllable boundaries. The short vertical bars depict +/− one standard deviation about the mean. Data averaged across five repetitions produced by one speaker (Xu, 1999).

**FIG. 2**
(a) Averaged test wave (heavy black line) superimposed on standard error of the mean (SE) (dark gray wide line) in response to a downward pitch-shift stimulus. Control average wave (thin black line) superimposed on SE (light gray wide line). The square wave at the bottom indicates time and direction of stimulus (vertical dimension not to scale). (b) Probability (p) values resulting from a t-test comparison of test and control waves (see the text for details). The circled point is defined as response latency and the boxed point is the time of peak response magnitude. (c) The difference wave calculated by subtracting control from the test average wave.
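The analysis outlined in this caption (average waves, a point-by-point test against control, a latency point, a peak point, and a difference wave) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses a Welch t statistic with a hypothetical critical value `t_crit` in place of the p-values shown in panel (b), and it assumes latency and peak time are measured from stimulus onset. All function names are invented for this example.

```python
import math
from statistics import mean, stdev


def average_wave(trials):
    """Point-by-point mean across trials (each trial is a list of F0 samples)."""
    return [mean(samples) for samples in zip(*trials)]


def difference_wave(test_trials, control_trials):
    """Test average minus control average, as in panel (c)."""
    test_avg = average_wave(test_trials)
    control_avg = average_wave(control_trials)
    return [t - c for t, c in zip(test_avg, control_avg)]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def latency_and_peak(test_trials, control_trials, dt_ms, onset_idx, t_crit=2.0):
    """Return (latency_ms, peak_time_ms) relative to stimulus onset.

    t_crit is a hypothetical threshold standing in for a p-value criterion.
    Returns (None, None) if no post-onset sample exceeds the threshold.
    """
    diff = difference_wave(test_trials, control_trials)
    t_vals = [welch_t(a, b) for a, b in zip(zip(*test_trials), zip(*control_trials))]
    significant = [i for i in range(onset_idx, len(diff)) if abs(t_vals[i]) > t_crit]
    if not significant:
        return None, None
    first = significant[0]                              # latency point
    peak = max(significant, key=lambda i: abs(diff[i]))  # peak magnitude point
    return (first - onset_idx) * dt_ms, (peak - onset_idx) * dt_ms
```

A real analysis would also need some guard against isolated chance crossings, for example requiring a run of consecutive significant samples; that choice is left open here.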

**FIG. 3**
Control (thin black line) and test average waves (thick black line) during H-H, H-R, and H-F sequences at the 100 ms stimulus timing condition. Heavy dashed lines are simulations produced by the model (see the text). The vertical arrow indicates time where the response magnitude was measured for this trace (see the text). Error bars represent the standard error of the mean for a single direction. The inset shows an expanded portion of average waves. Curves at the bottom indicate the time and direction of the stimulus. For all panels, the stimulus onset occurred approximately at 0.1 s. The x-axis (time) starts at 0.05 s, which is 0.05 s after vocalization onset. Note that the y-axis differs for each plot.

**FIG. 4**
Control (thin black line) and test average waves (thick black line) during H-H, H-R, and H-F sequences at the 250 ms stimulus timing condition. Error bars represent the standard error of the mean for a single direction. In H-R, “#” marks the large difference between control and test waves mentioned in the text. In H-F, asterisks (*) indicate the rise in F₀ prior to the major drop (see the text). Heavy dashed lines are simulations produced by the model (see the text). Stimulus onset occurred at 0.25 s following vocal onset. In all illustrated examples, differences between control and test averages were statistically significant. The x-axis (time) starts at 0.1 s, which is 0.1 s after vocalization onset. Note that the y-axis differs for each plot. See the legend of Fig. 3 for further details. All traces for Figs. 3 and 4 were taken from the same subject.

**FIG. 5**
Mathematical model of pitch stabilization. On the left side, *Desired F*₀ is input. Corrections are added at the summing junction at the bottom center to produce F₀. Corrections are computed by comparing perceived F₀ (the upper right-hand part of the diagram) with *Expected F*₀. *Perceived F*₀ is delayed by 130 ms with respect to F₀, reflecting delays in registration and production of sound. *Expected F*₀ is also delayed by 130 ms so that both signals are in the same time frame. The difference between *Expected F*₀ and *Perceived F*₀, *Error*, is filtered and used to adjust the F₀ signal.
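The loop described in this caption can be sketched as a discrete-time simulation. The 130 ms delay comes from the caption; everything else is an assumption made for illustration. In particular, the caption does not specify the filter, so a first-order low-pass with a hypothetical gain (`gain`) and time constant (`tau_ms`) stands in for it, and the pitch shift is assumed to be added to the feedback signal before the delay.

```python
def simulate_pitch_loop(desired, shift, dt_ms=1.0, delay_ms=130.0,
                        gain=0.5, tau_ms=100.0):
    """Simulate a delayed-feedback F0 correction loop.

    desired : desired F0 contour (cents), one sample per dt_ms
    shift   : pitch shift applied to the auditory feedback (cents)
    gain, tau_ms : hypothetical filter parameters, not taken from the paper
    Returns the produced F0 contour (cents).
    """
    n_delay = int(round(delay_ms / dt_ms))
    alpha = dt_ms / tau_ms          # low-pass smoothing factor per step
    produced = []
    correction = 0.0
    for i in range(len(desired)):
        # Summing junction: desired F0 plus the current correction.
        f0 = desired[i] + correction
        produced.append(f0)

        j = i - n_delay             # sample that is being perceived now
        if j >= 0:
            perceived = produced[j] + shift[j]   # delayed, shifted feedback
            expected = desired[j]                # delayed by the same amount
            error = expected - perceived
        else:
            error = 0.0             # no feedback has arrived yet

        # Assumed filter: first-order low-pass driving the correction
        # toward gain * error.
        correction += alpha * (gain * error - correction)
    return produced


if __name__ == "__main__":
    n = 3000                                             # 3 s at 1 ms steps
    desired = [0.0] * n                                  # flat target contour
    shift = [0.0 if i < 100 else -100.0 for i in range(n)]  # downward shift
    f0 = simulate_pitch_loop(desired, shift)
    print(f"final produced F0: {f0[-1]:.1f} cents")
```

With these assumed parameters the loop gain is below one, so the simulated response is stable and only partially compensates for a sustained shift, moving F0 in the direction opposite to the perturbation.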
