J Acoust Soc Am. 2004 Aug;116(2):1168-78.
doi: 10.1121/1.1763952.

Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences

Yi Xu et al. J Acoust Soc Am. 2004 Aug.

Abstract

Recent research has found that while speaking, subjects react to perturbations in the pitch of voice auditory feedback by changing their voice fundamental frequency (F0) to compensate for the perceived pitch shift. The long response latencies (150-200 ms) suggest that these responses may be too slow to assist in on-line control of the local pitch contour patterns associated with lexical tones on a syllable-to-syllable basis. In the present study, we introduced pitch-shifted auditory feedback to native speakers of Mandarin Chinese while they produced disyllabic sequences /ma ma/ with different tonal combinations at a natural speaking rate. Voice F0 response latencies (100-150 ms) to the pitch perturbations were shorter than syllable durations reported elsewhere. Response magnitudes increased from 50 cents during static tone productions to 85 cents during dynamic tone productions. Response latencies and peak times decreased in phrases involving a dynamic change in F0. The larger response magnitudes and shorter latency and peak times in tasks requiring accurate, dynamic control of F0 indicate that this automatic system for regulation of voice F0 may be task-dependent. These findings suggest that auditory feedback may be used to help regulate voice F0 during production of bi-tonal Mandarin phrases.
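
The response magnitudes in the abstract are reported in cents, the standard logarithmic unit for pitch intervals (100 cents is one equal-tempered semitone, 1200 cents is one octave). As a minimal sketch of that conversion, the following shows how an F0 value in Hz maps to cents relative to a reference frequency. The function names and the example frequencies are illustrative assumptions, not values taken from the study.

```python
import math

def hz_to_cents(f0_hz, ref_hz):
    """Interval between f0_hz and ref_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f0_hz / ref_hz)

def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval given in cents."""
    return 2.0 ** (cents / 1200.0)

# Illustrative only: an 85-cent upward response from a hypothetical 200 Hz baseline.
baseline_hz = 200.0
shifted_hz = baseline_hz * cents_to_ratio(85.0)
print(round(hz_to_cents(shifted_hz, baseline_hz), 3))
```

Under these assumptions, an 85-cent response corresponds to a frequency change of roughly 5 percent, which is somewhat less than a semitone.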

Figures

FIG. 1
(a) Mean F0 contours of the Mandarin syllable /ma/ spoken with four lexical tones: High (H), Rising (R), Low (L), and Falling (F). The syllables mean “mother,” “hemp,” “horse,” or “to scold,” respectively. Data averaged over 48 repetitions by eight male speakers (Xu, 1997). (b) Mean F0 curves of the Mandarin tone sequences HxRHH, where x varies across H, R, L, and F (which changes the meaning of the first word from “catty” to “cat-fan,” “cat-rice,” or “cat-honey”). The vertical grids mark the syllable boundaries. The short vertical bars depict +/− one standard deviation about the mean. Data averaged across five repetitions produced by one speaker from Xu (1999).
FIG. 2
(a) Averaged test wave (heavy black line) superimposed on standard error of the mean (SE) (dark gray wide line) in response to a downward pitch-shift stimulus. Control average wave (thin black line) superimposed on SE (light gray wide line). The square wave at the bottom indicates time and direction of stimulus (vertical dimension not to scale). (b) Probability (p) values resulting from a t-test comparison of test and control waves (see the text for details). The circled point is defined as response latency and the boxed point is the time of peak response magnitude. (c) The difference wave calculated by subtracting control from the test average wave.
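
The analysis described in this caption (averaging test and control trials, subtracting them to get a difference wave, and comparing them point by point) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the caption defers the details to the text, so the Welch t statistic, the `t_crit` threshold used in place of a p-value criterion, and all function names are hypothetical choices made here.

```python
import math
from statistics import mean, variance

def average_wave(trials):
    """Point-by-point mean across trials (each trial is a list of F0 samples)."""
    return [mean(samples) for samples in zip(*trials)]

def difference_wave(test_trials, control_trials):
    """Test average minus control average, sample by sample."""
    test_avg = average_wave(test_trials)
    control_avg = average_wave(control_trials)
    return [t - c for t, c in zip(test_avg, control_avg)]

def welch_t(a, b):
    """Welch t statistic for two independent samples (needs >= 2 values each)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def latency_and_peak(test_trials, control_trials, dt, onset_index, t_crit=3.0):
    """Return (latency, peak_time, peak_value), with times relative to stimulus onset.

    Assumed criterion: latency is the first post-onset sample where |t| > t_crit;
    the peak is the post-onset sample with the largest absolute difference.
    """
    diff = difference_wave(test_trials, control_trials)
    t_vals = [welch_t(a, b) for a, b in zip(zip(*test_trials), zip(*control_trials))]
    post = range(onset_index, len(diff))
    first = next((i for i in post if abs(t_vals[i]) > t_crit), None)
    latency = None if first is None else (first - onset_index) * dt
    peak = max(post, key=lambda i: abs(diff[i]))
    return latency, (peak - onset_index) * dt, diff[peak]
```

A threshold on the t statistic is used here only to keep the sketch free of external dependencies; a p-value criterion like the one plotted in panel (b) would need the t distribution, for example from SciPy.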
FIG. 3
Control (thin black line) and test average waves (thick black line) during H-H, H-R, and H-F sequences at the 100 ms stimulus timing condition. Heavy dashed lines are simulations produced by the model (see the text). The vertical arrow indicates time where the response magnitude was measured for this trace (see the text). Error bars represent the standard error of the mean for a single direction. The inset shows an expanded portion of average waves. Curves at the bottom indicate the time and direction of the stimulus. For all panels, the stimulus onset occurred approximately at 0.1 s. The x-axis (time) starts at 0.05 s, which is 0.05 s after vocalization onset. Note that the y-axis differs for each plot.
FIG. 4
Control (thin black line) and test average waves (thick black line) during H-H, H-R, and H-F sequences at the 250 ms stimulus timing condition. Error bars represent the standard error of the mean for a single direction. In the H-R panel, “#” marks the large difference between control and test waves mentioned in the text. In the H-F panel, “*s” indicate a rise in F0 prior to the major drop (see the text). Heavy dashed lines are simulations produced by the model (see the text). Stimulus onset occurred 0.25 s after vocal onset. In all illustrated examples, differences between control and test averages were statistically significant. The x-axis (time) starts at 0.1 s, which is 0.1 s after vocalization onset. Note that the y-axis differs for each plot. See the legend of Fig. 3 for further details. All traces for Figs. 3 and 4 were taken from the same subject.
FIG. 5
Mathematical model of pitch stabilization. On the left side, Desired F0 is input. Corrections are added at the summing junction at the bottom center to produce F0. Corrections are computed by comparing Perceived F0 (the upper right-hand part of the diagram) with Expected F0. Perceived F0 is delayed by 130 ms with respect to F0, reflecting delays in registration and production of sound. Expected F0 is also delayed by 130 ms so that both signals are in the same time frame. The difference between Expected F0 and Perceived F0, Error, is filtered and used to adjust the F0 signal.
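
To make the loop in this caption concrete, here is a minimal discrete-time sketch of such a delayed feedback model. Only the 130 ms delay comes from the caption. The caption does not specify the filter or how Expected F0 is derived, so this sketch assumes a first-order low-pass filter with hypothetical `gain` and `tau` values and takes Expected F0 to be a delayed copy of Desired F0. All signals are in cents relative to an arbitrary baseline.

```python
def simulate_pitch_stabilization(desired, shift, dt=0.001, delay=0.130,
                                 gain=0.5, tau=0.05):
    """Simulate produced F0 (cents) under pitch-shifted auditory feedback.

    desired: Desired F0 per time step (cents).
    shift:   pitch shift applied to the auditory feedback per time step (cents).
    gain, tau: ASSUMED filter parameters, not taken from the paper.
    """
    delay_steps = round(delay / dt)
    f0 = []
    correction = 0.0
    for n in range(len(desired)):
        # Summing junction: desired F0 plus the current correction.
        f0.append(desired[n] + correction)
        m = n - delay_steps
        if m >= 0:
            perceived = f0[m] + shift[m]  # shifted feedback, heard 130 ms late
            expected = desired[m]         # assumed: delayed copy of Desired F0
            error = expected - perceived
        else:
            error = 0.0                   # nothing has been perceived yet
        # Assumed filter: first-order low-pass (forward Euler step).
        correction += (dt / tau) * (gain * error - correction)
    return f0

# Usage: flat desired F0 with a 100-cent downward shift starting at 0.1 s.
steps = 2000
desired = [0.0] * steps
shift = [0.0 if n < 100 else -100.0 for n in range(steps)]
f0 = simulate_pitch_stabilization(desired, shift)
print(round(f0[-1], 1))
```

With these assumed parameters the simulated F0 does not move until 130 ms after the shift begins, then rises to oppose the downward shift and settles at partial compensation (gain / (1 + gain) of the shift size). The real model's filter may behave differently.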
