PLoS One. 2017 Jul 28;12(7):e0181992.
doi: 10.1371/journal.pone.0181992. eCollection 2017.

A fast and accurate zebra finch syllable detector

Ben Pearre et al. PLoS One. 2017.

Abstract

The song of the adult male zebra finch is strikingly stereotyped. Efforts to understand motor output, pattern generation, and learning have taken advantage of this consistency by investigating the bird's ability to modify specific parts of song under external cues, and by examining timing relationships between neural activity and vocal output. Such experiments require that precise moments during song be identified in real time as the bird sings. Various syllable-detection methods exist, but many require special hardware, software, and know-how, and details on their implementation and performance are scarce. We present an accurate, versatile, and fast syllable detector that can control hardware at precisely timed moments during zebra finch song. Many moments during song can be isolated and detected with false negative and false positive rates well under 1% and 0.005% respectively. The detector can run on a stock Mac Mini with triggering delay of less than a millisecond and a jitter of σ ≈ 2 milliseconds.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. The spectrogram of the song of the bird “lny64”, used as an example throughout this paper.
This image was made by superposing the spectra of our 2818 aligned songs. Our example detection points, t1* through t6*, are shown as red lines, with example recognition regions of 30 ms × 1–8 kHz marked as rectangles.
Fig 2. Each plot shows one network output unit’s responses to all 2818 presentations of lny64’s song shown in Fig 1.
We show only the syllables t1*, t4*, and t6*, and we do not show the non-response to presentation of non-song. The horizontal axis is time relative to the beginning of the aligned song, and the vertical axis is an index for the 2818 individual song presentations. The grey shading shows the audio amplitude of song Y at time T. Detection events on training songs are shown in cyan, with detections of unseen test songs in red. To provide an intuition of intra-song variability, songs have been stably sorted by the time of detection events; thus, each of the three detection graphs shows the songs in a different order.
Fig 3. Accuracy variability over 100 different training runs for each of the test detection points.
Each dot shows the test-set accuracy for an independently trained detector. Horizontal positions have been randomised slightly so that same-valued measurements do not occlude one another; for that reason, the test syllable is also indicated by colour. The means are given in Table 2.
Fig 4. Timing varies as the FFT frame interval changes.
Here we show results for the ideal detector and the LabVIEW and Swift+serial implementations, for the constructed δ-syllable and for trigger t4* of lny64’s song. The lines show latency; error bars are standard deviation (jitter). Points have been shifted horizontally slightly for clarity; original positions are [0.5 1 1.5 2 4] ms.
Fig 5. Timing data for lny64’s 6 test syllables, for the ideal and the Swift+serial detectors, with an FFT frame rate of 1.5 ms.
Point centres show latency; error bars show jitter.
Fig 6. The different detectors for the constructed δ-syllable and for lny64’s song at t4*.
Point centres show latency; error bars show jitter.
Fig 7. Raw timing curves for all detectors measured during detection of lny64’s t4* using 1.5-ms frames.
We extract the trigger events from each curve and compute their mean (the latency) and standard deviation (the jitter).
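As a rough illustration of the latency and jitter definitions used in these captions, the following Python sketch computes both from a list of per-trial trigger delays. The function name `latency_and_jitter`, the sample delays, and the choice of the sample (n − 1) standard deviation are assumptions made for this example; the captions do not state which estimator was used, nor how trigger events were extracted from the raw curves.

```python
from statistics import mean, stdev


def latency_and_jitter(trigger_delays_ms):
    """Return (latency, jitter) in milliseconds.

    latency: mean of the per-trial trigger delays
    jitter:  sample standard deviation of those delays (an assumption;
             the source does not specify sample vs. population)
    """
    if len(trigger_delays_ms) < 2:
        raise ValueError("need at least two trigger events")
    return mean(trigger_delays_ms), stdev(trigger_delays_ms)


# Hypothetical delays (ms) between the intended detection point and the
# measured trigger; these numbers are made up for illustration only.
delays = [1.0, 2.0, 3.0]
latency, jitter = latency_and_jitter(delays)
print(f"latency = {latency:.2f} ms, jitter = {jitter:.2f} ms")
```

Swapping `stdev` for `statistics.pstdev` gives the population estimate instead; with thousands of song presentations the two differ negligibly.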
