Proc Natl Acad Sci U S A. 2017 Mar 21;114(12):3085-3090. doi: 10.1073/pnas.1618693114. Epub 2017 Mar 6.

Classroom sound can be used to classify teaching practices in college science courses

Melinda T Owens  1 Shannon B Seidel  2 Mike Wong  3 Travis E Bejines  2 Susanne Lietz  1 Joseph R Perez  2 Shangheng Sit  1 Zahur-Saleh Subedar  1 Gigi N Acker  4   5 Susan F Akana  6 Brad Balukjian  7 Hilary P Benton  1   8 J R Blair  1 Segal M Boaz  9 Katharyn E Boyer  1   10 Jason B Bram  4 Laura W Burrus  1 Dana T Byrd  1 Natalia Caporale  11 Edward J Carpenter  1   10 Yee-Hung Mark Chan  1 Lily Chen  1 Amy Chovnick  9 Diana S Chu  1 Bryan K Clarkson  12 Sara E Cooper  8 Catherine Creech  13 Karen D Crow  1 José R de la Torre  1 Wilfred F Denetclaw  1 Kathleen E Duncan  8 Amy S Edwards  8 Karen L Erickson  8 Megumi Fuse  1 Joseph J Gorga  14 Brinda Govindan  1 L Jeanette Green  15 Paul Z Hankamp  16 Holly E Harris  1 Zheng-Hui He  1 Stephen Ingalls  1 Peter D Ingmire  1   17 J Rebecca Jacobs  8 Mark Kamakea  18 Rhea R Kimpo  1   19 Jonathan D Knight  1 Sara K Krause  20 Lori E Krueger  21   22 Terrye L Light  1 Lance Lund  1 Leticia M Márquez-Magaña  1 Briana K McCarthy  23 Linda J McPheron  24 Vanessa C Miller-Sims  1 Christopher A Moffatt  1 Pamela C Muick  21   25 Paul H Nagami  1   7   26 Gloria L Nusse  1 Kristine M Okimura  27 Sally G Pasion  1 Robert Patterson  1 Pleuni S Pennings  1 Blake Riggs  1 Joseph Romeo  1 Scott W Roy  1 Tatiane Russo-Tait  28 Lisa M Schultheis  8 Lakshmikanta Sengupta  16 Rachel Small  29 Greg S Spicer  1 Jonathon H Stillman  1   10 Andrea Swei  1 Jennifer M Wade  30 Steven B Waters  23 Steven L Weinstein  1 Julia K Willsie  12 Diana W Wright  5   31 Colin D Harrison  32 Loretta A Kelley  33 Gloriana Trujillo  34 Carmen R Domingo  1 Jeffrey N Schinske  4   8 Kimberly D Tanner  35


Abstract

Active-learning pedagogies have been repeatedly demonstrated to produce superior learning gains with large effect sizes compared with lecture-based pedagogies. Shifting large numbers of college science, technology, engineering, and mathematics (STEM) faculty to include any active learning in their teaching may retain and more effectively educate far more students than having a few faculty completely transform their teaching, but the extent to which STEM faculty are changing their teaching methods is unclear. Here, we describe the development and application of the machine-learning-derived algorithm Decibel Analysis for Research in Teaching (DART), which can analyze thousands of hours of STEM course audio recordings quickly, with minimal costs, and without need for human observers. DART analyzes the volume and variance of classroom recordings to predict the quantity of time spent on single voice (e.g., lecture), multiple voice (e.g., pair discussion), and no voice (e.g., clicker question thinking) activities. Applying DART to 1,486 recordings of class sessions from 67 courses, a total of 1,720 h of audio, revealed varied patterns of lecture (single voice) and nonlecture activity (multiple and no voice) use. We also found that there was significantly more use of multiple and no voice strategies in courses for STEM majors compared with courses for non-STEM majors, indicating that DART can be used to compare teaching strategies in different types of courses. Therefore, DART has the potential to systematically inventory the presence of active learning with ∼90% accuracy across thousands of courses in diverse settings with minimal effort.
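
The measurement at the heart of DART is simple enough to sketch: within a sliding window over a recording, the mean sound level separates voiced from silent activity, and the variability of that level helps distinguish one speaker from many. The Python sketch below illustrates the idea under the design stated here and in Fig. S1 (0.5-s samples, 15-s windows, per-session normalization); the threshold values, their directions, and the function name are illustrative assumptions, not the published cut-offs, which were chosen by grid search.

```python
import numpy as np

def dart_like_classify(levels, window=30, vol_cut=0.3, std_cut=0.5):
    """Label 0.5-s sound-level samples from one class session as
    single voice, multiple voice, or no voice.

    `levels` is a 1-D array sampled at 2 Hz, so window=30 spans the
    15-s context described in Fig. S1. The cut-offs and the rule
    directions below are hypothetical placeholders standing in for
    the paper's grid-searched classifiers.
    """
    # Normalize volume with respect to this class session (Fig. S1A).
    z = (levels - levels.mean()) / levels.std()

    half = window // 2
    labels = []
    for i in range(len(z)):
        w = z[max(0, i - half):i + half]   # ~15-s window around sample i
        m, s = w.mean(), w.std()           # the two DART features
        if m < -vol_cut:
            labels.append("no voice")      # quiet, e.g. silent thinking
        elif s > std_cut:
            labels.append("single voice")  # placeholder rule, e.g. lecture
        else:
            labels.append("multiple voice")  # e.g. pair discussion
    return labels
```

In the paper itself, the per-mode classifiers are binary decision trees combined into an ensemble (Fig. S1B); the sketch collapses that to two threshold rules for readability.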

Keywords: active learning; assessment; evidence-based teaching; lecture; science education.


Conflict of interest statement

K.D.T., J.N.S., M.W., S.B.S., and M.T.O. have filed a provisional patent on the subject of this report, DART (US Provisional Patent Application No. 62/398,888).

Figures

Fig. 1.
Sound analysis can differentiate lecture and nonlecture classroom activities. All: Sound levels over time sampled at 2 Hz, with each tickmark indicating 2 min. Typical results are shown. (A) Class session with mostly lecture (94 min) with human annotation codes indicated above the waveform. (B) Background color indicates DART prediction for the recording shown in A. (C) Class session with varied learning activities (108 min) with human annotation codes indicated. (D) Background colors indicate DART predictions for recording in C. (E) DART prediction, small class (n = 15 students; 98 min). (F) DART prediction, large class (n = 287 students; 49 min). (G) Examples of DART learning activity footprints from different class sessions: thinking, writing, or clicker response; pair or group discussion; lecture; think-pair-share.
Fig. S1.
Using machine learning to optimize the DART algorithm for classifying classroom noise as single voice, multiple voice, or no voice. (A) Each 0.5-s sample from each recording from the pilot group was tagged with its label from human annotation (S for single voice, M for multiple voice, or N for no voice), the mean volume of the 15-s window of audio around it, and the SD (std) in that window’s volume. Mean volume and SD were normalized with respect to their class session. (B) Ensemble of binary decision trees used to classify classroom audio recordings. (C and D) Optimizing parameters for identifying nature of classroom noise samples using 10-fold stratified cross-validation with grid search. Example below shows the process of optimizing parameters for classifying samples as single voice. (C) Samples were sorted into single voice (n = 493,862) and nonsingle voice (n = 146,290) based on human annotation and further randomly and equally divided into 10 groups each (S1–S10 and NS1–NS10). These groups were recombined 10 times to make 10 folds, each of which contained all of the data. Each fold had a different pair of groups (i.e., S1/NS1 or S2/NS2) designated as the test set, with all other groups forming the validation set. These folds were all tested using the grid search method that empirically tested all volume and SD parameters and measured error for each of these parameter sets. (D) Grid search for choosing cut-off parameters for classifying samples as either belonging to a given annotation category or not. Different combinations of mean volume in window and SD of the window volume were tried as cut-off parameters on each of the 10 folds. The error rates (percentage of samples where the computer and human annotations did not match) for the validation set and the test set were calculated and are represented as heat maps with red showing high-validation error and blue showing low-validation error for each fold. The parameters were first tested at a low resolution (0.5 SD intervals), and the parameters that yielded the lowest validation error were then explored at a higher resolution (0.01 SD intervals). The combination of cut-offs for mean volume and mean SD of volume with the lowest average validation error over all folds was selected for the final version of the DART algorithm. The test error was an estimate of generalized model performance.
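
As a rough illustration of the fold construction and coarse grid search described in this caption, the sketch below (Python, NumPy assumed) stratifies samples by class, tries (mean-volume, SD) cut-off pairs at 0.5-SD steps, and keeps the pair with the lowest average validation error across folds. The classification rule `(mean_v >= a) & (std_v >= b)` is an assumed stand-in for the paper's per-category classifier, and the 0.01-step refinement around the winner is only noted in a comment.

```python
import numpy as np

def grid_search_cutoffs(mean_v, std_v, is_sv, n_folds=10, seed=0):
    """Coarse stage of the Fig. S1 C and D grid search (a sketch).

    mean_v, std_v: session-normalized window features per sample.
    is_sv: boolean human annotation (single voice vs. not).
    """
    rng = np.random.default_rng(seed)
    # Stratified groups: shuffle each class separately and deal it
    # into n_folds groups (the S1-S10 / NS1-NS10 split in Fig. S1C).
    fold = np.empty(len(is_sv), dtype=int)
    for cls in (True, False):
        idx = rng.permutation(np.flatnonzero(is_sv == cls))
        fold[idx] = np.arange(len(idx)) % n_folds

    coarse = np.arange(-3.0, 3.0, 0.5)           # 0.5-SD steps
    best, best_err = None, np.inf
    for a in coarse:                              # mean-volume cut-off
        for b in coarse:                          # SD-of-volume cut-off
            pred = (mean_v >= a) & (std_v >= b)   # assumed stand-in rule
            # Fold k's pair of groups is the held-out test set; the
            # remaining groups form the validation set used to choose
            # parameters, mirroring the caption's fold design.
            val_errs = [np.mean(pred[fold != k] != is_sv[fold != k])
                        for k in range(n_folds)]
            if np.mean(val_errs) < best_err:
                best, best_err = (a, b), float(np.mean(val_errs))
    # A second pass at 0.01-SD steps around `best` would follow.
    return best, best_err
```
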
Fig. 2.
DART accurately identifies single voice and conservatively estimates multiple and no voice. Recordings from eight instructors at two colleges, each teaching one course, were used to produce these data. Pie charts on the Left show rates for hits (dark purple) and misses (light purple) and on the Right show rates for correct rejections (dark teal) and false alarms (light teal) for each DART mode. Both the number in parentheses and the area of the pie chart represent the proportion of each mode present in human annotations. d′, the sensitivity index, is a measurement of the difference between the signal and noise distributions. (A) Single voice, (B) multiple voice, (C) no voice.
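
For reference, in standard signal detection theory d′ is the difference between the z-transformed hit and false-alarm rates, assuming equal-variance Gaussian signal and noise distributions. A minimal check in Python (SciPy assumed available):

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    # z-transform (inverse normal CDF) of each rate; their difference
    # is the signal-noise separation in SD units.
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(d_prime(0.90, 0.10))  # ~2.56 for 90% hits, 10% false alarms
```
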
Fig. 3.
DART can be used to analyze large numbers of courses. (A) Percentage of absolute time spent in single voice (SV), multiple voice (MV), and no voice (NV) for all eligible courses (n = 67). Courses are ordered by increasing single-voice percentage. Boxes indicate minimum and maximum percentages spent in single voice. (B) Percentage of absolute time spent in various modes for all class sessions from eligible courses (n = 1,486). Class sessions are ordered by increasing single-voice percentage. Boxes indicate minimum and maximum percentages spent in single voice. (C and D) Percentage of time spent in multiple or no voice in each class session, in time order, for two representative courses, course 1 and course 2. (E) Proportion of courses where all class sessions have some multiple or no voice (<100% single voice) (Left) and where at least half of all class sessions have some multiple or no voice (Right). (F) Average time spent in multiple or no voice for courses with one female (n = 36) or male (n = 26) instructor (cotaught courses excluded). Error bars represent SE. n.s.: P = 0.10. (G) Average time spent in multiple or no voice for biology majors’ (n = 32) and nonbiology majors’ (n = 35) courses. Error bars represent SE. *P = 0.01.
Fig. S2.
DART can accurately identify when lecture with Q/A occurs. (A) Percentage of the time each human annotation code was labeled by the DART prediction as single voice, multiple voice, or no voice. Shaded boxes represent the DART prediction mode that was most often assigned to that row’s human annotation code. (B) Percentage of the time each DART prediction mode was labeled by each human annotation code. Shaded boxes represent the human annotation code that is most represented in that row’s DART prediction mode.
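
The two panels are the same confusion counts normalized in opposite directions: across each row of human codes in A and down each column of DART modes in B. A toy illustration with made-up counts (all numbers and row labels below are hypothetical):

```python
import numpy as np

# Hypothetical sample counts: rows = human annotation codes,
# columns = DART modes (single, multiple, no voice).
counts = np.array([[9500,  300,  200],   # e.g. lecture with Q/A
                   [ 400, 1800,  300],   # e.g. pair discussion
                   [ 100,  150, 1200]])  # e.g. silent writing

# Fig. S2A-style: percent of each human code given each DART label
# (normalize across each row).
pct_by_code = 100 * counts / counts.sum(axis=1, keepdims=True)

# Fig. S2B-style: percent of each DART mode drawn from each human
# code (normalize down each column).
pct_by_mode = 100 * counts / counts.sum(axis=0, keepdims=True)
```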

References

    1. Arum R, Roksa J. Academically Adrift: Limited Learning on College Campuses. Univ of Chicago Press; Chicago: 2010.
    1. Singer SR, Nielsen NR, Schweingruber HA, editors. Discipline-Based Education Research: Understanding and Improving Learning in Undergraduate Science and Engineering. National Academies; Washington, DC: 2012.
    1. Seymour E, Hewitt NM. Talking About Leaving: Why Undergraduates Leave The Sciences. Westview Press; Boulder, CO: 1997.
    1. Graham MJ, Frederick J, Byars-Winston A, Hunter A-B, Handelsman J. Increasing persistence of college students in STEM. Science. 2013;341(6153):1455–1456. - PMC - PubMed
    1. President’s Council of Advisors on Science and Technology . Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics. Executive Office of the President; Washington, DC: 2012.
