PeerJ. 2020 Jul 20;8:e9414. doi: 10.7717/peerj.9414. eCollection 2020.

The timing mega-study: comparing a range of experiment generators, both lab-based and online

David Bridges et al. PeerJ.

Abstract

Many researchers in the behavioral sciences depend on research software that presents stimuli and records response times with sub-millisecond precision. Many software packages are available for conducting these behavioral experiments and measuring participants' response times and performance. Very little information is available, however, on the timing performance they achieve in practice. Here we report a wide-ranging study of the precision and accuracy of visual and auditory stimulus timing and response times, measured with a Black Box Toolkit. We compared a range of popular packages: PsychoPy, E-Prime®, NBS Presentation®, Psychophysics Toolbox, OpenSesame, Expyriment, Gorilla, jsPsych, Lab.js and Testable. Where possible, the packages were tested on Windows, macOS, and Ubuntu, and in a range of browsers for the online studies, to identify common patterns in performance. Among the lab-based experiments, Psychophysics Toolbox, PsychoPy, Presentation and E-Prime provided the best timing, all with mean precision under 1 millisecond across the visual, audio and response measures. OpenSesame had slightly less precision across the board, most notably in audio stimuli, and Expyriment had rather poor precision. Across operating systems, precision was generally very slightly better under Ubuntu than Windows, and macOS was the worst, at least for visual stimuli, for all packages. Online studies did not deliver the same level of precision as lab-based systems, showing slightly more variability in all measurements. That said, PsychoPy and Gorilla, broadly the best performers, achieved very close to millisecond precision on several browser/operating-system combinations. For response times (measured using a high-performance button box), most of the packages achieved precision under 10 ms in all browsers, with PsychoPy achieving precision under 3.5 ms in all. There was considerable variability between OS/browser combinations, especially in audio-visual synchrony, which is the least precise aspect of browser-based experiments. Nonetheless, the data indicate that online methods can be suitable for a wide range of studies, provided due thought is given to the resulting sources of variability. The results, from over 110,000 trials, highlight the wide range of timing qualities that can occur even in software packages dedicated to the task. We stress the importance of scientists making their own timing validation measurements for their own stimuli and computer configurations.
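The abstract's closing recommendation, that researchers validate timing on their own stimuli and hardware, can be approximated in software before committing to a hardware check. Below is a minimal, hypothetical sketch using PsychoPy's documented frame-interval recording to estimate visual timing precision; it only measures what the software itself reports, so it complements rather than replaces photodiode-based validation with a device such as the Black Box Toolkit used in this study.

    from psychopy import visual, core
    import numpy as np

    # Open a full-screen window; timing is usually worse in windowed mode.
    win = visual.Window(fullscr=True)
    win.recordFrameIntervals = True  # store the interval between successive flips

    stim = visual.Rect(win, width=0.5, height=0.5)
    for _ in range(300):   # roughly 5 seconds at a 60 Hz refresh rate
        stim.draw()
        win.flip()         # blocks until the next vertical blank

    intervals = np.array(win.frameIntervals) * 1000.0  # seconds -> ms
    print(f"mean frame interval: {intervals.mean():.3f} ms")
    print(f"precision (SD):      {intervals.std():.3f} ms")

    win.close()
    core.quit()

A low standard deviation here indicates that the software loop is keeping pace with the display's refresh rate, but only external hardware can confirm when the pixels actually change on screen.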

Keywords: Experiments; MTurk; Online testing; Open-source; Precision; Reaction times; Software; Stimuli; Sub-millisecond; Timing.

Conflict of interest statement

Although we are authors of one of the packages being compared here (PsychoPy/PsychoJS), that software is provided free and open-source to all users. We are also shareholders of Open Science Tools Ltd, which receives income from studies being conducted on Pavlovia.org. Pavlovia.org is agnostic to the software package creating the studies—it supports studies from PsychoPy, jsPsych and Lab.js—and the resulting income funds further development of the open-source software packages on which the studies are created.

Figures

Figure 1. Precision across the packages and operating systems for lab-based software. The point size represents the standard deviation of the respective times in that configuration. In general, the majority of the differences are caused by differences between the packages (rows), although there are clearly some differences between operating systems as well.

Figure 2. Precision across the packages, operating systems, and browsers for the two major cross-platform browsers. The point size represents the standard deviation of the respective times in that configuration. Performance is more mixed online, with some packages performing better on one browser/OS combination and other packages performing better on another.
