From Doubt to Confidence: Overcoming Fraudulent Submissions by Bots and Other Takers of a Web-Based Survey
- PMID: 39680887
- PMCID: PMC11686022
- DOI: 10.2196/60184
Abstract
In 2019, we launched a web-based longitudinal survey of adults who frequently use e-cigarettes, called the Vaping and Patterns of E-cigarette Use Research (VAPER) Study. The initial attempt to collect survey data failed due to fraudulent survey submissions, likely submitted by survey bots and other survey takers. This paper chronicles the journey from that setback to the successful completion of 5 waves of data collection. The section "Naïve Beginnings" examines the study preparation phase, identifying the events, decisions, and assumptions that contributed to the failure (eg, allowing anonymous survey takers to submit surveys and overreliance on a third party's proprietary fraud detection tool to identify participants attempting to submit multiple surveys). "A 5-Alarm Fire and Subsequent Investigation" summarizes the warning signs that suggested fraudulent survey submissions had compromised the data integrity after the initial survey launched (eg, an unanticipated acceleration in recruitment and a voicemail alleging fraudulent receipt of multiple gift codes). This section also covers the investigation process, along with conclusions regarding how the methodology was exploited (eg, clearing cookies and using virtual private networks) and the extent of the issue (ie, only 363/1624, 22.4% of the survey completions were likely valid). "Building More Resilient Methodology" details the vulnerabilities and threats that likely compromised the initial survey attempt (eg, anonymity and survey bots); the corresponding mitigation strategies and their benefits and limitations (eg, personal record verification platforms, IP address matching, virtual private network detection services, and CAPTCHA [Completely Automated Public Turing test to tell Computers and Humans Apart]); and the array of strategies that were implemented in future survey attempts.
"Staying Vigilant" recounts the identification and management of an additional threat that emerged despite the implementation of an array of mitigation strategies, underscoring the need for ongoing vigilance and adaptability. While the precise nature of the threat remains unknown, the evidence suggested that multiple fraudulent surveys were submitted by a single entity or by connected entities, who likely did not possess e-cigarettes. To reduce the chance of recurrence, participants were required to submit an authentic photo of their most used e-cigarette. Finally, in "Reflection 4 Years Later," we share insights after completing 5 waves of data collection in which no additional threats or vulnerabilities emerged that required further mitigation strategies. Reflections include reasons for confidence in the data's integrity, the scalability and cost-effectiveness of the study protocols, and the potential introduction of sampling bias through recruitment and mitigation strategies. By sharing our journey, we aim to provide valuable insights for researchers facing similar challenges with web-based surveys and those seeking to minimize such challenges a priori. Our experiences highlight the importance of proactive measures, continuous monitoring, and adaptive problem-solving to ensure the integrity of data collected from participants recruited from web-based platforms.
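As an illustration of one mitigation strategy named in the abstract, IP address matching, the sketch below flags submissions that share an IP address and reproduces the reported valid-completion percentage. It is a minimal sketch and not the VAPER Study's actual procedure: the record layout (`id`, `ip`), the function names, and the one-submission-per-IP threshold are all assumptions made for the example. As the abstract notes, this kind of check can be evaded with virtual private networks, so it would typically be one layer among several.

```python
from collections import defaultdict


def flag_shared_ips(submissions, max_per_ip=1):
    """Return IDs of submissions whose IP appears more than max_per_ip times.

    `submissions` is a list of dicts with hypothetical keys "id" and "ip".
    """
    by_ip = defaultdict(list)
    for sub in submissions:
        by_ip[sub["ip"]].append(sub["id"])

    flagged = set()
    for ids in by_ip.values():
        if len(ids) > max_per_ip:
            flagged.update(ids)
    return flagged


def valid_share(valid, total):
    """Percentage of completions judged valid, rounded to 1 decimal place."""
    return round(100 * valid / total, 1)


# Made-up example records using documentation-reserved IP ranges.
submissions = [
    {"id": "s1", "ip": "203.0.113.5"},
    {"id": "s2", "ip": "203.0.113.5"},
    {"id": "s3", "ip": "198.51.100.7"},
]

print(sorted(flag_shared_ips(submissions)))  # ['s1', 's2']
print(valid_share(363, 1624))                # 22.4
```

Flagged submissions would still need manual review, since people in the same household or behind the same network can legitimately share an IP address.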
Keywords: US; United States; VAPER; Vaping and Patterns of E-cigarette Use Research; adult; cessation; challenges; data collection; data integrity; data quality; e-cigarette; e-cigs; fake data; internet survey; longitudinal survey; online survey; prevalence; recruitment; smoke; smoking; tobacco control; vaping; web-based survey.
©Jeffrey J Hardesty, Elizabeth Crespi, Joshua K Sinamo, Qinghua Nian, Alison Breland, Thomas Eissenberg, Ryan David Kennedy, Joanna E Cohen. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.12.2024.
Conflict of interest statement
Conflicts of Interest: TE is a paid consultant in litigation against the tobacco industry; has been a paid consultant in litigation against the electronic cigarette industry; and is named on one patent for a device that measures the puffing behavior of electronic cigarette users, on another patent application for a smartphone app that determines electronic cigarette device and liquid characteristics, and on a third patent application for a smoking cessation intervention. JEC was a paid consultant in litigation against a tobacco company.