[Preprint]. 2025 Nov 17:2025.11.14.688517.
doi: 10.1101/2025.11.14.688517.

Insights into the Datasets, Tools, and Training Needs of the AnVIL Community: 2024

Kathryn J Isaac et al. bioRxiv.

Abstract

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a secure cloud-based environment where research and education communities can analyze genomic and biomedical data. The platform supports a wide range of data analyses and allows users to store and access data safely in compliance with NIH policies. Work on the AnVIL platform can be easily shared to promote reproducible science and collaboration. The purpose of this study is to better understand the current user base of the AnVIL platform. The AnVIL Community Poll aimed to collect baseline information, identify development opportunities, guide the prioritization of user support strategies, and succinctly but comprehensively describe the current AnVIL Community. The AnVIL Team disseminated the inaugural AnVIL Community Poll broadly on social media and through AnVIL and related consortia mailing lists. We categorized respondents as either returning or potential users of the AnVIL platform (based on how they described their usage) and examined their experiences, specifically their backgrounds, technological comfort, research interests, computational needs, and preferences for training and support. Our sample of the AnVIL community revealed opportunities for platform adoption beyond the current user base and identified areas where training should be enhanced, as well as training preferences and user computational needs. Specifically, while most respondents were involved in human genomics research, prioritizing materials that support clinical researchers may help grow adoption of the platform. All respondents felt that the availability of specific tools or datasets was a key feature of the platform. The broader community may also benefit from further development or showcasing of resources that facilitate cost management, finding and incorporating analysis tools, and data import. Respondents strongly preferred virtual training opportunities, and returning users anticipated needing large amounts of storage. This poll provided an insightful snapshot of the current state of the AnVIL and identified areas where the AnVIL Team can take specific steps to address barriers to platform adoption and further support the existing and varied AnVIL Community. This work can be built upon through user interviews, community discussion, and a recurring poll.


Figures

Figure 1: State of the AnVIL 2024 Community Poll Design.
The State of the AnVIL 2024 Community Poll was designed in 6 parts. Every respondent was asked the same questions in Parts 1, 3, 4, 5, and 6, while the questions in Part 2 were specific to the respondent's user type, as determined by their answer to the first question of the poll. Part 1 was used to classify a respondent as a returning or potential user of the AnVIL; Part 2 asked questions specific to each user type (returning or potential); Parts 3–6 contained demographics, experience, awareness, and preferences questions, respectively. Brief descriptions of the questions within each part are provided.
Figure 2: Background of users in our sample.
A: We asked all respondents “How would you describe your current usage of the AnVIL platform?” Respondents were identified as returning or potential AnVIL users based on their responses. Most of the 22 returning users use the AnVIL for ongoing projects. The 28 potential users were evenly split between those who have never used the AnVIL (but have heard of it) and those who have used the AnVIL previously but do not currently. B: We asked “What is the highest degree you have attained?” Most respondents have a PhD or are currently working toward one, though a range of career stages was represented. C: We asked all respondents “What institution are you affiliated with?” Most respondents, including the majority of returning users, reported being affiliated with a research-intensive institution.
Figure 3: Background of users and researchers’ technological comfort.
A: We asked respondents “How much experience do you have analyzing the following data categories?” Twenty-one respondents reported being “extremely experienced” in analyzing human genomic data, while only 6 reported being “not at all experienced” with it. However, more respondents reported being “not at all experienced” in analyzing human clinical data and non-human genomic data. B: Venn diagram showing the overlap among respondents who reported being “moderately” or “extremely experienced” in these research categories (n = 37). Of these, 32% reported such experience in all 3 categories; the next largest group (27%) reported it for human genomic data only, with no overlap with the other categories.
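The percentages in panel B depend on placing each respondent in exactly one region of the Venn diagram (the set of categories in which they reported moderate or extreme experience) and dividing by the number of respondents with at least one such category. The Python sketch below illustrates that bookkeeping under stated assumptions: the category keys, the "slightly experienced" level, and the toy responses are hypothetical, and this is not the poll's data or the authors' analysis code.

```python
from collections import Counter

# Hypothetical keys for the three research categories in panel A.
CATEGORIES = ("human genomic", "human clinical", "non-human genomic")
# The two levels counted as "experienced" in panel B (quoted in the caption).
EXPERIENCED = {"moderately experienced", "extremely experienced"}


def venn_regions(responses):
    """Count respondents in each exclusive Venn region.

    responses maps a respondent ID to {category: experience level}. Each
    respondent lands in exactly one region: the frozenset of categories in
    which they reported moderate or extreme experience. Respondents with no
    such category are left out, mirroring the n = 37 restriction in panel B.
    """
    regions = Counter()
    for levels in responses.values():
        experienced_in = frozenset(
            cat for cat in CATEGORIES if levels.get(cat) in EXPERIENCED
        )
        if experienced_in:
            regions[experienced_in] += 1
    return regions


def region_percentages(regions):
    """Convert region counts to percentages of respondents in the diagram."""
    n = sum(regions.values())
    return {region: 100 * count / n for region, count in regions.items()}


# Toy responses; "slightly experienced" is a hypothetical intermediate level.
toy = {
    "r1": {"human genomic": "extremely experienced",
           "human clinical": "moderately experienced",
           "non-human genomic": "moderately experienced"},
    "r2": {"human genomic": "extremely experienced",
           "human clinical": "not at all experienced"},
    "r3": {"human genomic": "slightly experienced"},
    "r4": {"non-human genomic": "moderately experienced"},
}

regions = venn_regions(toy)
for region, pct in region_percentages(regions).items():
    print(sorted(region), f"{pct:.0f}%")
```

Using a frozenset as the region key makes "all 3 categories" and "human genomic data only" distinct, non-overlapping buckets, which is what lets the region percentages sum to 100%.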
Figure 4: Current work of users related to appropriate personas.
We asked all respondents “What kind of work do you do?” Possible selections (computational work, computational education, project management, etc.) are shown on the y-axis, and cells are colored for each choice a respondent (x-axis) selected. Based on their selections, respondents were clustered into Admin, Analyst, Clinician, Educator, PI, and not assigned (NA) personas. These assignments are shown in grayscale along the x-axis, with a corresponding pie chart showing the relative abundance of each persona. Two potential users were assigned the “Clinician” persona and 4 were assigned the “Educator” persona, compared with 0 and 1, respectively, among returning users. The other personas show similar abundances between potential and returning users.
Figure 5: Barriers to platform adoption and user preferences for training and support in our sample.
A: We asked all respondents to “Rank the following features according to their importance to you as a potential user or for your continued use of the AnVIL.” Responses were averaged within the potential and returning user cohorts to obtain an average rank for each feature. All respondents rated having specific tools or datasets supported/available as an important feature for using the AnVIL. Compared with returning users, potential users ranked having a free version with limited compute or storage as the most important feature for their potential use of the AnVIL. B: We asked all respondents to “Rank how/where you would prefer to attend AnVIL training workshops.” Responses were averaged within the potential and returning user cohorts to obtain an average rank. Both returning and potential users preferred virtual training workshops over other modalities.
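Averaging ranks within each cohort can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis: it assumes rank 1 means most important, that each response is stored as a (cohort, {feature: rank}) pair, and that a feature a respondent left unranked is simply skipped. The feature names and toy responses are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(responses):
    """Average each feature's rank within each user cohort.

    responses is a list of (cohort, {feature: rank}) pairs, where rank 1 is
    assumed to mean "most important". Features a respondent did not rank are
    skipped rather than imputed. Returns {cohort: [(feature, mean_rank), ...]}
    sorted so the lowest mean rank (most important feature) comes first.
    """
    ranks = defaultdict(lambda: defaultdict(list))
    for cohort, ranking in responses:
        for feature, rank in ranking.items():
            ranks[cohort][feature].append(rank)
    return {
        cohort: sorted(
            ((feature, mean(values)) for feature, values in features.items()),
            key=lambda item: item[1],
        )
        for cohort, features in ranks.items()
    }


# Toy responses with hypothetical feature labels.
toy_responses = [
    ("potential", {"free version": 1, "specific tools/datasets": 2, "cost management": 3}),
    ("potential", {"free version": 2, "specific tools/datasets": 1, "cost management": 3}),
    ("potential", {"free version": 1, "specific tools/datasets": 3, "cost management": 2}),
    ("returning", {"free version": 3, "specific tools/datasets": 1, "cost management": 2}),
]

for cohort, ordered in average_ranks(toy_responses).items():
    print(cohort, [(feature, round(r, 2)) for feature, r in ordered])
```

Because a lower average rank means higher importance, sorting ascending puts each cohort's top feature first; the same function would apply unchanged to the training-modality ranking in panel B.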
Figure 6: Awareness and utilization of training and support.
A & B: We asked all respondents “Have you attended a monthly AnVIL Demo?” A: Most respondents had not attended an AnVIL Demo; however, returning users were more strongly represented among AnVIL Demo attendees. B: All responses to (A) except “No, did not know of” were aggregated, showing that the majority of respondents are aware of AnVIL Demos. C & D: We asked all respondents “Have you ever read or posted in our AnVIL Support Forum?” C: Raw responses are shown (respondents could select more than one). Most respondents have not used the AnVIL Support Forum, but 24% of respondents report using it in some form; reading others’ posts is the most common form of use within this sample. D: Each respondent’s set of responses was recoded and aggregated to examine whether they are aware of the AnVIL Support Forum. Awareness of the support forum is observed among both potential and returning users.
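The recoding in panels B and D collapses each respondent's answer(s) into a single aware/unaware flag. A minimal Python sketch of that step follows, assuming a respondent counts as aware if they selected anything other than the "did not know of" option. Only the Demo label “No, did not know of” comes from the caption; the forum's unaware label, the other response labels, and the toy data are hypothetical.

```python
# The only response label quoted in the caption is "No, did not know of"
# (the AnVIL Demo question); every other label here is hypothetical.
DEMO_UNAWARE = "No, did not know of"


def is_aware(selections, unaware_label=DEMO_UNAWARE):
    """Recode one respondent's answer(s) to aware (True) or unaware (False).

    selections is a list of response labels; a single-choice answer is passed
    as a one-element list. A respondent counts as aware if they selected any
    option other than unaware_label. An empty list (question skipped) is
    treated as unaware here, which is itself an assumption.
    """
    return any(choice != unaware_label for choice in selections)


def share_aware(all_selections, unaware_label=DEMO_UNAWARE):
    """Fraction of respondents recoded as aware."""
    flags = [is_aware(s, unaware_label) for s in all_selections]
    return sum(flags) / len(flags)


# Toy single-choice answers to the Demo question (hypothetical labels
# apart from DEMO_UNAWARE).
toy_demo = [
    [DEMO_UNAWARE],
    ["Yes, attended at least once"],
    ["No, but knew of them"],
    [DEMO_UNAWARE],
]
print(share_aware(toy_demo))

# Toy multi-select forum answers; the forum's "unaware" label is assumed.
FORUM_UNAWARE = "No, did not know of the forum"
toy_forum = [
    ["Read others' posts", "Posted a question"],
    [FORUM_UNAWARE],
]
print(share_aware(toy_forum, FORUM_UNAWARE))
```

Treating single-choice answers as one-element lists lets the same function handle both the Demo question and the multi-select forum question.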
Figure 7: Technological comfort with cloud-based genomic analysis tools.
A: We asked respondents “How would you rate your knowledge of or comfort with these technologies or data features?” Except for Galaxy, potential users tended to report lower comfort levels with the various tools and technologies than returning users did. Overall, respondents reported less comfort with containers or workflows than with programming languages and integrated development environments (IDEs). B: We asked all respondents “Where do you currently run analyses?” Institutional high-performance computing (HPC) and local personal computers were the most common responses. Within this sample, Google Cloud Platform (GCP) was used more than other cloud providers. Potential users also reported using Galaxy (a free option) more than returning users did.
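The potential (28) and returning (22) cohorts differ in size, so one way to compare how often each cohort selected an option on a multi-select question such as “Where do you currently run analyses?” is to divide counts by cohort size. The sketch below assumes that approach; the caption does not state whether raw counts or proportions were compared, and the option labels and toy responses are hypothetical.

```python
from collections import Counter


def option_shares(responses):
    """Share of each cohort that selected each option on a multi-select item.

    responses is a list of (cohort, selections) pairs. Counts are divided by
    cohort size so that cohorts of different sizes can be compared. Because
    respondents may pick several options, shares within a cohort can sum to
    more than 1.
    """
    cohort_sizes = Counter(cohort for cohort, _ in responses)
    counts = Counter()
    for cohort, selections in responses:
        for option in set(selections):  # count each option once per respondent
            counts[(cohort, option)] += 1
    return {
        (cohort, option): n / cohort_sizes[cohort]
        for (cohort, option), n in counts.items()
    }


# Toy responses; option labels are hypothetical shorthand.
toy = [
    ("potential", ["Galaxy", "Institutional HPC"]),
    ("potential", ["Local computer"]),
    ("returning", ["Institutional HPC", "GCP"]),
    ("returning", ["Institutional HPC"]),
    ("returning", ["Local computer", "Galaxy"]),
]

shares = option_shares(toy)
for (cohort, option), share in sorted(shares.items()):
    print(f"{cohort:9} {option:17} {share:.2f}")
```

In the toy data each cohort has one Galaxy user, yet the normalized shares differ (0.50 versus 0.33), which is why the denominator matters when cohorts are unequal.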
Figure 8: Relating poll takeaways to steps in a typical analysis workflow using the AnVIL platform.
Poll takeaways from Table 2 are condensed so that each takeaway is combined with the proposed step(s) to address it, and are categorized as training or community takeaways. Training takeaways are those the AnVIL Team can address by creating new training materials or by enhancing or highlighting existing ones. Community takeaways are those addressed by conversing with, learning from, or growing the AnVIL Community. These takeaways are aligned along a typical analysis workflow that may be performed on the AnVIL platform.
