F1000Res. 2016 Apr 13;5:672.
doi: 10.12688/f1000research.8382.2. eCollection 2016.

Closing gaps between open software and public data in a hackathon setting: User-centered software prototyping

Ben Busby et al. F1000Res.

Abstract

In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon's conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team.

Keywords: Bioconductor; Education; Genome Annotation; Genomics; Next Generation Sequencing; Open-Source; Pharmacogenomics; Software.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. Outline of the pipeline of the Homogeneous RNA-seq Mapping (HRM) team.
The leftmost column shows the procedures; the next two columns show the tools used in each step and the files created by each tool, respectively. HISAT directly accesses the SRA data of interest and outputs aligned reads in a SAM file. Picard classifies the reads, sorted by SAMtools, into functional categories using the RefFlat file. After the quality check by qc.pl, HTSeq calculates raw read counts for each region.
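
The order of steps in the caption can be written down as a small data structure. The following is only a sketch of that ordering in Python: it does not run any of the tools, and the `Stage` class, the procedure labels, and the output descriptions are hypothetical names chosen for illustration rather than taken from the HRM team's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    procedure: str  # hypothetical label for the "procedure" column
    tool: str       # tool named in the figure caption
    output: str     # illustrative description of what the step produces


# Stage order follows the caption; labels and outputs are assumptions.
HRM_PIPELINE = [
    Stage("alignment", "HISAT", "aligned reads in a SAM file"),
    Stage("sorting", "SAMtools", "sorted reads"),
    Stage("read classification", "Picard", "reads by functional category (uses RefFlat)"),
    Stage("quality check", "qc.pl", "quality-check result"),
    Stage("read counting", "HTSeq", "raw read counts per region"),
]


def tool_order(pipeline):
    """Return the tool names in the order they are applied."""
    return [stage.tool for stage in pipeline]


def describe(pipeline):
    """Return one human-readable line per stage, numbered from 1."""
    return [
        f"{i}. {stage.procedure}: {stage.tool} -> {stage.output}"
        for i, stage in enumerate(pipeline, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(describe(HRM_PIPELINE)))
```

Keeping the stages in a plain list makes the dependency between steps explicit (each tool consumes the previous tool's output), which is the main point the figure conveys.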
