Review
Nat Methods. 2023 May;20(5):655-664.
doi: 10.1038/s41592-023-01832-z. Epub 2023 Apr 6.

Julia for biologists

Elisabeth Roesch et al. Nat Methods. 2023 May.

Erratum in

  • Author Correction: Julia for biologists.
    Roesch E, Greener JG, MacLean AL, Nassar H, Rackauckas C, Holy TE, Stumpf MPH. Nat Methods. 2023 May;20(5):771. doi: 10.1038/s41592-023-01887-y. PMID: 37120675.

Abstract

Major computational challenges exist in relation to the collection, curation, processing and analysis of large genomic and imaging datasets, as well as the simulation of larger and more realistic models in systems biology. Here we discuss how Julia, a relative newcomer among programming languages, is poised to meet the current and emerging demands in the computational biosciences and beyond. Speed, flexibility, a thriving package ecosystem and readability are major factors that make high-performance computing and data analysis available to an unprecedented degree. We highlight how Julia's design is already enabling new ways of analyzing biological data and systems, and we provide a list of resources that can facilitate the transition into Julian computing.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1:
Julia is a tool for biologists to discover new science. (a) In the biological sciences, the most obvious alternatives to the programming language Julia are R, Python and MATLAB. Here, we contrast two potential pathways to new biology with a mountaineering analogy: the top of the mountain represents “New Biology” [10, 11]. There are two potential base camps for the ascent: base camp 1 (left, red) is “R/Python/Matlab”, and base camp 2 (right, green) is “Julia”. To get to the top, the mountaineer, representing a researcher, needs to overcome obstacles such as a glacier and a chasm. These represent research hurdles such as large and diverse datasets or complex models. Starting at the “Julia” base camp, the mountaineer has access to efficient and effective tools, such as a bridge over the glacier and a rocket to simply fly over the chasm. These represent Julia’s top three language design features: abstraction, speed and metaprogramming. With these tools, the journey to the top of the mountain becomes much easier. Julia allows biologists not to be held back by the problems illustrated in (b) and (c). (b) The “Two-language problem” refers to using separate languages for algorithm development and prototyping (such as R or Python) and for production runs (such as C/C++ or Fortran). Julia was designed to be good at both tasks, which can reduce programming effort and software complexity. (c) The “Expression problem” refers to the difficulty of extending an existing external code base with both new (optimised) user-defined data types and new functions, without modifying the original code.
Figure 2:
Julia’s speed feature. (a) Examples relevant to biology. Left: comparison of the time needed to calculate the mutual information for all possible pairs of genes in a single-cell dataset [16]. Right: benchmark of ODE solvers implemented in Julia, Fortran, C, MATLAB, Python and R for the Lotka-Volterra model (more systems in [23]). (b) Illustration of the speed-up of vectorisable code (as in (a)). (c) Intuition for the speed-up of non-vectorisable code (as in (b)).
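The Lotka-Volterra benchmark in panel (a) integrates the classic predator-prey system dx/dt = αx − βxy, dy/dt = δxy − γy, where x is the prey and y the predator population. As a rough illustration of the kind of computation such a benchmark times, the Python sketch below integrates the system with a hand-written fixed-step fourth-order Runge-Kutta (RK4) scheme. It is not one of the benchmarked solvers: the parameter values (α, β, γ, δ) = (1.5, 1.0, 3.0, 1.0), the initial state and the step size are illustrative assumptions. Because exact trajectories conserve H = δx − γ ln x + βy − α ln y, the drift in H gives a quick accuracy check.

```python
import math


def lotka_volterra(x, y, alpha, beta, gamma, delta):
    """Right-hand side of the ODE: x is prey, y is predator."""
    dx = alpha * x - beta * x * y
    dy = delta * x * y - gamma * y
    return dx, dy


def rk4_integrate(x, y, params, t_end, h):
    """Fixed-step classical RK4; returns a list of (t, x, y) samples."""
    n_steps = round(t_end / h)
    trajectory = [(0.0, x, y)]
    for i in range(1, n_steps + 1):
        k1x, k1y = lotka_volterra(x, y, *params)
        k2x, k2y = lotka_volterra(x + 0.5 * h * k1x, y + 0.5 * h * k1y, *params)
        k3x, k3y = lotka_volterra(x + 0.5 * h * k2x, y + 0.5 * h * k2y, *params)
        k4x, k4y = lotka_volterra(x + h * k3x, y + h * k3y, *params)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        trajectory.append((i * h, x, y))
    return trajectory


def invariant(x, y, alpha, beta, gamma, delta):
    """Quantity conserved by the exact Lotka-Volterra flow."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


# Illustrative parameters (alpha, beta, gamma, delta) and initial state (x, y) = (1, 1).
params = (1.5, 1.0, 3.0, 1.0)
traj = rk4_integrate(1.0, 1.0, params, t_end=10.0, h=0.001)
_, x_end, y_end = traj[-1]
drift = abs(invariant(x_end, y_end, *params) - invariant(1.0, 1.0, *params))
print(f"final state x={x_end:.4f}, y={y_end:.4f}, invariant drift={drift:.2e}")
```

A fixed step keeps the sketch short; production solvers such as those compared in the figure use adaptive step-size control, which is one reason timings differ so much between implementations.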
Figure 3:
Interfaces in Julia: experimental scientists can switch between different pipettors without recreating whole experimental protocols because a common understanding, or interface, specifies the tasks that any pipettor should be able to perform in a similar manner. In Julia, we can define interfaces such as the one for the abstract type AbstractArray, which specifies the rules (for example, implementing size and getindex) that any array-like computational object has to follow. Interfaces allow methods written for abstract types to be reused by custom types. Building our algorithms around interfaces makes code easier to use, reuse and refine.
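Python has a loosely comparable mechanism in its abstract base classes, so the following sketch uses it as an analogy for the interface idea above; it is not Julia code. A hypothetical WellRow type implements only `__getitem__` and `__len__`, and `collections.abc.Sequence` then supplies membership tests, iteration, reversal, `index` and `count`, much as generic Julia methods written for AbstractArray work for any type that implements the required methods. The class name and the volume values are invented for illustration.

```python
from collections.abc import Sequence


class WellRow(Sequence):
    """Hypothetical row of plate wells holding measured volumes (µL).

    Only __getitem__ and __len__ are written here; the Sequence abstract
    base class derives __contains__, __iter__, __reversed__, index and
    count from them.
    """

    def __init__(self, volumes):
        self._volumes = list(volumes)

    def __getitem__(self, i):
        return self._volumes[i]

    def __len__(self):
        return len(self._volumes)


def total_volume(row: Sequence) -> float:
    """Generic code that relies only on the Sequence interface."""
    return sum(row)


row = WellRow([10.0, 20.0, 20.0, 5.0])
print(len(row), total_volume(row), row.count(20.0), row.index(5.0), list(reversed(row)))
```

The design point carries over: `total_volume` was written without knowing about WellRow, yet works for it because WellRow honours the interface.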
Figure 4:
An overview of Julia’s package ecosystem, grouped by topic.
Figure 5:
The abstraction feature in Julia. (a) A structural bioinformatics pipeline that seamlessly combines multiple Julia packages. This flexibility significantly reduces the effort and time developers and users need to build new models and complex workflows, and it makes collaboration easier. (b) From the pipeline, we highlight the step “Graph of contacting residues” as an example of Julia’s solution to the first part of the expression problem (illustrated in Figure 1c), namely extending an existing code base with new functions. (c) The second highlighted step from the pipeline is “Plot distance map”, where a new plot recipe is defined for a domain-specific type; that is, we demonstrate extending an existing code base to new types. Alongside this, we show the Julia code for defining a new type and a new plot recipe. As an example, the structure MyBioStruc captures the results of algorithms that predict amino acid (AA) sequences from data. It is defined with the fields predicted_AA, a vector of characters holding the predicted AAs; certainty_AA, a vector of numbers quantifying the certainty of each predicted AA; the string study, naming the data study the prediction is based on; and the string alg, naming the prediction algorithm. With the macro @recipe, we can specify how the function plot(…) should work for our newly defined type. Here, we specify that it should create a line plot of the predicted amino acids, with the mean certainty of the prediction used as the line opacity (specified in the Plots.jl package as α). More details on the selected example code are in the referenced online material.
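To make the “new type, existing function” half of the expression problem concrete, the sketch below mirrors the MyBioStruc example in Python. It is an analogy, not the authors’ Julia code or the Plots.jl recipe API: `functools.singledispatch` stands in for Julia’s dispatch (it dispatches on the first argument only, whereas Julia dispatches on all arguments), and the hypothetical `plot_spec` function returns a dictionary describing the plot instead of drawing it. The example values for the fields are invented.

```python
from dataclasses import dataclass
from functools import singledispatch
from statistics import mean


@dataclass
class MyBioStruc:
    """Python stand-in for the MyBioStruc type described in the caption."""
    predicted_AA: list[str]      # predicted amino acids, one letter each
    certainty_AA: list[float]    # certainty of each predicted amino acid
    study: str                   # data study the prediction is based on
    alg: str                     # prediction algorithm


@singledispatch
def plot_spec(data):
    """Existing generic function: default line-plot spec for plain sequences."""
    return {"kind": "line", "y": list(data), "alpha": 1.0}


@plot_spec.register
def _(data: MyBioStruc):
    """New method for the new type, playing the role of a plot recipe."""
    return {
        "kind": "line",
        "x": list(range(1, len(data.predicted_AA) + 1)),
        "y": list(data.predicted_AA),
        "alpha": mean(data.certainty_AA),  # mean certainty as line opacity
        "title": f"{data.alg} on {data.study}",
    }


# Hypothetical prediction result.
pred = MyBioStruc(["M", "K", "V"], [1.0, 0.5, 0.75], "study_A", "alg_X")
spec = plot_spec(pred)
print(spec)
```

The existing `plot_spec` is never edited; the new type only registers its own method, which is the same extension pattern the @recipe macro enables in Julia.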
Figure 6:
Julia’s metaprogramming feature. (a) Illustration of metaprogramming and an analogy to the central dogma of molecular biology. Just as a transcription factor, itself encoded in DNA, can control gene expression and modify the RNA levels of an organism, metaprogramming lets us write code that generates or modifies code, creating a feedback effect. (b) Example application of metaprogramming in biology. Metaprogramming is especially helpful for large-scale, automated model development. We can write code that adapts the model definition automatically, e.g. in light of new data or based on how the model interacts with other sub-models (V1, …, Vn: the different versions of the model definition). For example, when constructing models of cellular systems, we can combine structurally similar models for the different MAP kinases present in human cells, and build compartmental models by explicitly modelling the kinase dynamics in the nucleus and the cytosol [50]. (c) Example workflow of model construction. The adaptation process could, for example, start with a theoretically inferred mathematical description, captured via the @reaction_network syntax of the Julia package Catalyst.jl. Subsequently, given experimental data, we evaluate an objective function that quantifies how well the current model describes the data. Depending on the outcome of this evaluation, the model is updated, e.g. by adding new reactions via the macro @add_reactions. More details on the selected example code are in the referenced online material.
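Python has no macros like Julia’s, but the core idea of panel (c), code that writes the model code, can be sketched by generating source text and compiling it with `exec`. In the sketch below, reactions are stored as data, mass-action kinetics is assumed, and the ODE right-hand side is regenerated whenever reactions are added. `Reaction`, `Model`, `add_reactions` and `compile_rhs` are hypothetical names that only loosely echo the Catalyst.jl workflow; they are not Catalyst’s API, and the objective-function evaluation step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    rate: str                                      # name of the rate constant
    reactants: dict = field(default_factory=dict)  # species -> stoichiometry consumed
    products: dict = field(default_factory=dict)   # species -> stoichiometry produced


@dataclass
class Model:
    species: list
    reactions: list = field(default_factory=list)

    def add_reactions(self, *new):
        """Hypothetical helper loosely mirroring Catalyst's @add_reactions."""
        self.reactions.extend(new)


def generate_rhs_source(model):
    """Write Python source for mass-action ODEs du/dt = rhs(u, p)."""
    terms = {s: [] for s in model.species}
    for r in model.reactions:
        factors = [f"p[{r.rate!r}]"]
        factors += [f"u[{s!r}] ** {n}" for s, n in r.reactants.items()]
        flux = " * ".join(factors)
        for s, n in r.reactants.items():
            terms[s].append(f"- {n} * {flux}")
        for s, n in r.products.items():
            terms[s].append(f"+ {n} * {flux}")
    body = ",\n".join(f"        {s!r}: 0 {' '.join(t)}" for s, t in terms.items())
    return f"def rhs(u, p):\n    return {{\n{body}\n    }}\n"


def compile_rhs(model):
    """Turn the generated source into a callable (use with trusted input only)."""
    namespace = {}
    exec(generate_rhs_source(model), namespace)
    return namespace["rhs"]


# Initial hypothesis: A -> B with rate constant k1.
model = Model(["A", "B"], [Reaction("k1", {"A": 1}, {"B": 1})])
rhs = compile_rhs(model)
print(generate_rhs_source(model))

# Suppose the data suggest B is degraded: add B -> (nothing) with rate k2 and regenerate.
model.add_reactions(Reaction("k2", {"B": 1}))
rhs2 = compile_rhs(model)
print(rhs2({"A": 1.0, "B": 3.0}, {"k1": 2.0, "k2": 0.5}))
```

Julia macros operate on parsed expressions rather than strings, which makes this kind of generation safer and easier to compose; the string-based version here is only meant to show the workflow.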
Figure 7:
Julia’s abstraction feature and performance gains in image processing. (a) Contrast adjustment (i) and segmentation (ii) of images, as examples of high-performance vectorisable and non-vectorisable image manipulations, respectively. A performance comparison with Python is provided (ms: millisecond; v: voxel; n: n × n patch of the array). (b) Example of robustness in image processing: a two-step pipeline that adjusts the contrast of images and resizes them, in Julia and Python. For more details, see the README.md document under https://github.com/ElisabethRoesch/Perspective_Julia_for_Biologists/blob/main/examples/Abstraction/Supplementary_Example_Flexibility_and_performance_in_image_processing/images_lazy.
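The distinction in panel (a) between vectorisable and non-vectorisable operations can be sketched in plain Python. Contrast stretching maps each pixel independently (given the global minimum and maximum), so it vectorises naturally. Connected-component labelling, a simple form of segmentation, assigns labels through a flood fill whose result for one pixel depends on its neighbours, so it proceeds pixel by pixel. This is not the code benchmarked in the figure; the toy image, the 0.5 threshold and the 4-connectivity are assumptions made for illustration.

```python
from collections import deque


def stretch_contrast(image):
    """Vectorisable: each output pixel depends only on its own value
    and the global min/max, so pixels can be processed independently."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    scale = 1.0 / (hi - lo) if hi > lo else 0.0
    return [[(v - lo) * scale for v in row] for row in image]


def label_components(mask):
    """Non-vectorisable: breadth-first flood fill over 4-connected
    foreground pixels; returns (labels, number_of_objects)."""
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    current = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] and labels[r][c] == 0:
                current += 1
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < n_rows and 0 <= nj < n_cols
                                and mask[ni][nj] and labels[ni][nj] == 0):
                            labels[ni][nj] = current
                            queue.append((ni, nj))
    return labels, current


# Toy 4 x 4 grey-scale image with two bright objects.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 10],
    [210, 10, 10, 10],
    [220, 10, 10, 10],
]
stretched = stretch_contrast(image)
mask = [[v > 0.5 for v in row] for row in stretched]
labels, n_objects = label_components(mask)
print(n_objects, labels)
```

In an interpreted language, the second function is the one that suffers most from explicit loops, which is the situation where a compiled language such as Julia is expected to show the larger advantage.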

References

    1. Tomlin CJ & Axelrod JD. Biology by numbers: mathematical modelling in developmental biology. Nature Reviews Genetics 8, 331–340 (2007). doi:10.1038/nrg2098.
    2. Auton A et al. A global reference for human genetic variation. Nature 526, 68–74 (2015). doi:10.1038/nature15393.
    3. Robson B. Computers and viral diseases. Preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus. Computers in Biology and Medicine 119, 103670 (2020). https://www.sciencedirect.com/science/article/pii/S0010482520300627.
    4. Seefeld K & Linder E. Statistics Using R with Biological Examples (K. Seefeld, 2007).
    5. Ekmekci B, McAnany CE & Mura C. An introduction to programming for bioscientists: a Python-based primer. PLOS Computational Biology 12, e1004867 (2016). doi:10.1371/journal.pcbi.1004867.
