Sci Rep. 2025 Feb 28;15(1):7193.
doi: 10.1038/s41598-025-91249-9.

SHIP identifies genomic safe harbors in eukaryotic organisms using genomic general feature annotation


Matheus de Castro Leitão et al. Sci Rep. 2025.

Abstract

Integrating foreign genes into loci that allow their transcription without affecting endogenous gene expression is a desirable strategy in genome engineering. However, these loci, known as genomic safe harbors (GSHs), have mainly been identified by empirical methods. Furthermore, the most prominent available GSHs are located within regions of high gene density, raising concerns about unstable expression. As synthetic biology moves towards investigating polygenic modules rather than single genes, there is an increasing demand for tools that identify GSHs systematically. To expand the GSH repertoire, we present SHIP, an algorithm designed to detect potential GSHs in eukaryotes. Using the chassis organism Saccharomyces cerevisiae, five GSHs were experimentally curated based on data from DNA sequencing, stability, flow cytometry, qPCR, electron microscopy, RT-qPCR, and RNA-Seq assays. Our study positions SHIP as a valuable tool that provides a list of promising candidates to assist in the experimental assessment of GSHs in eukaryotic organisms with available annotated genomes.

Conflict of interest statement

Declarations. Competing interests: The authors declare software registration (Process Number: BR512023002017-6).

Figures

Fig. 1
SHIP identification of putative genomic safe harbors (pGSHs). (A) Strategy overview of the eukaryotic genomic Safe Harbor Identification Program. The first step was the design of the Safe Harbor Identification Program (SHIP), an algorithm that searches for genomic safe harbors in eukaryotes using general feature data (Design). S. cerevisiae was chosen for the in vivo validation of SHIP and was transformed with two overlapping BioBricks composed of a reporter gene and an auxotrophic marker (Test), followed by the analysis steps (Analysis). (B) Representative scheme of the SHIP software. The inputs are a genomic annotation (.gff3), a regulatory annotation (.gff3), and two files (.json) indicating the genetic parts to be considered for pGSH selection. As output files, the algorithm returns a table and a graph (.png) with the distribution of intergenic distances and a file (.txt) with the list of intergenic regions and regulatory aspects. (C) Histogram of the genomic distribution of the intergenic regions for the three possible arrangements of the flanking genes. (D) Chromosome number, coordinates, neighboring genes, and size of the intergenic regions identified as pGSHs. (E) Ideogram marking the identified pGSHs, generated with Ideogram.js.
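The input and output description in panel (B) can be illustrated with a minimal Python sketch that derives intergenic regions from GFF3 gene features and labels each one by the orientation of its flanking genes. This is not SHIP's implementation: the names `Gene`, `Region`, `parse_gff3_genes`, `arrangement`, and `intergenic_regions` are hypothetical, the example annotation is made up, and the labels "tandem", "convergent", and "divergent" are an assumed reading of the "three possible arrangements" mentioned in panel (C). Regulatory annotation and the .json part filters are left out.

```python
from typing import Iterable, List, NamedTuple


class Gene(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str
    name: str


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    left_gene: str
    right_gene: str
    size: int
    arrangement: str


def parse_gff3_genes(lines: Iterable[str]) -> List[Gene]:
    """Collect 'gene' features from GFF3 lines; other feature types are skipped."""
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("Name", attrs.get("ID", "unknown"))
        genes.append(Gene(cols[0], int(cols[3]), int(cols[4]), cols[6], name))
    return genes


def arrangement(left_strand: str, right_strand: str) -> str:
    """Assumed labels for the orientation of the two flanking genes."""
    if left_strand == right_strand:
        return "tandem"
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # both genes are transcribed towards the gap
    return "divergent"       # both genes are transcribed away from the gap


def intergenic_regions(genes: Iterable[Gene]) -> List[Region]:
    """Return the gaps between consecutive genes on each chromosome."""
    by_chrom = {}
    for g in sorted(genes, key=lambda g: (g.chrom, g.start)):
        by_chrom.setdefault(g.chrom, []).append(g)
    regions = []
    for chrom, gs in by_chrom.items():
        left = gs[0]
        for right in gs[1:]:
            if right.start > left.end + 1:  # at least one base between the genes
                regions.append(Region(
                    chrom, left.end + 1, right.start - 1, left.name, right.name,
                    right.start - left.end - 1,
                    arrangement(left.strand, right.strand)))
            if right.end > left.end:        # keep the gene that extends furthest right
                left = right
    return regions


# Made-up annotation used only to exercise the functions above.
EXAMPLE_GFF3 = [
    "##gff-version 3",
    "chrI\tdemo\tgene\t100\t200\t.\t+\t.\tID=g1;Name=A",
    "chrI\tdemo\tCDS\t100\t200\t.\t+\t0\tID=c1;Parent=g1",
    "chrI\tdemo\tgene\t501\t700\t.\t-\t.\tID=g2;Name=B",
    "chrI\tdemo\tgene\t650\t900\t.\t-\t.\tID=g3;Name=C",
    "chrI\tdemo\tgene\t1001\t1100\t.\t+\t.\tID=g4;Name=D",
]

if __name__ == "__main__":
    for region in intergenic_regions(parse_gff3_genes(EXAMPLE_GFF3)):
        print(region)
```

Sorting the resulting regions by `size` would give a simple ranking of candidates, although the selection criteria SHIP actually applies are not stated in this record.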
Fig. 2
Sanger sequencing of the clones demonstrates correct insertion in the region of each GSH in the five lineages. (A) Representation of the PCR amplification for sequencing, using primers that target the genomic region outside the homology arms (HR), marked in yellow. ymUkG1 is marked in green and URA3 in pink. (B) S2U. (C) S3U. (D) S4U. (E) S5U. (F) S6U. Multiple sequence alignment was performed with the MAFFT v7 program using the following parameters: gap opening penalty of 1.53, gap extension penalty of 0, and the quick direction adjustment function enabled.
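For readers who want to rerun a comparable alignment locally, the parameters listed above map onto MAFFT command-line options roughly as in the sketch below. It assumes the `mafft` executable is installed and that "quick direction adjustment" corresponds to `--adjustdirection`; the helper `mafft_command` and the file name `clones.fasta` are hypothetical, and the caption does not say whether the authors used the command line or the web server.

```python
import shlex
from typing import List


def mafft_command(fasta_path: str, op: float = 1.53, ep: float = 0.0,
                  adjust_direction: bool = True) -> List[str]:
    """Build a MAFFT argument list from the parameters named in the caption."""
    cmd = ["mafft", "--op", str(op), "--ep", str(ep)]
    if adjust_direction:
        # Assumed equivalent of the "quick direction adjustment" setting.
        cmd.append("--adjustdirection")
    cmd.append(fasta_path)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so no MAFFT install is needed here.
    print(shlex.join(mafft_command("clones.fasta")))
```

The list can be passed to `subprocess.run` with `stdout` redirected to a file, since MAFFT writes the alignment to standard output.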
Fig. 3
Genomic characterization and growth curves of each GSH cell line. (A) Schematic strategy of approximately 100 mitotic generations. (B) PCR analysis of each of the five GSH cell lines after approximately 100 generations, using genomic primers specific for complete amplification of the insert. For each biological triplicate, fifteen colonies were randomly collected. BY4741 was used as the positive control (C+) and H2O as the negative control (C−). A crop of each gel is shown, with wells and unused parts removed. Full-length gels are included in the Data availability section (https://figshare.com/s/c2d9bcc901d4114e11e8). (C) Schematic strategy for the growth rate analysis of the five GSH cell lines compared with BY4741 as the control. (D) Means and standard deviations of growth curves measured at 0, 2, 4, 6, 12, 24, and 36 h with an initial OD600 of 0.1. (E) Copy number analysis of the ymUkG1 gene in 3 clones of each GSH cell line. (F) Bar graph showing the mean and standard deviation of the percentage of cells expressing ymUkG1 in all GSH cell lines. Experiments were performed in technical and biological replicates on three independent days. Bioicons from Servier Medical Art licensed under CC BY 4.0.
Fig. 4
Expression dynamics of the neighboring genes of GSH cell lines by RT-qPCR. (A) Schematic representation of the genes analyzed. Histograms of the relative expression of all neighboring genes for each GSH lineage: (B) S2U. (C) S3U. (D) S4U. (E) S5U. (F) S6U. The green column indicates expression of the ymUkG1 gene. Colored columns highlight the neighboring genes (purple at 5’ and blue at 3’) of the GSH of the analyzed strain, and gray columns show the neighboring genes of the other GSHs. The cracked columns represent values from the untransformed control (BY4741). Relative expression is shown on the Y axis and the genes analyzed on the X axis. Asterisks indicate significant changes in the analysis of variance (ANOVA). (*) p < 0.05, (**) p < 0.01, (***) p < 0.001 and (****) p < 0.0001.
Fig. 5
Genomic expression dynamics of GSH cell lines by RNA-Seq. (A) PCA plot of the differential gene expression of the GSH lines (S2U, S3U, S4U, S5U and S6U) relative to the untransformed control BY4741. Volcano plots of differential expression of the GSH cell lines by RNA-Seq: (B) S2U. (C) S3U. (D) S4U. (E) S5U. (F) S6U. Colored dots represent differentially downregulated (blue) and upregulated (red) genes. Described genes are identified by their standard name, and undescribed or putative genes by their systematic name.
Fig. 6
Functional and expression analysis of the S2U5A6M triplex GSH cell line. (A) Sanger sequencing of the S2U5A6M strain. (B) Growth and amylase activity of S2U5A6M clones and growth of S2U6M (no-amylase control) on minimal medium containing soluble starch. The plate was stained with iodine vapor; clear halos indicate starch hydrolysis. (C) Amylase enzyme activity assay. All clones differed significantly from BY4741 according to a Mann-Whitney U test but showed no statistically significant differences among themselves. (**) p < 0.01 and (***) p < 0.001. (D) Expression dynamics of neighboring genes by RT-qPCR. Green, yellow, and red columns indicate expression of ymUkG1, ymBeRFP, and α-amylase (α-AMY), respectively. Colored columns highlight the neighboring genes (blue at 5’ and purple at 3’) of the GSH used, and gray columns show the neighboring genes of the other GSHs. The cracked columns represent values from the untransformed control (BY4741). Relative expression is shown on the Y axis and the genes analyzed on the X axis. Asterisks indicate significant changes in the analysis of variance (ANOVA). (***) p < 0.001 and (****) p < 0.0001.

