BMC Plant Biol. 2025 Sep 2;25(1):1192.
doi: 10.1186/s12870-025-07258-3.

Genome-wide identification and characterization of DUF789 genes in cotton: implications for fibre development

Rasmieh Hamid et al. BMC Plant Biol.

Abstract

Background: Proteins containing domains of unknown function (DUFs) play crucial roles in plant growth, development and stress adaptation, but many of them remain uncharacterized. The DUF789 family is one of the least studied of these, especially in economically significant crops such as cotton (Gossypium spp.), where its possible function in fibre production and abiotic stress response is still unknown.

Results: A comprehensive genome-wide analysis identified a total of 91 DUF789 genes in four Gossypium species: G. arboreum, G. barbadense, G. raimondii and G. hirsutum. Evolutionary and phylogenetic analyses placed the GhDUF789 proteins into distinct clades, with purifying selection identified as the major evolutionary force. Analyses of gene structure and conserved motifs revealed considerable structural diversity, with closely related genes showing similar exon-intron patterns and motif compositions. Synteny and duplication analyses showed that segmental and tandem duplications contributed to the expansion of the DUF789 family in cotton. Analysis of cis-regulatory elements revealed that the GhDUF789 promoters are enriched in motifs responsive to hormones, developmental cues, light and abiotic stress. GO enrichment analyses, prediction of protein-protein interactions, and secondary and tertiary structure modelling indicated that GhDUF789 proteins are involved in clathrin-mediated vesicle trafficking and membrane trafficking. miRNA target prediction revealed regulatory interactions with conserved cotton miRNAs, in particular ghr-miR414 and ghr-miR396. Expression profiling based on transcriptome analysis, supported by qRT-PCR validation, revealed that several GhDUF789 genes are differentially expressed during fibre development and respond strongly to drought, heat, salinity and cold stress, especially in drought-tolerant genotypes.
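
As an illustration of what "purifying selection" usually means in duplication analyses of this kind, the sketch below classifies a duplicated gene pair by its Ka/Ks ratio (non-synonymous to synonymous substitution rate). The abstract does not state which statistic or thresholds the authors used, so the Ka/Ks criterion, the function name `classify_selection` and the numeric values are assumptions chosen for illustration only.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify selection pressure on a gene pair from its Ka and Ks values.

    Conventional interpretation (an assumption here, not taken from the paper):
    Ka/Ks < 1 suggests purifying selection, Ka/Ks = 1 neutral evolution,
    and Ka/Ks > 1 positive selection.
    """
    if ks <= 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Hypothetical Ka and Ks values, not data from the study.
for ka, ks in [(0.05, 0.25), (0.60, 0.30), (0.20, 0.20)]:
    print(f"Ka={ka}, Ks={ks}: {classify_selection(ka, ks)}")
```

Under this convention, a family in which most duplicated pairs fall below a ratio of 1 would be described as evolving mainly under purifying selection.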

Conclusions: This study provides the first comprehensive characterization of the DUF789 gene family in cotton and offers new insights into its evolutionary dynamics, structural features and potential role in fibre development and adaptation to abiotic stress. The results provide a solid foundation for future functional studies and identify candidate GhDUF789 genes for targeted genetic improvement of stress resistance and fibre quality in cotton.

Keywords: DUF789 gene family; Development; Expression; Genes; Growth; Regulation.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: All experimental studies on plants were conducted in compliance with relevant institutional, national, and international guidelines and legislation. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests. Clinical trial number: Not applicable.

Figures

Fig. 1
Chromosomal locations and gene duplication events of DUF789 genes. The corresponding chromosome numbers are given to the left of each bar. GhDUF789 gene pairs resulting from segmental and tandem duplications are connected by lines. Distribution of GhDUF789 genes on chromosomes of G. hirsutum (A), G. barbadense (B), G. arboreum (C) and G. raimondii (D)
Fig. 2
(A) Phylogenetic analysis of GhDUF789 genes. (B) Phylogenetic analysis of DUF789 genes from Arabidopsis and four cotton species. The phylogenetic tree was constructed using MEGA11.0 (maximum-likelihood algorithm, bootstrap value = 1000). The prefixes Ga, Gr, Gh and Gb represent G. arboreum, G. raimondii, G. hirsutum and G. barbadense, respectively. Bootstrap values are indicated by triangular node symbols of varying sizes, with larger triangles corresponding to higher bootstrap support (closer to 1.0) and thus greater confidence in the branching pattern
Fig. 3
Evolutionary analysis of the GhDUF789 genes. (A) Collinearity analysis of the genomes of G. arboreum, G. raimondii, G. hirsutum, G. barbadense and A. thaliana. Grey lines represent syntenic sequences, and the highlighted blue lines indicate syntenic gene pairs of GhDUF789. (B) Collinearity analysis of the genomes of G. arboreum, G. raimondii and G. hirsutum. (C) Analysis of segmental and tandem duplication of GhDUF789 genes. The values indicate identity (%). (D) Collinearity analysis of G. hirsutum DUF789 genes. A01 to A13 represent chromosomes of the A subgenome, while D01 to D13 represent chromosomes of the D subgenome
Fig. 4
Gene structure, conserved motifs, domains and chromosomal localization of the GhDUF789 gene family in upland cotton. (A) Phylogenetic tree of GhDUF789 genes depicting their evolutionary relationships. Different colors indicate distinct clades. (B) Domain architecture of GhDUF789 proteins, highlighting conserved functional domains with different colors. (C) Gene structure analysis displaying the exon-intron organization of GhDUF789 genes. Yellow boxes represent exons, black lines represent introns. (D) Conserved motif distribution among GhDUF789 proteins. Identified motifs are color-coded. (E) Sequence logo representation illustrating conserved amino acid residues in GhDUF789 proteins
Fig. 5
(A) Analysis of cis-regulatory elements on GhDUF789 genes. Heatmap showing the distribution of cis-acting elements in the 2 kb promoter regions of GhDUF789 genes. Elements are categorized into cell cycle/development, hormone-, light-, and stress-responsive groups, with a phylogenetic tree on the left
Fig. 6
Protein-protein interactions of GhDUF789 proteins based on known Arabidopsis orthologues. (A) The network was constructed using the online STRING tool. Proteins are displayed as network nodes containing their 3D structures, and line colours indicate different data sources. (B) Predicted 3D models of GhDUF789 proteins, constructed using the online Phyre2 server in default mode. (C) Phylogenetic analysis of GhDUF789 proteins. Each coloured line represents a different subfamily, highlighting the evolutionary relationships among the GhDUF789 proteins
Fig. 7
Schematic representation of GhDUF789 genes targeted by ghr-miRNAs in upland cotton. Predicted miRNA target sites on GhDUF789 genes are shown alongside exon–intron structures. Blue boxes indicate exons; black lines represent introns. Red vertical bars mark predicted miRNA binding sites. The aligned sequences of each miRNA and its corresponding target site are displayed, with vertical lines representing complementary base pairing. Multiple members of the ghr-miR414 and ghr-miR396 families were predicted to target GhDUF789 genes, primarily through cleavage or translation inhibition mechanisms
Fig. 8
Gene Ontology (GO) enrichment analyses of GhDUF789 genes
Fig. 9
(A) Heat map of GhDUF789 gene expression across diverse tissues and developmental stages in upland cotton (G. hirsutum). Transcript abundance (log₂-based FPKM values) is shown for GhDUF789 genes in cotyledon, calycle, leaf, petal, pistil, root, stamen, stem, ovule, torus, seed, and during multiple fibre developmental stages (5, 10, 15, 20, and 25 days post anthesis [dpa]). Expression levels were hierarchically clustered to visualise tissue-specific and stage-specific expression patterns. (B) qRT-PCR results for GhDUF789A-1 and GhDUF789D-13 at different fibre developmental stages
Fig. 10
Expression profile of GhDUF789 genes in upland cotton. (A) Heat map showing the expression profile of GhDUF789 genes under different abiotic stress conditions. (B) Relative expression of GhDUF789 genes, measured by qRT-PCR
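
The "log₂-based FPKM values" plotted in the Fig. 9 heat map can be illustrated with a minimal sketch of a log₂ transform. The caption does not say whether a pseudocount was added before taking the logarithm, so the pseudocount of 1, the function name `log2_fpkm` and the example values are assumptions for illustration, not the authors' pipeline.

```python
import math


def log2_fpkm(fpkm: float, pseudocount: float = 1.0) -> float:
    """Return log2(FPKM + pseudocount).

    The pseudocount (assumed here to be 1) keeps genes with FPKM = 0
    defined and maps them to 0 on the log scale.
    """
    if fpkm < 0:
        raise ValueError("FPKM values cannot be negative")
    return math.log2(fpkm + pseudocount)


# Hypothetical FPKM values for one gene across fibre stages (not study data).
fpkm_by_stage = {"5 dpa": 0.0, "10 dpa": 3.0, "15 dpa": 7.0, "20 dpa": 15.0}
log_values = {stage: log2_fpkm(value) for stage, value in fpkm_by_stage.items()}
print(log_values)
```

Compressing the dynamic range in this way is what lets weakly and strongly expressed genes share one colour scale in a heat map.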
