PLoS One. 2011 Jan 19;6(1):e14547.
doi: 10.1371/journal.pone.0014547.

Characterization of the deleted in autism 1 protein family: implications for studying cognitive disorders

Azhari Aziz et al. PLoS One. 2011.

Abstract

Autism spectrum disorders (ASDs) are a group of commonly occurring, highly heritable developmental disabilities. Human genes c3orf58 or Deleted In Autism-1 (DIA1) and cXorf36 or Deleted in Autism-1 Related (DIA1R) are implicated in ASD and mental retardation. Both genes encode products with signal peptides for targeting to the secretory pathway. As evolutionary medicine has emerged as a key tool for understanding increasing numbers of human diseases, we have used an evolutionary approach to study DIA1 and DIA1R. We found DIA1 conserved from cnidarians to humans, indicating DIA1 evolution coincided with the development of the first primitive synapses. Nematodes lack a DIA1 homologue, indicating Caenorhabditis elegans is not suitable for studying all aspects of ASD etiology, while zebrafish encode two DIA1 paralogues. In contrast to DIA1, DIA1R was found exclusively in vertebrates, with an origin coinciding with the whole-genome duplication events occurring early in the vertebrate lineage, and the evolution of the more complex vertebrate nervous system. Strikingly, DIA1R was present in schooling fish but absent in fish that have adopted a more solitary lifestyle. An additional DIA1-related gene, which we named DIA1-Like (DIA1L), lacks a signal peptide and is restricted to the genomes of the echinoderm Strongylocentrotus purpuratus and the cephalochordate Branchiostoma floridae. Evidence for remarkable DIA1L gene expansion was found in B. floridae. Amino acid alignments of DIA1 family gene products revealed a potential Golgi-retention motif and a number of conserved motifs with unknown function. Furthermore, a glycine and three cysteine residues were absolutely conserved in all DIA1-family proteins, indicating a critical role in protein structure and/or function. We have therefore identified a new metazoan protein family, the DIA1-family, and understanding the biological roles of DIA1-family members will have implications for our understanding of autism and mental retardation.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. DIA1-family superimposed on a simplified metazoan phylogeny.
DIA1 is absent from the genome sequences of nematodes (grey font) as well as fungi, plants, amoebozoa and chromalveolates (not shown). Due to a paucity of sequence data, it is unclear whether a DIA1 homologue is absent from the Porifera (grey font). DIA1L was exclusively found in echinoderm and cephalochordate genomes (underlined), which also encode DIA1. DIA1L is absent from tunicates, but a current dearth of sequence data precludes evaluation of hemichordate genomes for DIA1L homologues (indicated by a dotted bold grey line on the right-hand side, and a lack of underline). A bold dotted black line (right-hand side) indicates that the presence of DIA1R has been confirmed in cartilaginous fish but, probably due to a lack of sequence data, DIA1 has yet to be identified in this class of chordates. Both a DIA1 and a DIA1R gene are present in vertebrate genomes (bold font), with a notable absence of DIA1R in acanthopterygian fish (asterisk). Furthermore, two DIA1 paralogues were identified in the genomes of fish from the superorder Ostariophysi, but not in fish from other superorders (see Figure 3). Data for the schematic metazoan phylogeny were from numerous sources. Proposed rounds of whole-genome duplication (WGD) are indicated by filled black spheres, where two WGDs occurred early in the vertebrate lineage (1R/2R) and a third WGD (3R) in the ray-finned fish lineage before the diversification of teleosts. Proposed duplications of DIA1-family genes are indicated by red circles, and ‘loss’ of DIA1-family genes by grey squares. Dashed arrows are used to annotate events occurring in our current model of DIA1-family evolution. Further details of two different models of DIA1-family duplication and ‘loss’ events in the fish lineage (*) can be found in Figure 3, where some fish species encode DIA1 paralogues, while others lack DIA1R. Accession numbers of DIA1, DIA1R, and DIA1L sequences can be found in Tables S1-S5, Table S7 and Table S9.
Figure 2. Amino acid sequence comparison of DIA1 from key species.
The sequence alignment was generated using CLUSTALW. Identical amino acids are highlighted in red font and indicated below the alignment with an asterisk (*). Strongly similar amino acids are highlighted in green font and indicated below the alignment with a colon (:). Weakly similar amino acids are highlighted in blue font and indicated below the alignment with a full stop (.). Dissimilar amino acids are in black font. Amino acids conserved in all DIA1 proteins, as determined by alignment of DIA1 gene products from all species (Figure S2), are underlined (*). Amino acid numbering is provided above the alignment. Gaps required for optimal alignment are indicated by dashes. Standard single-letter amino acid abbreviations are used. Organism abbreviations use the first letter of the genus name, followed by the first four letters of the species (e.g. Homo sapiens DIA1 is abbreviated to HsapiDIA1). The two D. rerio DIA1 paralogues are abbreviated as DreriDIA1a and DreriDIA1b. Full species names and accession numbers can be found in Table S1.
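The organism abbreviation scheme used in these captions can be sketched in a few lines of Python. This is an illustrative helper only; the function name `abbreviate` and its `paralogue` parameter are assumptions and do not come from the paper.

```python
def abbreviate(binomial: str, gene: str, paralogue: str = "") -> str:
    """Build a label such as 'HsapiDIA1' from a binomial species name.

    Scheme from the caption: first letter of the genus, then the first
    four letters of the species, then the gene name. The optional
    paralogue suffix ('a', 'b', ...) is a hypothetical convenience.
    """
    genus, species = binomial.split()[:2]
    return genus[0].upper() + species[:4].lower() + gene + paralogue


print(abbreviate("Homo sapiens", "DIA1"))      # HsapiDIA1
print(abbreviate("Danio rerio", "DIA1", "a"))  # DreriDIA1a
```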
Figure 3. Fish-centric models of DIA1-family evolution.
In both models (A and B), the genome of the hypothetical chordate ancestor encodes two DIA1-family genes: DIA1 and DIA1L. The DIA1L gene has been ‘lost’ in the urochordate/vertebrate lineage, preceding the 1/2R whole-genome duplications (WGDs). A duplicated copy of DIA1, which we have called DIA1R, was retained subsequent to the 1/2R WGD event, with both DIA1 and DIA1R identified in lamprey, fish, and tetrapod genomes. In the fish lineage, however, two different models, (A) and (B), could account for our current knowledge of DIA1 family members. In model (A), the DIA1 duplication generating DIA1a and DIA1b coincides with the 3R WGD. Two lineage-specific ‘losses’ of DIA1a have then occurred: the first in the G. morhua lineage, and the second in the Protacanthopterygian/Acanthopterygian lineage. There are too few data available to determine whether the channel catfish encodes DIA1a, DIA1b, both, or neither. In model (B), the DIA1 duplication leading to DIA1a and DIA1b in ostariophysans does not coincide with 3R but, instead, is specific to the ostariophysan lineage. Models (A) and (B) both predict DIA1R gene loss in the acanthopterygian lineage. Proposed rounds of WGD are indicated by filled black spheres; numbering of the WGDs is provided in black boxes, with those occurring early in the vertebrate lineage marked as 1R/2R and that in the ray-finned fish lineage marked as 3R. Proposed duplications of DIA1-family genes are indicated by red circles, and ‘loss’ of DIA1-family genes by grey squares. Data for the schematic fish phylogeny were from numerous sources.
Figure 4. Amino acid sequence alignment of DIA1 and DIA1R proteins from key species.
Gene products from species with known full-length DIA1 and DIA1R orthologues were aligned using CLUSTALW, with DIA1 from the cnidarian species Nematostella vectensis (NvectDIA1) included for comparative purposes. Identical amino acids are highlighted in red font and indicated below the alignment with an asterisk (*). Strongly similar amino acids are highlighted in green font and indicated below the alignment with a colon (:). Weakly similar amino acids are highlighted in blue font and indicated below the alignment with a full stop (.). Dissimilar amino acids are in black font. Amino acids conserved in all DIA1 and DIA1R proteins, as determined by alignment of the DIA1 and DIA1R gene products from all species (Figure S4), are underlined (*). Amino acid numbering is provided above the alignment. Gaps required for optimal alignment are indicated by dashes. Standard single-letter amino acid abbreviations are used. Organism abbreviations use the first letter of the genus name, followed by the first four letters of the species (e.g. Homo sapiens DIA1R is abbreviated to HsapiDIA1R). Full species names and accession numbers can be found in Tables S1 and S4. Predicted signal peptide cleavage sites for human DIA1 and DIA1R (Figure S5) are indicated by arrows above or below the alignment, respectively.
Figure 5. Amino acid sequence alignment of DIA1-family proteins from key species.
All full-length DIA1, DIA1R, and/or DIA1L gene products were aligned using CLUSTALW (Figure S7), and this figure represents excerpts from this master alignment, where proteins from the following phyla only are represented: Cnidaria (N. vectensis DIA1: NvectDIA1), Arthropoda (D. melanogaster DIA1: DmelaDIA1), Echinodermata (S. purpuratus DIA1 and DIA1L: SpurpDIA1 and SpurpDIA1L), Cephalochordata (B. floridae DIA1 and DIA1L paralogues: BflorDIA1, BflorDIA1La, b, and c), and Chordata. The latter includes representatives of the subphylum Urochordata (C. intestinalis DIA1: CinteDIA1) and subphylum Vertebrata (H. sapiens DIA1 and DIA1R: HsapiDIA1 and HsapiDIA1R). Amino acid numbering from the master alignment (Figure S7) is provided above the alignment. Gaps required for optimizing the master alignment (Figure S7) are indicated by dashes. Standard single-letter amino acid abbreviations are used. Organism abbreviations use the first letter of the genus name, followed by the first four letters of the species (e.g. Homo sapiens DIA1R is abbreviated to HsapiDIA1R). Full species names and accession numbers can be found in Tables S1, S4 and S7. The predicted locations of the DIA1 and DIA1R signal peptides (SP) are indicated above the alignment (Figure S5). Conserved amino acid motifs detected in the master alignment (Table 3, Figure S7) are indicated in numbered boxes above the alignment. Consensus amino acids for each motif are indicated below the alignment. Amino acids absolutely conserved across the whole DIA1-family are indicated in red upper-case letters, those strongly conserved across the whole DIA1-family are in green, and those weakly conserved across the whole DIA1-family in blue (Figure S7). In addition, black lower-case letters indicate amino acids conserved in over 80% of DIA1-family sequences (Figure S8), while grey lower-case letters indicate conservation in 50–80% of DIA1-family sequences (Figure S9).
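The percentage-based conservation tiers in this caption (absolute, over 80%, 50–80%) could be computed per alignment column roughly as follows. This is a minimal sketch, not the authors' procedure: the function name `conservation_tier`, the choice to count gapped sequences against conservation, and the treatment of exactly 80% as belonging to the 50–80% tier are all assumptions.

```python
from collections import Counter


def conservation_tier(column: str) -> str:
    """Classify one alignment column by the share of sequences that
    carry its most common residue. Gaps ('-') are never the consensus
    and count against conservation (an assumption)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return "none"
    top_count = Counter(residues).most_common(1)[0][1]
    fraction = top_count / len(column)
    if fraction == 1.0:
        return "absolute"
    if fraction > 0.8:
        return "over 80%"
    if fraction >= 0.5:
        return "50-80%"
    return "below 50%"


# Toy columns of ten sequences each (made-up data, not from the paper).
print(conservation_tier("CCCCCCCCCC"))
print(conservation_tier("CCCCCCCCCA"))
print(conservation_tier("CCCCCCAAAA"))
```

Each string passed in represents one column read down the alignment, with one character per sequence.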
Figure 6. Evolutionary relationships between DIA1-family members.
The evolutionary history of the DIA1 family was inferred using the neighbour-joining method. The optimal tree is shown, with statistical reliability of branching assessed using 1000 bootstrap replicates, where percentage values are shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and units are the number of amino acid substitutions per site. All positions containing gaps were eliminated from the dataset (Figure S10). There were a total of 258 positions in the final dataset. Phylogenetic analyses were conducted in MEGA4. The tree was rooted on the cnidarian N. vectensis DIA1 sequence (NvectDIA1), as highlighted with an asterisk. Organism abbreviations use the first letter of the genus name, followed by the first four letters of the species. Full species names and accession numbers can be found in Tables S1, S4 and S7.
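As a rough illustration of the distance calculation described in this caption, the Poisson correction converts the proportion p of differing amino acid sites into an estimated number of substitutions per site, d = -ln(1 - p), after removing every alignment column that contains a gap. The sketch below is not the MEGA4 implementation; the function names and the toy sequences are invented for illustration.

```python
import math


def drop_gapped_columns(alignment: dict) -> dict:
    """Remove every column in which any sequence has a gap ('-')."""
    kept = [col for col in zip(*alignment.values()) if "-" not in col]
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(alignment)
    }


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two gap-free,
    equal-length protein sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy alignment (hypothetical sequences, not DIA1 data).
toy = {"seqA": "MKC-GLW", "seqB": "MRCAGLW", "seqC": "MKCAG-W"}
clean = drop_gapped_columns(toy)
print(clean)
print(round(poisson_distance(clean["seqA"], clean["seqB"]), 4))
```

A matrix of such pairwise distances is the input a neighbour-joining algorithm would use to build the tree.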

