Theory Biosci. 2007 Oct;126(2-3):65-113.
doi: 10.1007/s12064-007-0012-x. Epub 2007 Sep 22.

Gene and genon concept: coding versus regulation. A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology

Klaus Scherrer et al. Theory Biosci. 2007 Oct.

Abstract

We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term "genon". In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon.
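The backward and forward analyses described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the authors' formalism: the exon names, their sequences and the two splice variants are hypothetical, exon fragments are written directly as RNA for brevity, and only the handful of standard codons needed here are listed.

```python
# Subset of the standard genetic code (real codon assignments), enough for this toy example.
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "GAA": "E", "UGG": "W", "UAA": "*"}

# Hypothetical exon fragments (assumption), written as RNA for brevity.
EXONS = {"e1": "AUGGCU", "e2": "AAAGAA", "e3": "UGGUAA"}

# Hypothetical splice variants (assumption): each mRNA is an ordered selection of exons.
SPLICE_VARIANTS = {"mRNA_1": ["e1", "e2", "e3"], "mRNA_2": ["e1", "e3"]}


def assemble(exon_ids):
    """Join exon fragments into one uninterrupted coding sequence (the gene at mRNA level)."""
    return "".join(EXONS[e] for e in exon_ids)


def translate(mrna):
    """Translate codon by codon until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def backward(peptide):
    """Backward analysis: from which exon fragments does a given polypeptide originate?"""
    for exon_ids in SPLICE_VARIANTS.values():
        if translate(assemble(exon_ids)) == peptide:
            return exon_ids
    return None


def forward(exon_id):
    """Forward analysis: in which distinct polypeptides is a given fragment expressed?"""
    return sorted(
        {translate(assemble(ids)) for ids in SPLICE_VARIANTS.values() if exon_id in ids}
    )


print(backward("MAW"))
print(forward("e1"))
print(forward("e2"))
```

In this sketch the forward analysis reports only the types of polypeptide a fragment contributes to; the number of copies produced, which the abstract also asks about, would need expression levels that this toy model does not contain.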

Figures

Fig. 1
Definition of the gene: a functional polypeptide basis of a unit function. By genetic analysis, the gene is identified as a phenotypic function. An individual function is based on co-operating proteins or polypeptides; the latter represent, hence, the basic unit functions. At nucleic acid levels, the closest equivalent is the coding sequence for such a polypeptide, inserted into the mRNA. In the general case, such a coding sequence - gene equivalent - is fragmented in the DNA, which constitutes the genotype, basis of a specific phenotype
Fig. 2
The Jacob and Monod Model of the operon. In the bacterial operon, several coding sequences (cistrons) are coupled together to secure a metabolic pathway as, e.g., in case of the lac operon. When activated, such an operon is transcribed as a unit and, prior to termination of transcription, a polyribosome is formed on the mRNA, and the products, the enzymes Z and Y as well as an acetylase are made. DNA, mRNA and the translation machinery form, hence, a tightly linked physical complex; therefore (as in a timepiece), arrest at any level stops the entire machinery. In the repressed state, in the upstream operator/promoter sequence where the RNA polymerase attaches and transcription has to start, the repressor may attach on the basis of a sequence-specific protein–DNA interaction, prohibiting transcription. The repressor is the product of a distant gene coding for a polypeptide. Once attached to the DNA, the repressor may become the target of an inducer, in the case cited a small Mr chemical compound reducing the affinity constant of the DNA-repressor interaction. Regulation operates thus primarily at transcriptional level, controlling types and amounts of polypeptides formed; in this case it acts in a negative manner via the repressor, but positive regulation via peptides acting as inducers exists as well. Note that the operon arrangement implies already an expression program including the operator in the sense of the genon concept
Fig. 3
From Gene to Phen in space and time. Once the unitary physical complex of the bacterial translation machinery was disrupted in the course of evolution, with the genomic DNA removed from the polyribosomes and stored away in the nucleus, a time delay arose because, prior to gene expression, the transcripts first have to be transported in space. Thus, two interdependent vectors in space and time result which, together, govern gene expression. Furthermore, transport of transcripts may be interrupted and a considerable time delay may result (up to 30 years, e.g., in the case of the human maternal histone mRNA laid down in the unfertilised egg); the corresponding mRNA forms repressed mRNP complexes, to be activated upon specific signals, which constitute peripheral memories of genetic information. Transcripts may also be stored during earlier stages of their processing from primary pre-mRNA to mRNA; these unspliced or partially spliced pre-mRNAs may still contain individual exons rather than finally constituted coding sequences or genes. The gene, which has to be reconstituted each time an mRNA is formed, thus springs up during RNA processing. It is subject to terminal controls which may bear on its nature (final splicing), cellular site and time of expression. Nature, timing and site of gene expression are hence largely subject to post-transcriptional regulation (Scherrer 1980)
Fig. 4
Genon and Transgenon. (Box 1) The equivalent of the polypeptide-gene at RNA level is the coding sequence which is inserted in the mRNA and framed by the 5′- and 3′-side UTRs. In the latter and superimposed onto the coding sequence is an ensemble of signals constituting the Genon. The genon represents a program in cis of sequence oligomotifs, potential binding sites (oligomotifs may or may not form hairpins as shown) for regulatory proteins (or si/miRNAs, not shown). (Box 2) When present, protein factors interact with the oligomotifs (empty coloured circles) in cis forming RNPs (insert B); the ensemble of the factors (filled circles) picked up by an mRNA constitutes its specific transgenon. (Box 3) The Holo-Transgenon of a given cell is constituted by all these factors, which may potentially recognise an oligomotif in the cis-genon. (Grey box) A subset of factors (filled circles) interacting with a specific mRNA constitutes the latter’s transgenon. (Insert A) Dark-field EM picture of globin mRNA showing its compact non-random nature due to secondary structure. (Insert B) Dark-field EM picture of a globin mRNP constituted by globin mRNA and 3 times its mass of specific associated proteins (Civelli et al. 1980). Notice that proteins are attached all along the mRNA chain, interacting within the coding sequence. The latter contains, hence, two types of information relating to (1) the genetic code and (2) sequence oligomotifs recognising specific RNA-binding proteins (or interfering RNAs) acting as vehicles of post-transcriptional controls. (For experimental details see Dubochet et al. 1973)
Fig. 5
From DNA to pre-mRNA and mRNA expression: Proto-, Pre- and Genon. The genomic domain (line A) with exons (light green) and fragments of coding sequences (dark green), as well as inter-genic (not shown) and intra-genic non-coding DNA, contains instructions for remodelling and activation of chromatin; this constitutes the proto-genon (A’). From these a pre-mRNA (B) or a full domain transcript (FDT) with its pre-genon (B’) may spring off. The latter may contain gene fragments subject to differential splicing; shown is the case of a pre-mRNA containing the two ORFs 1 and 2. Below are shown the two mRNAs created with their respective genons and, thereafter, the two gene equivalents, the coding sequences in mRNAs (1) and (2) with their products, peptides 1 and 2, securing two functions. (Insert) To the genon signals (oligomotifs) carrying distinct instructions for specific steps of processing and gene expression (left) correspond factors from the transgenon (right), in active or inactive states, which may or may not (when inactive or absent) implement the corresponding control
Fig. 6
Transcript size and genomic domains. (A) The size of giant transcripts (up to 50–100,000 nt) corresponds, by order of magnitude, to the genomic domains observable in specific types of chromosomes (B, C), or the “Christmas trees” of primary transcripts observable in the EM after spreading of nucleoli (D1, 2) or non-ribosomal chromatin (D3). (For exp. details see Scherrer and Darnell and Scherrer et al. , reporting the original observation of “giant” RNA and RNA processing; cf. also Fig. 9 in Scherrer and Marcaud and Fig. 6 in Spohr et al. 1976). (B) Lampbrush chromosomes of Pleurodeles waltl stained for IIF with anti-prosome monoclonal antibodies (for exp. details see Pal et al. 1988). Lampbrush chromosomes are characteristic of the transcription of the entire genome during the diplotene stage of oogenesis in amphibia and birds. Projecting from the chromosome axis are the chromatin loops corresponding to genomic domains, which carry the “Christmas trees” of DNA in maximal transcription (comparable to those shown in panel D3). Prosomes (insert) are protein particles (built of 2 × 14 subunits in 4 superposed rings of 7) found associated with chromatin and (pre-)mRNP complexes; they also constitute the core of the 26S proteasomes (Scherrer and Bey 1994). Notice their association with the loops (maximal at their base), and also their shedding from the chromosomes into the nucleoplasm. (C) Polytene chromosomes of Rhynchosciara americana in specific stages of larval development and differentiation (cf. Glover et al. ; Lara 1987). Polytene chromosomes represent interphase chromosomes generated by DNA replication without cell division; about 10,000 DNA strands stay associated and form the bands visible in the light microscope due to chromatin hyper-condensation. These physical bands correspond to the meiotic genes in cytogenetics of, e.g., Drosophila, to units of transcription and, in Sciaridae, of DNA replication. Notice the development of transcriptional “puffs” at specific stages of differentiation. (D) Transcription and formation of nucleoli (relation of transcription and nuclear architecture). (1) Organised nucleolus with its fibrillar centre F where transcription takes place and the granular zone G constituted by already processed ribosomal subunits. (2) Hamkalo-Miller spreads of dissociated nucleoli make it possible to see consecutive ribosomal DNA domains in transcription: the ribosomal transcripts form RNPs, which, eventually, are organised into the nucleolar dynamic architecture. (3) Transcripts of non-ribosomal genomic domains of various sizes
Fig. 7
Transcription, (pre-)mRNA transport and prosome-specific (PS) nuclear matrix and cytoskeleton. (A) In situ hybridisation with a globin riboprobe on transformed avian erythroblasts (AEV cells) showing 3 cells; the lower two are partially (left) and fully (right) induced for hemoglobin production (for exp. details see Iarovaia et al. 2001). Notice accumulation of globin RNA around the nucleolus (NO) in the un-induced cell, and the presence of 2 nuclear processing centres (PC) and of mRNA in the cytoplasm after induction. (B) A partially induced AEV cell hybridised in situ with a globin riboprobe (red) as in A, counterstained by IIF with a 23 K-subunit-specific anti-PS monoclonal Ab (23 K p-mAb) serving as a marker for nuclear and cytoplasmic (pre-)mRNPs (green); white dots indicate a 1:1 ratio of the two markers and, hence, co-localisation of globin RNA with the 23 K-type PS (for exp. details see De Conto et al. 1999). Notice the abundance of globin mRNA-23 K PS complexes at the periphery of the PCs extending to the nuclear membrane, as well as their presence at specific sites in the cytoplasm where repressed globin mRNPs accumulate, whereas the 23 K PS distribute throughout the cytoplasm, similar to globin mRNA in A. (C, D) Nuclear matrix preparations of mouse myoblasts stained with the 23 K-specific p-mAb, before and after RNase treatment (for exp. details see De Conto et al. 2000). Notice the presence of about 50% of the 23 K PS-mRNP complexes on the nuclear matrix and the appearance, after RNase, of PS-specific networks within the matrix engulfing the nucleoli (black craters). (E, F) Two types of prosome-specific cytoskeletal networks, both co-localising with cytokeratins (for exp. details see Olink-Coux et al. 1992). Epithelial cell stained with a p25K-specific (E) and a p33K-specific (F) p-mAb. Notice that different networks are occupied by the two types of PS (although both correspond to the cytokeratin type of IF), as well as the peri-nuclear staining and filamentous links between cells; in F the PS are on a network starting at the Golgi centre and ending at the plasma membrane on desmosome-like patches
Fig. 8
The physical supports of gene expression and storage. Not only proteins, but also DNA and RNA are organised in space. In proteins, “spacer” peptides place active sites in precise positions and intra- and intermolecular interactions create the 3D structure necessary for function as enzymes or structural building blocks. DNA and RNA interact with proteins not only for control of gene expression at genon level but also secure the constitutive and dynamic nuclear architecture: DNPs and pre-RNPs constitute the skeleton of the nuclear matrix. The relatively stable 3D DNA network is modified during differentiation and physiological change. The RNA in processing, as the secondary backbone of the nuclear matrix, permanently controls the dynamic nuclear architecture securing transport of the integrated information of gene and genon. This primary transport system is prolonged into the cytoplasm by the 3 cytoskeletal systems of actin, intermediate filaments and tubulin. Thus, gene fragments are in defined 3D positions where transcripts are generated, migrate to nuclear processing centres and export systems, and end up in defined cellular sectors or structures where genes are delivered to the places of their function. All these mechanisms are highly controlled in 3D space; breakdown of the underlying systems leads to malfunction and pathology, as is particularly visible in cancer cells which, quite generally, show modifications and even breakdown of matrix and cytoskeletal organisation
Fig. 9
The Unified Matrix Hypothesis (Scherrer 1989) postulates the existence of a 3D network of chromatin primed by intrinsic properties of the genomic DNA. This constitutes a third type of genetic information based essentially on the distance between sites where two DNA strands interact, at distant sites on the same and/or on different chromosomes; mere DNA length becomes genetic information. (A, B) The network of ectopic pairing shows the existence of such a 3D chromatin system, as observed for the 4 polytene chromosomes in Drosophila salivary gland cells (A), which are genuine interphase cells (micrograph courtesy V. E. Barsky; cf. Ananiev et al. 1981). Notice intra- and inter-chromosomal as well as telomeric links. The cables suspend the nucleolus in a fixed position; since it contains the highly amplified genomic domains for ribosomal RNA, notice that the DNA must pass through some of these cables. (B) The position of these cables linking interbands is genetically fixed (Kaufman et al. 1948). (C, D) The formation of the matrix network. The DNA in normal interphase cells being flexible, it may directly interact at specific sites (A1–An in C) within and in between the chromosomes, eventually forming a 3D network (D) of euchromatic chromatin and, secondarily, the matrix protein network (dashed lines) binding to the matrix attachment regions (MARs; small dots). Condensed heterochromatin (fat dots) cannot participate in this system; the DNA network is modified mainly during differentiation by conversion of hetero- and euchromatin and by epigenetic modifications. (E, F) Correlations of UMH and the Chromosome Field theory (Lima de Faria 1979). When chromosome arms carrying the ribosomal DNA of the same and neighbouring species are aligned by increasing length (centromeres vertical to the left, telomeres to the right on a borderline at a 45° angle), it appears that the rDNA is always at an identical chromosome position relative to centromere and telomere (E). The nucleolus being in a fixed position in the ectopic network (see A), this fact might be explained according to the UMH (F): a specific position in space would imply a specific position along the DNA and, as a result, in the derived 3D network
Fig. 10
The Cascade of Regulation (Scherrer , ; Scherrer and Marcaud 1968): the information content of the zygotic genome is gradually reduced to that expressed in a differentiated cell. In Homo sapiens, information for an estimated 500,000 polypeptide-genes is reduced to a few hundred in gradual steps; as few as 3 genes may account for up to 90% of protein output, as is the case in red blood cells (Imaizumi-Scherrer et al. 1982). The Holo-Cascade (not shown) includes additional steps, leading upstream from the information content of an entire species to that of populations and individuals, and downstream from the polypeptide to the assembled, functional protein including all post-translational modifications (Scherrer 1980). Under the direction of holo-Genon and holo-Transgenon, the DNA reduces the genomic information by DNA rearrangements to that of an individual cell, and then by individual steps of processing to that necessary for the expression of an individual function, as shown here and outlined in the text. These may include: (1, 2) chromatin modification and activation (proto-genon-dependent); (3) transcription and formation of pre-mRNP (pre-genon); (4–6) gradual processing and splicing (pre-genon); (7) export and formation of cytoplasmic mRNP (genon); (8, 9) activation (de-repression) of mRNP (genon); (10) translation of mRNA (genon) followed by peptide formation (genon has expired)
Fig. 11
Endo- and Exo-cascade. The information guiding gene expression stems not only from the genome but also from the outside of cell and organism. Genon and transgenon are directly or indirectly modified by input from the Exo-system (for organisms, possibly, the ecosystem). (A) Information processing. From the DNA to the individual gene and phenotype, the genomic information decreases, eliminated by selection of domains and RNA processing. Concomitantly, external input is integrated into the expression process, guiding selection, specific processing and activation of specific genons and mRNA, mainly via the holo-transgenon, composed of factors encoded either by the genome or else imported from the outside of cell and organism. (B) Within the cell, the genomic cascade of regulation (Endo-cascade) is infiltrated by the information from outside cell and organism (Exo-cascade). This input is highest at the periphery of the cellular systems: the organism, the cellular membrane, the mRNA-genon, but may reach the pre-genons, as well as the genomic DNA, as detailed in (C)

References

    1. Albiez H, Cremer M, Tiberi C, Vecchio L, Schermelleh L, Dittrich S, Kupper K, Joffe B, Thormeyer T, von Hase J et al (2006) Rise, fall and resurrection of chromosome territories: a historical perspective. Part II. Fall and resurrection of chromosome territories during the 1950s to 1980s. Part III. Chromosome territories and the functional nuclear architecture: experiments and models from the 1990s to the present. Eur J Histochem 50:223–272. Review. PMID 17213034
    2. Ananiev E, Barsky V, Ilyin Y, Churikov N (1981) Localization of nucleoli in Drosophila melanogaster polytene chromosomes. Chromosoma 81:619–628. PMID 6790245
    3. Anguita E, Johnson CA, Wood WG, Turner BM, Higgs DR (2001) Identification of a conserved erythroid specific domain of histone acetylation across the alpha-globin gene cluster. Proc Natl Acad Sci USA 98:12114–12119. PMC59777, PMID 11593024
    4. Ashby WR (1956) An introduction to cybernetics. Chapman and Hall, London
    5. Arcangeletti C, Sütterlin R, Aebi U, De Conto F, Missorini S, Chezzi C, Scherrer K (1997) Visualization of prosomes (MCP-proteasomes), intermediate filament and actin networks by “instantaneous fixation” preserving the cytoskeleton. J Struct Biol 119:35–58. PMID 9216087