When the genomes of related species are aligned, they differ in more than their protein-coding sequences. In fact, most differences fall outside these regions, because coding sequence makes up only a small fraction of the genome. Genomes also differ in the sequences that regulate gene expression or that code for products other than proteins. Chromatin features such as DNA methylation and accessibility contribute to the observed dissimilarities and are also subject to selection. The output of these processes can differ among species as well, as seen in the divergence of gene expression and protein levels. Together, these differences, and the patterns of conservation around them, shape the distinct phenotypes of each species and are also relevant to the development of disease.
Regulatory evolution concerns changes in gene activity rather than in gene structure. Compared with structural mutations, which can have wide-ranging effects, changes in gene activity often affect specific traits at a minimal fitness cost. Novelty can arise when existing components are co-opted to regulate gene expression. In principle, any existing regulatory protein can be recruited to control the expression of another gene and so generate a novel phenotype.
Gene expression adaptation
Two main components control gene expression. The first is a sequence near the gene where regulators bind to turn its transcription on or off. The second is the sequence that encodes the regulators themselves. This second kind of sequence can lie far from the gene it controls, because it acts through a product that travels to the target and binds it. Variants in either component can generate novelty if selection acts on them.
The type of selection acting on variation in gene expression is subject to debate. Many argue that the observed disparities in gene expression between species are essentially random, shaped only by negative (purifying) selection and neutral drift. Although this may be the case in many instances, changes in gene expression can also be adaptive. The challenge is to identify the adaptive changes among the numerous variants associated with gene expression. Another difficulty is that gene expression also differs between individuals and between tissues of the same individual. Finally, gene expression changes dynamically during development and differentiation.
So far, I have described differences between species that are not due to differences in protein-coding sequences. These differences can be subject to selection, and some can even be adaptive. Such adaptations are relevant both to the divergence among species and to a particular type of selection that occurs among cancer cells. Much of the work on gene expression adaptation has been done in yeast and in differentiating stem cells. Its application to evolving diseases such as cancer has also proved valuable.
Only a small proportion of a mammalian genome codes for proteins. However, transcription is not exclusive to this tiny fraction of the genome. Other classes of RNA are produced that do not code for proteins. Of particular importance are the so-called non-coding RNAs and pseudogenes. The latter are sequences similar to existing genes that arose by duplication or retrotransposition but lost the ability to be translated. Both regulate gene expression, in different ways. For example, short RNAs known as microRNAs bind the messenger RNAs of coding genes and cause their degradation. Pseudogenes, because they resemble existing genes, can act as sponges for the regulators of those genes, causing activation or repression of the original, functioning copy. From here, it is a short step to imagine how, if the targets of these non-coding RNAs are involved in cancer, selection for or against them could play a role in cancer development.
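The sponge idea can be made concrete with a toy calculation. The sketch below is illustrative only and rests on a deliberately simple assumption that is mine, not taken from the works cited here: a fixed pool of regulator molecules binds gene and pseudogene transcripts with equal affinity, so the pool is shared in proportion to transcript abundance. The function name and all numbers are hypothetical.

```python
def regulators_on_gene(regulator_pool, gene_transcripts, pseudogene_transcripts):
    """Toy model (assumption): regulators bind gene and pseudogene transcripts
    with equal affinity, so the pool is split in proportion to abundance."""
    total_sites = gene_transcripts + pseudogene_transcripts
    if total_sites == 0:
        return 0.0
    return regulator_pool * gene_transcripts / total_sites


# Hypothetical numbers: 100 regulator molecules and 50 gene transcripts.
without_sponge = regulators_on_gene(100, 50, 0)    # no pseudogene expressed
with_sponge = regulators_on_gene(100, 50, 150)     # pseudogene soaks up regulators

print(without_sponge, with_sponge)
```

In this toy setting, three pseudogene transcripts for every gene transcript leave the gene with a quarter of the regulator it would otherwise see. Whether that activates or represses the gene depends on whether the regulator is a repressor, such as a microRNA, or an activator.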
Epigenetics describes changes in the genome that do not alter its sequence, although mutations can drive them. These changes regulate the activity of genes, and their disruption often results in disease. Epigenetic modifiers are genes whose products directly modify the DNA in multiple ways and can drive cancer development with few mutations in their targets. Alterations in these gene products result in altered methylation and chromatin structure at the targets.
Epigenetic mediators are genes that are often not mutated themselves but are the targets of this disruption. Their disruption generates stem-like behavior in tumor cells. Another class of related genes is the epigenetic modulators. They contribute to cancer development by responding to environmental stressors and modulating cell behavior through the other two types of genes.
Bringing it all together
The term epigenetics is a mouthful. Biologists use it as an all-encompassing term for any change outside the coding sequence that regulates gene activity. I have presented gene expression adaptation and non-coding RNAs as distinct and separate from epigenetic modifications, but they are both part of it and a result of it. That is, non-coding RNAs and other epigenetic modifications work by changing the expression levels of genes, and those changes can be adaptive. In particular, they can be adaptive for cancer cells, which, under selective pressure, proliferate and come to dominate the tumor mass.
The dominant paradigm in developing cancer therapy is to target a protein involved in a significant fraction of the patients who develop the disease. In contrast, the regulators I describe here work in a fundamentally different way, which could explain why drugs have rarely targeted them. The effect size of any single regulatory variant is small. Only the cumulative effect of many regulators results in a measurable change in gene expression and cell behavior. Moreover, these regulators are specific to the cell, tissue, individual, and species. They are therefore hard to identify across different forms of a particular cancer, across cancer types, or across patients. Finally, regulatory variants are spread across the genome, and the search space is orders of magnitude larger than that of coding sequences. The data and the tools to probe this vast space are only a few years old but are growing fast. These facts continue to challenge researchers whose ultimate goal is to understand the basic machinery of the cell and the alterations that result in disease, and to develop remedies for cells that turn cancerous.
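The earlier point about small effect sizes can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that each regulatory variant multiplies the expression of a target gene by the same small factor and that the effects combine independently. The 2% effect and the twofold target are made-up numbers, not estimates from data, and the function names are hypothetical.

```python
import math


def combined_fold_change(per_variant_effect, n_variants):
    """Fold change in expression when n_variants each raise expression by
    per_variant_effect, assuming independent multiplicative effects."""
    return (1.0 + per_variant_effect) ** n_variants


def variants_needed(per_variant_effect, target_fold_change):
    """Smallest number of such variants whose combined effect reaches the target."""
    return math.ceil(math.log(target_fold_change) / math.log(1.0 + per_variant_effect))


single = combined_fold_change(0.02, 1)   # one variant, hypothetical 2% effect
needed = variants_needed(0.02, 2.0)      # variants required to double expression

print(single, needed, combined_fold_change(0.02, needed))
```

Under these assumptions a single variant is barely detectable, while a few dozen acting in the same direction are enough to double expression. Real variants differ in size and direction, so this is a picture of the argument rather than a model of any particular gene.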
- Kelley, J. L., & Gilad, Y. (2020). Effective study design for comparative functional genomics. Nature Reviews Genetics, 21(7).
- Blake, L. E., Roux, J., Hernando-Herraez, I., Banovich, N. E., Perez, R. G., Hsiao, C. J., Eres, I., Cuevas, C., Marques-Bonet, T., & Gilad, Y. (2020). A comparison of gene expression and DNA methylation patterns across tissues and species. Genome Research, 30(2).
- Prud’homme, B., Gompel, N., & Carroll, S. B. (2007). Emerging principles of regulatory evolution.
- Ward, M., & Gilad, Y. (2017). Human genomics: Cracking the regulatory code. Nature, 550.
- Fraser, H. B., Moses, A. M., & Schadt, E. E. (2010). Evidence for widespread adaptive evolution of gene expression in budding yeast. Proceedings of the National Academy of Sciences of the United States of America, 107(7), 2977–2982.
- Fraser, H. B. (2011). Genome-wide approaches to the study of adaptive gene expression evolution: Systematic studies of evolutionary adaptations involving gene expression will allow many fundamental questions in evolutionary biology to be addressed. BioEssays, 33(6), 469–477.
- Feinberg, A. P., Koldobskiy, M. A., & Göndör, A. (2016). Epigenetic modulators, modifiers and mediators in cancer aetiology and progression. Nature Reviews Genetics, 17(5).
- Cuykendall, T. N., Rubin, M. A., & Khurana, E. (2017). Non-coding genetic variation in cancer. Current Opinion in Systems Biology, 1.
- Khurana, E., Fu, Y., Chakravarty, D., Demichelis, F., Rubin, M. A., & Gerstein, M. (2016). Role of non-coding sequence variants in cancer. Nature Reviews Genetics, 17(2).