1. Frazer SA, Baghbanzadeh M, Rahnavard A, Crandall KA, Oakley TH. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). Gigascience 2024; 13:giae073. [PMID: 39460934 PMCID: PMC11512451 DOI: 10.1093/gigascience/giae073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 06/25/2024] [Accepted: 09/01/2024] [Indexed: 10/28/2024] [Open Access]
Abstract
BACKGROUND: Predicting phenotypes from genetic variation is foundational for fields as diverse as bioengineering and global change biology, highlighting the importance of efficient methods to predict gene functions. Linking genetic changes to phenotypic changes has been a goal of decades of experimental work, especially for some model gene families, including light-sensitive opsin proteins. Opsins can be expressed in vitro to measure light absorption parameters, including λmax (the wavelength of maximum absorbance), which strongly affects organismal phenotypes like color vision. Despite extensive research on opsins, the data remain dispersed, uncompiled, and often challenging to access, thereby precluding systematic and comprehensive analyses of the intricate relationships between genotype and phenotype. RESULTS: Here, we report a newly compiled database of all heterologously expressed opsin genes with λmax phenotypes that we call the Visual Physiology Opsin Database (VPOD). VPOD_1.0 contains 864 unique opsin genotypes and corresponding λmax phenotypes collected across all animals from 73 separate publications. We use VPOD data and deepBreaks to show that regression-based machine learning (ML) models often reliably predict λmax, account for nonadditive effects of mutations on function, and identify functionally critical amino acid sites. CONCLUSION: The ability to reliably predict functions from gene sequences alone using ML will allow robust exploration of molecular-evolutionary patterns governing phenotype, will inform functional and evolutionary connections to an organism's ecological niche, and may be used more broadly for de novo protein design. Together, our database, phenotype predictions, and model comparisons lay the groundwork for future research applicable to families of genes with quantifiable and comparable phenotypes.
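As a rough illustration of what regression-based prediction of λmax from sequence can look like, the sketch below one-hot encodes aligned amino acid fragments and fits a closed-form ridge regression with NumPy. It is not the deepBreaks pipeline or the VPOD data: the four sequences, their λmax values, the helper names `one_hot` and `fit_ridge`, and the `alpha` setting are all hypothetical and chosen only to keep the example small.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seqs):
    """Encode equal-length aligned sequences as a (n_seqs, length * 20) 0/1 matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n, length = len(seqs), len(seqs[0])
    X = np.zeros((n, length * len(AMINO_ACIDS)))
    for row, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            X[row, pos * len(AMINO_ACIDS) + index[aa]] = 1.0
    return X


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on mean-centred targets."""
    y_mean = y.mean()
    A = X.T @ X + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(A, X.T @ (y - y_mean))
    return weights, y_mean


# Hypothetical aligned fragments and hypothetical lambda-max values (nm).
seqs = ["AEMS", "AQMS", "AELS", "AQLA"]
lmax = np.array([500.0, 485.0, 493.0, 470.0])

X = one_hot(seqs)
weights, intercept = fit_ridge(X, lmax, alpha=0.1)

# Predict lambda-max for an unseen (hypothetical) fragment.
predicted = one_hot(["AQLS"]) @ weights + intercept
print(predicted)
```

A purely additive model like this one cannot capture the nonadditive effects the abstract mentions; models with interaction terms or tree ensembles would be needed for that.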
Affiliation(s)
- Seth A Frazer
  - Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, California 93106, USA
- Mahdi Baghbanzadeh
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
- Ali Rahnavard
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
- Keith A Crandall
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
  - Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20012, USA
- Todd H Oakley
  - Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, California 93106, USA
2. Smedley GD, McElroy KE, Feller KD, Serb JM. Additive and epistatic effects influence spectral tuning in molluscan retinochrome opsin. J Exp Biol 2022; 225:275511. [DOI: 10.1242/jeb.242929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2021] [Accepted: 04/26/2022] [Indexed: 11/20/2022]
Abstract
The relationship between genotype and phenotype is nontrivial due to often complex molecular pathways that make it difficult to unambiguously relate phenotypes to specific genotypes. Photopigments, an opsin apoprotein bound to a light-absorbing chromophore, present an opportunity to directly relate the amino acid sequence to an absorbance peak phenotype (λmax). We examined this relationship by conducting a series of site-directed mutagenesis experiments of retinochrome, a non-visual opsin, from two closely related species: the common bay scallop, Argopecten irradians, and the king scallop, Pecten maximus. Using protein folding models, we identified three amino acid sites of likely functional importance and expressed mutated retinochrome proteins in vitro. Our results show that mutations of amino acids lining the opsin binding pocket are responsible for fine spectral tuning, or small changes in the λmax of these light-sensitive proteins. Mutations resulted in a blue or red shift as predicted, but with dissimilar magnitudes. Shifts ranged from a 16 nm blue shift to a 12 nm red shift from the wild-type λmax. These mutations do not show an additive effect, but rather suggest the presence of epistatic interactions. This work highlights the importance of binding pocket shape in the evolution of spectral tuning and builds on our ability to relate genotypic changes to phenotypes in an emerging model for opsin functional analysis.
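The distinction between additive and epistatic effects can be made concrete with a small calculation. The sketch below treats pairwise epistasis as the difference between a double mutant's measured λmax and the value expected if the two single-mutant shifts simply added. The function name and all λmax values are hypothetical and are not taken from the study.

```python
def pairwise_epistasis(wt, single_a, single_b, double_ab):
    """Deviation (nm) of the double mutant from the additive expectation."""
    shift_a = single_a - wt          # shift caused by mutation A alone
    shift_b = single_b - wt          # shift caused by mutation B alone
    expected_double = wt + shift_a + shift_b
    return double_ab - expected_double


# Hypothetical lambda-max values (nm): wild type, two single mutants, double mutant.
wt, mut_a, mut_b, mut_ab = 500.0, 492.0, 506.0, 490.0

epsilon = pairwise_epistasis(wt, mut_a, mut_b, mut_ab)
print(epsilon)  # -8.0: the double mutant is 8 nm bluer than additivity predicts
```

A value of zero would indicate that the two mutations combine additively; any nonzero value, beyond measurement error, points to an interaction between the sites.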
Affiliation(s)
- G. Dalton Smedley
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA
- Kyle E. McElroy
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA
- Kathryn D. Feller
  - Department of Biological Sciences, Union College, Schenectady, New York, USA
- Jeanne M. Serb
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA
3. Hensley NM, Ellis EA, Leung NY, Coupart J, Mikhailovsky A, Taketa DA, Tessler M, Gruber DF, De Tomaso AW, Mitani Y, Rivers TJ, Gerrish GA, Torres E, Oakley TH. Selection, drift, and constraint in cypridinid luciferases and the diversification of bioluminescent signals in sea fireflies. Mol Ecol 2021; 30:1864-1879. [PMID: 33031624 PMCID: PMC11629831 DOI: 10.1111/mec.15673] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/19/2020] [Revised: 09/09/2020] [Accepted: 09/18/2020] [Indexed: 02/07/2023]
Abstract
Understanding the genetic causes of evolutionary diversification is challenging because differences across species are complex, often involving many genes. However, cases where single or few genetic loci affect a trait that varies dramatically across a radiation of species provide tractable opportunities to understand the genetics of diversification. Here, we begin to explore how diversification of bioluminescent signals across species of cypridinid ostracods ("sea fireflies") was influenced by evolution of a single gene, cypridinid-luciferase. In addition to emission spectra ("colour") of bioluminescence from 21 cypridinid species, we report 13 new c-luciferase genes from de novo transcriptomes, including in vitro assays to confirm function of four of those genes. Our comparative analyses suggest some amino acid sites in c-luciferase evolved under episodic diversifying selection and may be associated with changes in both enzyme kinetics and colour, two enzymatic functions that directly impact the phenotype of bioluminescent signals. The analyses also suggest multiple other amino acid positions in c-luciferase evolved neutrally or under purifying selection, and may have impacted the variation of colour of bioluminescent signals across genera. Previous mutagenesis studies at candidate sites show epistatic interactions, which could constrain the evolution of c-luciferase function. This work provides important steps toward understanding the genetic basis of diversification of behavioural signals across multiple species, suggesting different evolutionary processes act at different times during a radiation of species. These results set the stage for additional mutagenesis studies that could explicitly link selection, drift, and constraint to the evolution of phenotypic diversification.
Affiliation(s)
- Nicholai M. Hensley
  - Department of Ecology, Evolution, & Marine Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
- Emily A. Ellis
  - Department of Ecology, Evolution, & Marine Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
- Nicole Y. Leung
  - Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
  - Department of Molecular, Cellular and Developmental Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
- John Coupart
  - Department of Ecology, Evolution, & Marine Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
- Alexander Mikhailovsky
  - Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, CA, USA
- Daryl A. Taketa
  - Department of Molecular, Cellular and Developmental Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
- Michael Tessler
  - American Museum of Natural History and New York University, New York, NY, USA
  - Department of Biology, St. Francis College, Brooklyn, NY, USA
- David F. Gruber
  - Department of Biology and Environmental Science, City University of New York Baruch College, New York, NY, USA
- Anthony W. De Tomaso
  - Department of Molecular, Cellular and Developmental Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
- Yasuo Mitani
  - Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Japan
- Trevor J. Rivers
  - Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
- Gretchen A. Gerrish
  - Department of Biology, University of Wisconsin – La Crosse, La Crosse, WI, USA
- Elizabeth Torres
  - Department of Biological Sciences, California State University, Los Angeles, Los Angeles, CA, USA
- Todd H. Oakley
  - Department of Ecology, Evolution, & Marine Biology, University of California, Santa Barbara, Santa Barbara, CA, USA
4. DeLeo DM, Bracken-Grissom HD. Lighting the way: Forces driving the diversification of bioluminescent signalling in sea fireflies. Mol Ecol 2021; 30:1747-1750. [PMID: 33709451 DOI: 10.1111/mec.15880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 03/03/2021] [Indexed: 11/29/2022]
Abstract
Understanding the drivers of diversification and processes that maintain biodiversity remains a central theme of evolutionary biology. However, these efforts are often impeded due to disparities across species and environments and the genetic complexity underlying many traits. The factors driving biodiversity can be more readily understood by focusing on the genetics of diversification, of one or few genes shared across species, with large influence over an organism's phenotype (Templeton, 1981; Wright, 1984). In this pursuit, previous studies often focus on the selective pressures that impact phenotypic diversity (Brawand et al., 2014; Yokoyama et al., 2015), often overlooking the contribution of neutral processes (i.e., genetic drift). In this issue of Molecular Ecology, Hensley et al. (2020) use an integrative approach, including RNA sequencing, in vitro protein expression and spectral measurements, to explore the drivers behind the diversification of bioluminescent signalling in cypridinid ostracods (Figure 1). Typical bioluminescent reactions primarily include an enzyme (luciferase) and substrate (luciferin). By focusing on a single gene, this study traces the molecular evolution of (c)luciferase in sea fireflies, elucidating diverse signatures of selection, drift and constraint to decipher the link between genotype and phenotype of their bioluminescent emissions.
Affiliation(s)
- Danielle M DeLeo
  - Department of Invertebrate Zoology, Smithsonian National Museum of Natural History, Washington, DC, USA
- Heather D Bracken-Grissom
  - Department of Biological Sciences, Institute of Environment, Florida International University, North Miami, FL, USA
5. Chen J, Wong KC. Analyzing High-Order Epistasis from Genotype-Phenotype Maps Using 'Epistasis' Package. Methods Mol Biol 2021; 2212:265-275. [PMID: 33733361 DOI: 10.1007/978-1-0716-0947-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Epistasis refers to interactions between genes that lead to complex phenotypic effects. Interactions among three or more mutations, called "high-order epistasis", have attracted significant interest in recent studies. However, the analysis of high-order epistasis remains debated because of nonlinear model complexity and statistical artifacts. A recent "epistasis" Python package was therefore developed to characterize high-order epistasis by estimating a nonlinear scale for mutation effects and then extracting high-order epistasis using linear models. This method has discovered statistically significant high-order epistasis in several real genotype-phenotype maps. We provide a concise, step-by-step guide to applying the "epistasis" package by reproducing the high-order epistasis discoveries on real genotype-phenotype data using the latest API of the package.
Affiliation(s)
- Junyi Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
6. Yokoyama S, Jia H. Origin and adaptation of green-sensitive (RH2) pigments in vertebrates. FEBS Open Bio 2020; 10:873-882. [PMID: 32189477 PMCID: PMC7193153 DOI: 10.1002/2211-5463.12843] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/12/2020] [Revised: 02/20/2020] [Accepted: 03/16/2020] [Indexed: 12/12/2022] [Open Access]
Abstract
One of the critical times for the survival of animals is twilight, when the most abundant visible light lies between 400 and 550 nanometres (nm). Green-sensitive RH2 pigments help nonmammalian vertebrate species to better discriminate wavelengths in this blue-green region. Here, evaluation of the wavelengths of maximal absorption (λmax values) of genetically engineered RH2 pigments representing 13 critical stages of vertebrate evolution revealed that the RH2 pigment of the most recent common ancestor of vertebrates had a λmax of 503 nm, while the 12 ancestral pigments exhibited an expanded range of λmax values between 474 and 524 nm, and present-day RH2 pigments have further expanded the range to ~450-530 nm. During vertebrate evolution, eight out of the 16 significant λmax shifts (|Δλmax| ≥ 10 nm) of RH2 pigments identified were fully explained by the repeated mutations E122Q (twice), Q122E (thrice), M207L (twice) and A292S (once). Our data indicated that the highly variable λmax values of teleost RH2 pigments arose from gene duplications followed by accelerated amino acid substitution.
Affiliation(s)
- Shozo Yokoyama
  - Department of Biology, Emory University, Atlanta, GA, USA
  - Willamette View, Portland, OR, USA
- Huiyong Jia
  - Department of Biology, Emory University, Atlanta, GA, USA
7. Sailer ZR, Harms MJ. High-order epistasis shapes evolutionary trajectories. PLoS Comput Biol 2017; 13:e1005541. [PMID: 28505183 PMCID: PMC5448810 DOI: 10.1371/journal.pcbi.1005541] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 12/13/2016] [Revised: 05/30/2017] [Accepted: 04/24/2017] [Indexed: 01/02/2023] [Open Access]
Abstract
High-order epistasis, where the effect of a mutation is determined by interactions with two or more other mutations, makes small, but detectable, contributions to genotype-fitness maps. While epistasis between pairs of mutations is known to be an important determinant of evolutionary trajectories, the evolutionary consequences of high-order epistasis remain poorly understood. To determine the effect of high-order epistasis on evolutionary trajectories, we computationally removed high-order epistasis from experimental genotype-fitness maps containing all binary combinations of five mutations. We then compared trajectories through maps both with and without high-order epistasis. We found that high-order epistasis strongly shapes the accessibility and probability of evolutionary trajectories. A closer analysis revealed that the magnitude of epistasis, not its order, predicts its effects on evolutionary trajectories. We further find that high-order epistasis makes it impossible to predict evolutionary trajectories from the individual and paired effects of mutations. We therefore conclude that high-order epistasis profoundly shapes evolutionary trajectories through genotype-fitness maps.
A key goal for evolutionary biologists is understanding why one evolutionary trajectory is taken rather than others. This requires understanding how individual mutations, as well as interactions between them, determine the accessibility of evolutionary pathways. We used a robust statistical analysis to reveal interactions between up to five mutations in published datasets, meaning that the effect of a mutation can depend on the presence or absence of four other mutations. Simulations reveal that these interactions strongly shape evolutionary trajectories. These interactions lead to profound unpredictability in evolution, as one cannot use the effect of a mutation in the ancestor to predict its effect later in the trajectory.
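The notion of trajectory accessibility can be illustrated with a deliberately simplified sketch. It enumerates every order in which mutations can accumulate in a binary genotype-fitness map and keeps only the paths where fitness strictly increases at each step. This is one common working definition of an accessible path and is an assumption here, not the probabilistic trajectory model used in the study. The two-site maps and the function name are hypothetical, and the second map shows pairwise sign epistasis rather than high-order epistasis.

```python
from itertools import permutations


def accessible_paths(fitness, n_sites):
    """Return mutational paths along which fitness strictly increases at every step."""
    start = (0,) * n_sites
    paths = []
    for order in permutations(range(n_sites)):
        genotype = list(start)
        path = [start]
        for site in order:
            previous = tuple(genotype)
            genotype[site] = 1               # introduce the next mutation
            current = tuple(genotype)
            if fitness[current] <= fitness[previous]:
                break                        # fitness did not increase: path is blocked
            path.append(current)
        else:
            paths.append(path)               # every step increased fitness
    return paths


# Hypothetical two-site maps: genotype tuple -> fitness.
additive_map = {(0, 0): 1.0, (1, 0): 1.2, (0, 1): 1.3, (1, 1): 1.5}
sign_epistasis_map = {(0, 0): 1.0, (1, 0): 0.9, (0, 1): 1.3, (1, 1): 1.5}

print(len(accessible_paths(additive_map, 2)))        # 2
print(len(accessible_paths(sign_epistasis_map, 2)))  # 1
```

In the second map the first mutation is deleterious on the ancestral background but beneficial once the other mutation is present, so only one of the two orders remains open.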
Affiliation(s)
- Zachary R. Sailer
  - Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA
- Michael J. Harms
  - Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA
8. Detecting High-Order Epistasis in Nonlinear Genotype-Phenotype Maps. Genetics 2017; 205:1079-1088. [PMID: 28100592 PMCID: PMC5340324 DOI: 10.1534/genetics.116.195214] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 08/29/2016] [Accepted: 01/09/2017] [Indexed: 11/18/2022] [Open Access]
Abstract
High-order epistasis has been observed in many genotype-phenotype maps. These multi-way interactions between mutations may be useful for dissecting complex traits and could have profound implications for evolution. Alternatively, they could be a statistical artifact. High-order epistasis models assume the effects of mutations should add, when they could in fact multiply or combine in some other nonlinear way. A mismatch in the “scale” of the epistasis model and the scale of the underlying map would lead to spurious epistasis. In this article, we develop an approach to estimate the nonlinear scales of arbitrary genotype-phenotype maps. We can then linearize these maps and extract high-order epistasis. We investigated seven experimental genotype-phenotype maps for which high-order epistasis had been reported previously. We find that five of the seven maps exhibited nonlinear scales. Interestingly, even after accounting for nonlinearity, we found statistically significant high-order epistasis in all seven maps. The contributions of high-order epistasis to the total variation ranged from 2.2 to 31.0%, with an average across maps of 12.7%. Our results provide strong evidence for extensive high-order epistasis, even after nonlinear scale is taken into account. Further, we describe a simple method to estimate and account for nonlinearity in genotype-phenotype maps.
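The scale-mismatch argument can be demonstrated with a toy map. In the sketch below, the effects of three hypothetical mutations multiply, so an additive model fitted to the raw phenotypes leaves residuals that look like epistasis, while the same model fitted after a log transform leaves none. The log transform is assumed to be known here purely for illustration, whereas the article describes estimating the nonlinear scale from the data. All effect sizes and names are hypothetical.

```python
import itertools

import numpy as np

# All 8 binary genotypes over three sites.
genotypes = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

# Hypothetical per-site effects that combine multiplicatively on the observed scale.
effects = np.array([0.5, -0.3, 0.8])
phenotypes = np.exp(genotypes @ effects)


def additive_residuals(X, y):
    """Fit intercept + one coefficient per site; return what the additive model misses."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


raw_resid = additive_residuals(genotypes, phenotypes)          # wrong scale
log_resid = additive_residuals(genotypes, np.log(phenotypes))  # linearized scale

print(np.abs(raw_resid).max())  # clearly nonzero: spurious "epistasis"
print(np.abs(log_resid).max())  # numerically zero: no epistasis once linearized
```

Residuals that persist after linearization are the ones the abstract treats as genuine high-order epistasis.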
9. Yokoyama S, Tada T, Liu Y, Faggionato D, Altun A. A simple method for studying the molecular mechanisms of ultraviolet and violet reception in vertebrates. BMC Evol Biol 2016; 16:64. [PMID: 27001075 PMCID: PMC4802639 DOI: 10.1186/s12862-016-0637-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/26/2015] [Accepted: 03/16/2016] [Indexed: 02/03/2023] [Open Access]
Abstract
BACKGROUND: Many vertebrate species use ultraviolet (UV) reception for such basic behaviors as foraging and mating, but many others switched to violet reception and improved their visual resolution. The respective phenotypes are regulated by the short wavelength-sensitive (SWS1) pigments that absorb light maximally (λmax) at ~360 and 395-440 nm. Because of strong epistatic interactions, the biological significance of the extensive mutagenesis results on the molecular basis of spectral tuning in SWS1 pigments and the mechanisms of their phenotypic adaptations remains uncertain. RESULTS: The magnitudes of the λmax-shifts caused by mutations in a present-day SWS1 pigment and by the corresponding forward mutations in its ancestral pigment are often dramatically different. To resolve these mutagenesis results, the A/B ratio, in which A and B are the areas formed by amino acids at sites 90, 113 and 118 and by those at sites 86, 90, 118 and 295, respectively, becomes indispensable. Then, all critical mutations that generated the λmax of a SWS1 pigment can be identified by establishing that 1) the difference between the λmax of the ancestral pigment with these mutations and that of the present-day pigment is small (3-5 nm, depending on the entire λmax-shift) and 2) the difference between the corresponding A/B ratios is < 0.002. CONCLUSION: Molecular adaptation has been studied mostly by using comparative sequence analyses. These statistical results provide biological hypotheses and need to be tested using experimental means. This is an opportune time to explore the currently available and new genetic systems and test these statistical hypotheses. By evaluating the λmax values and A/B ratios of mutagenized present-day pigments and their ancestral pigments, we now have a method to identify all critical mutations that are responsible for phenotypic adaptation of SWS1 pigments. The result also explains spectral tuning of the same pigments, a central unanswered question in phototransduction.
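The two acceptance criteria in the RESULTS can be written as a simple check. The sketch below assumes that λmax values and A/B ratios have already been obtained for the mutated ancestral pigment and for the present-day pigment; it does not compute the A and B areas themselves. The function name, the default 5 nm tolerance (the abstract gives 3-5 nm depending on the total shift), and all input values are hypothetical.

```python
def explains_present_day_pigment(
    lmax_mutant_ancestor,
    lmax_present,
    ab_mutant_ancestor,
    ab_present,
    lmax_tol_nm=5.0,   # assumed tolerance; the abstract cites 3-5 nm
    ab_tol=0.002,      # A/B ratio threshold stated in the abstract
):
    """True if a set of mutations meets both the lambda-max and the A/B ratio criteria."""
    lmax_close = abs(lmax_mutant_ancestor - lmax_present) <= lmax_tol_nm
    ab_close = abs(ab_mutant_ancestor - ab_present) < ab_tol
    return lmax_close and ab_close


# Hypothetical measurements for two candidate mutation sets.
print(explains_present_day_pigment(361.0, 359.0, 1.2345, 1.2350))  # True
print(explains_present_day_pigment(361.0, 359.0, 1.2345, 1.2400))  # False: A/B ratios differ too much
```

Requiring both conditions reflects the abstract's point that matching λmax alone is not sufficient when epistatic interactions are strong.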
Affiliation(s)
- Shozo Yokoyama
  - Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Takashi Tada
  - Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Yang Liu
  - Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Ahmet Altun
  - Department of Physics, Fatih University, Istanbul, 34500, Turkey
  - Department of Genetics and Bioengineering, Fatih University, Istanbul, 34500, Turkey