1
Fallahpour A, Gureghian V, Filion GJ, Lindner AB, Pandi A. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Nat Commun 2025; 16:3205. [PMID: 40180930 PMCID: PMC11968976 DOI: 10.1038/s41467-025-58588-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 03/24/2025] [Indexed: 04/05/2025]
Abstract
Degeneracy in the genetic code allows many possible DNA sequences to encode the same protein. Optimizing codon usage within a sequence to meet organism-specific preferences faces combinatorial explosion. Nevertheless, natural sequences optimized through evolution provide a rich source of data for machine learning algorithms to explore the underlying rules. Here, we introduce CodonTransformer, a multispecies deep learning model trained on over 1 million DNA-protein pairs from 164 organisms spanning all domains of life. The model demonstrates context-awareness thanks to its Transformers architecture and to our sequence representation strategy that combines organism, amino acid, and codon encodings. CodonTransformer generates host-specific DNA sequences with natural-like codon distribution profiles and with minimum negative cis-regulatory elements. This work introduces the strategy of Shared Token Representation and Encoding with Aligned Multi-masking (STREAM) and provides a codon optimization framework with a customizable open-access model and a user-friendly Google Colab interface.
Affiliation(s)
- Adibvafa Fallahpour
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- University of Toronto Scarborough; Department of Biological Science, Scarborough, ON, Canada
- Vincent Gureghian
- Sorbonne Université, CNRS, ERL U1338 Inserm, Department of Computational, Quantitative and Synthetic Biology, Paris, France
- Sorbonne Université, CNRS, Inserm, Institut de Biologie Paris-Seine, Paris, France
- Guillaume J Filion
- University of Toronto Scarborough; Department of Biological Science, Scarborough, ON, Canada
- Ariel B Lindner
- Sorbonne Université, CNRS, ERL U1338 Inserm, Department of Computational, Quantitative and Synthetic Biology, Paris, France
- Sorbonne Université, CNRS, Inserm, Institut de Biologie Paris-Seine, Paris, France
- Sorbonne Université, CNRS, Université de Technologie de Compiègne, Inserm, Biofoundry Alliance Sorbonne Université, Paris, France
- Amir Pandi
- Sorbonne Université, CNRS, ERL U1338 Inserm, Department of Computational, Quantitative and Synthetic Biology, Paris, France
- Sorbonne Université, CNRS, Inserm, Institut de Biologie Paris-Seine, Paris, France
- Sorbonne Université, CNRS, Université de Technologie de Compiègne, Inserm, Biofoundry Alliance Sorbonne Université, Paris, France
2
Kitsis RN, Leinwand LA. Discordance between gene regulation in vitro and in vivo. Gene Expr 1992; 2:313-8. [PMID: 1472867 PMCID: PMC6057365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Affiliation(s)
- R N Kitsis
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York 10461
3
Tardiff J, Krauter KS. Divergent expression of alpha1-protease inhibitor genes in mouse and human. Nucleic Acids Res 1998; 26:3794-9. [PMID: 9685498 PMCID: PMC147770 DOI: 10.1093/nar/26.16.3794] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
The alpha1-protease inhibitor proteins of laboratory mice are homologous in sequence and function to human alpha1-antitrypsin and are encoded by a highly conserved multigene family comprising five members. In humans, the inhibitor is expressed in liver and in macrophages, and decreased expression or inhibitory activity is associated with a deficiency syndrome which can result in emphysema and liver disease in affected individuals. It has been proposed that macrophage expression may be an important component of the function of human alpha1-antitrypsin. Clearly, it is desirable to develop a mouse model of this deficiency syndrome; however, efforts to do this have been largely unsuccessful. In this paper, we report that aside from the issues of potentially redundant gene function, the mouse may not be a suitable animal for such studies, because there is no significant expression of murine alpha1-protease inhibitor in the macrophages of mice. This difference between the species appears to result from an absence of a functional macrophage-specific promoter in mice.
Affiliation(s)
- J Tardiff
- Department of Molecular, Cellular and Developmental Biology, University of Colorado at Boulder, Campus Box 347, Boulder, CO 80309, USA
4
Bulla GA. Hepatocyte nuclear factor-4 prevents silencing of hepatocyte nuclear factor-1 expression in hepatoma x fibroblast cell hybrids. Nucleic Acids Res 1997; 25:2501-8. [PMID: 9171105 PMCID: PMC146744 DOI: 10.1093/nar/25.12.2501] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
Hepatocyte nuclear factors-1alpha (HNF1alpha) and -4 (HNF4) are components of a liver-enriched transcription activation pathway which is thought to play a critical role in hepatocyte-specific gene expression, including activation of alpha1-antitrypsin gene expression. HNF1alpha, HNF4 and alpha1-antitrypsin (alpha1AT) genes are extinguished in hepatoma/fibroblast somatic cell hybrids, suggesting that fibroblasts contain a repressor-like activity. To determine the molecular basis for silencing of these genes in cell hybrids, ectopic expression of HNF1alpha and HNF4 was used. Results show that constitutive expression of HNF4 prevents extinction of HNF1alpha gene expression in hepatoma/fibroblast hybrids. In contrast, forced HNF1alpha expression failed to prevent extinction of the HNF4 locus in cell hybrids. Likewise, the alpha1AT gene remained silent in the presence of both HNF1alpha and HNF4. These results suggest that extinction of HNF1alpha is a simple lack-of-activation phenotype, whereas extinction of HNF4 and alpha1AT loci is more complex, perhaps involving negative regulation.
Affiliation(s)
- G A Bulla
- Pediatric Research Institute, St Louis University Health Sciences Center and Cardinal Glennon Children's Hospital, 3662 Park Avenue, St Louis, MO 63110, USA.
5
Ray A, Gao X, Ray BK. Role of a distal enhancer containing a functional NF-kappa B-binding site in lipopolysaccharide-induced expression of a novel alpha 1-antitrypsin gene. J Biol Chem 1995; 270:29201-8. [PMID: 7493948 DOI: 10.1074/jbc.270.49.29201] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]
Abstract
alpha 1-Antitrypsin (alpha 1-AT) is one of the major proteinase inhibitors in serum. Its primary physiological function is to inhibit neutrophil elastase activity in lung, but it also inhibits other serine proteases including trypsin, chymotrypsin, thrombin, and cathepsin. We have previously reported a novel alpha 1-AT, S-2 isoform, from rabbit that is induced up to 100-fold in the liver during acute inflammatory condition (Ray, B. K., Gao, X., and Ray, A. (1994) J. Biol. Chem. 269, 22080-22086). Here, we present evidence that the expression of this alpha 1-AT S-2 gene is also induced in lipopolysaccharide (LPS)-treated peripheral blood monocytes. From the cloned genomic DNA, we have identified a distal LPS-responsive enhancer located between -2438 and -1990 base pairs upstream of the transcription start site. In vitro DNA-binding studies demonstrated an interaction of an LPS-inducible NF-kappa B-like nuclear factor with a kappa B-element present in this enhancer region. Antibodies against p65 and p50 subunits of NF-kappa B supershifted the DNA-protein complex. A mutation of the NF-kappa B-binding element virtually abolished the LPS-responsive induction of the chimeric promoter in monocytic cells. Furthermore, overexpression of NF-kappa B induced the wild-type promoter activity. Taken together, these results demonstrated that during LPS-mediated inflammation, NF-kappa B/Rel family of transcription factors play a crucial role in the transcriptional induction of the inflammation responsive alpha 1-AT gene.
Affiliation(s)
- A Ray
- Department of Veterinary Pathobiology, University of Missouri, Columbia 65211, USA
6
Kass-Eisler A, Li K, Leinwand LA. Prospects for gene therapy with direct injection of polynucleotides. Ann N Y Acad Sci 1995; 772:232-40. [PMID: 8546398 DOI: 10.1111/j.1749-6632.1995.tb44749.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 01/31/2023]
Affiliation(s)
- A Kass-Eisler
- Department of Microbiology & Immunology, Albert Einstein College of Medicine, Bronx, New York 10461, USA
7
Amicone L, Galimi MA, Spagnoli FM, Tommasini C, De Luca V, Tripodi M. Temporal and tissue-specific expression of the MET ORF driven by the complete transcriptional unit of human A1AT gene in transgenic mice. Gene 1995; 162:323-8. [PMID: 7557452 DOI: 10.1016/0378-1119(95)00277-d] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]
Abstract
We inserted the sequence coding for the cytoplasmic portion of the human MET receptor into an 18-kb genomic fragment containing the entire human A1AT gene (encoding alpha-1-antitrypsin). Stringent control of gene expression, at the transcriptional, post-transcriptional and translational levels, was ensured by insertion of the MET open reading frame into A1AT, thus maintaining: (i) all the elements that confer tissue-specific transcription initiation, (ii) all the sequences involved in transcript processing and (iii) all the sequences which influence messenger stability and translational efficiency. The expression pattern of this vector in transgenic mice was identical to that of the human A1AT transgene, as well as to that of A1AT in humans with regard to both temporal and tissue-specific regulation.
Affiliation(s)
- L Amicone
- Dipartimento di Biopatologia Umana, Università La Sapienza, Roma, Italy
8
van de Klundert FA, van Eldik GJ, Pieper FR, Jansen HJ, Bloemendal H. Identification of two silencers flanking an AP-1 enhancer in the vimentin promoter. Gene 1992; 122:337-43. [PMID: 1487148 DOI: 10.1016/0378-1119(92)90223-c] [Citation(s) in RCA: 27] [Impact Index Per Article: 0.8] [Indexed: 12/27/2022]
Abstract
We have studied the 5' upstream sequences required for the transcriptional regulation of the hamster gene encoding the intermediate filament protein, vimentin. Although vimentin is regarded as the intermediate filament protein of mesothelial tissue, it is also produced in most cultured cells. The human mammary carcinoma cell line, MCF-7, belongs to the exceptions. It contains no vimentin, and the complete upstream promoter region is inactive in this particular cell line. By using transient transfection of chimeric constructs into MCF-7 and HeLa cells, and subsequent chloramphenicol acetyltransferase assays, we were able to show the presence of two negative control regions flanking a double AP-1 enhancer element. Our data indicate that these elements exert their effect irrespective of orientation and position, suggesting that they are silencers. In vitro footprinting assays, gel mobility assays and Southwestern (protein-DNA) blotting revealed the presence of trans-acting factors interacting with both silencer elements. The silencing effect was particularly pronounced in MCF-7 cells, although DNA-binding proteins are present in HeLa cells as well.
9
Abstract
Gene transfer can be achieved in the adult rat heart in vivo by direct injection of plasmid DNA. In this report we define the spatial and temporal limits of reporter gene expression after a single intracardiac injection. pRSVCAT (100 micrograms), in which the Rous sarcoma virus long terminal repeat is fused to the chloramphenicol acetyltransferase reporter gene, and p alpha MHCluc (100 micrograms), in which the alpha-cardiac myosin heavy chain promoter is fused to the firefly luciferase gene, were injected into hearts, and reporter gene activities were assayed at various times. Both chloramphenicol acetyltransferase and luciferase were detectable in 100% of the rats from 1 to 7 days, in 60% of the rats from 17 to 23 days, and in 30% of the rats from 38 to 60 days after injection. Reporter gene activity was largely limited to a 1-2-mm region of the ventricle surrounding the injection site. Closed circular DNA was far more effective than linear DNA in transfecting cells in vivo. The relative strengths of three different promoters, Rous sarcoma virus long terminal repeat, alpha-myosin heavy chain, and alpha 1-antitrypsin, all fused to the luciferase reporter gene, were determined. The constitutive viral promoter was approximately 20-fold more active than the cardiac-specific cellular promoter, and the liver-specific cellular promoter was not active at all in the cardiac environment. Thus, direct injection of genes into the heart offers a simple and powerful tool with which to assess the behavior of genes in vivo. (ABSTRACT TRUNCATED AT 250 WORDS)
Affiliation(s)
- P M Buttrick
- Department of Medicine, Albert Einstein College of Medicine, Bronx, N.Y
10
Functional analysis of the trans-acting factor binding sites of the mouse alpha-fetoprotein proximal promoter by site-directed mutagenesis. J Biol Chem 1991. [DOI: 10.1016/s0021-9258(18)54837-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]
11
12
Post DJ, Carter KC, Papaconstantinou J. The effect of aging on constitutive mRNA levels and lipopolysaccharide inducibility of acute phase genes. Ann N Y Acad Sci 1991. [DOI: 10.1111/j.1749-6632.1991.tb16969.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]