1
Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L. On the length, weight and GC content of the human genome. BMC Res Notes 2019; 12:106. [PMID: 30813969 PMCID: PMC6391780 DOI: 10.1186/s13104-019-4137-z] [Citation(s) in RCA: 139] [Impact Index Per Article: 23.2] [Received: 12/07/2018] [Accepted: 02/15/2019] [Indexed: 01/08/2023] Open
Abstract
Objective: Basic parameters commonly used to describe genomes, including length, weight and relative guanine-cytosine (GC) content, are widely cited in the absence of a primary source. By using updated data and original software we determined these values to the best of our knowledge as a standard reference for the whole human nuclear genome, for each chromosome and for mitochondrial DNA. We also devised a method to calculate the relative GC content in the whole messenger RNA sequence set and in transcriptomes by multiplying the GC content of each gene by its mean expression level.
Results: The male nuclear diploid genome extends for 6.27 Gigabase pairs (Gbp), is 205.00 centimeters (cm) long and weighs 6.41 picograms (pg). Female values are 6.37 Gbp, 208.23 cm, 6.51 pg. The individual variability and the implication for the DNA informational density in terms of bits/volume were discussed. The genomic GC content is 40.9%. Following analysis in different transcriptomes and species, we showed that the greatest deviation was observed in the pathological condition analysed (trisomy 21 leukaemic cells) and in Caenorhabditis elegans. Our results may represent a solid basis for further investigation on human structural and functional genomics while also providing a framework for other comparative genome analyses.
Electronic supplementary material: The online version of this article (10.1186/s13104-019-4137-z) contains supplementary material, which is available to authorized users.
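The expression-weighted GC calculation mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that each gene's GC content is multiplied by its mean expression level; dividing by the total expression to obtain a single fraction is an assumption made here, as are the function names `gc_fraction` and `expression_weighted_gc`. This is not the authors' software.

```python
def gc_fraction(sequence):
    """Return the G+C fraction of a nucleotide sequence (DNA or RNA).

    Only A, C, G, T and U are counted; other symbols (e.g. N) are ignored.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        raise ValueError("sequence contains no recognised bases")
    return (seq.count("G") + seq.count("C")) / counted


def expression_weighted_gc(genes):
    """Return the expression-weighted mean GC fraction.

    genes: iterable of (gc_fraction, mean_expression) pairs.
    Assumption: the weighted sum is normalised by the total expression.
    """
    weighted_sum = 0.0
    total_expression = 0.0
    for gc, mean_expression in genes:
        weighted_sum += gc * mean_expression
        total_expression += mean_expression
    if total_expression == 0:
        raise ValueError("total expression is zero; the weighted GC is undefined")
    return weighted_sum / total_expression


# Hypothetical toy input: two genes with made-up GC fractions and expression levels.
toy_genes = [(0.6, 3.0), (0.4, 1.0)]
toy_result = expression_weighted_gc(toy_genes)
```

With this weighting, a highly expressed GC-rich gene pulls the transcriptome value above the unweighted mean of the per-gene GC fractions.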
Affiliation(s)
- Allison Piovesan
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Maria Chiara Pelleri
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Francesca Antonaros
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Pierluigi Strippoli
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Maria Caracausi
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
- Lorenza Vitale
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Via Belmeloro 8, 40126, Bologna, BO, Italy
2
Vitale L, Caracausi M, Casadei R, Pelleri MC, Piovesan A. Difficulty in obtaining the complete mRNA coding sequence at 5' region (5' end mRNA artifact): Causes, consequences in biology and medicine and possible solutions for obtaining the actual amino acid sequence of proteins (Review). Int J Mol Med 2017; 39:1063-1071. [PMID: 28393177 DOI: 10.3892/ijmm.2017.2942] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/22/2016] [Accepted: 03/16/2017] [Indexed: 11/06/2022] Open
Abstract
The known difficulty in obtaining the actual full length, complete sequence of a messenger RNA (mRNA) may lead to the erroneous determination of its coding sequence at the 5' region (5' end mRNA artifact), and consequently to the wrong assignment of the translation start codon, leading to the inaccurate prediction of the encoded polypeptide at its amino terminus. Among the known human genes whose study was affected by this artifact, we can include disco interacting protein 2 homolog A (DIP2A; KIAA0184), Down syndrome critical region 1 (DSCR1), SON DNA binding protein (SON), trefoil factor 3 (TFF3) and URB1 ribosome biogenesis 1 homolog (URB1; KIAA0539) on chromosome 21, as well as receptor for activated C kinase 1 (RACK1, also known as GNB2L1), glutaminyl‑tRNA synthetase (QARS) and tyrosyl-DNA phosphodiesterase 2 (TDP2) along with another 474 loci, including interleukin 16 (IL16). In this review, we discuss the causes of this issue, its quantitative incidence in biomedical research, the consequences in biology and medicine, and the possible solutions for obtaining the actual amino acid sequence of proteins in the post-genomics era.
Affiliation(s)
- Lorenza Vitale
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, I‑40126 Bologna, Italy
- Maria Caracausi
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, I‑40126 Bologna, Italy
- Raffaella Casadei
  - Department for Life Quality Studies, University of Bologna, I‑47921 Rimini, Italy
- Maria Chiara Pelleri
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, I‑40126 Bologna, Italy
- Allison Piovesan
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, I‑40126 Bologna, Italy
3
Piovesan A, Caracausi M, Ricci M, Strippoli P, Vitale L, Pelleri MC. Identification of minimal eukaryotic introns through GeneBase, a user-friendly tool for parsing the NCBI Gene databank. DNA Res 2015; 22:495-503. [PMID: 26581719 PMCID: PMC4675715 DOI: 10.1093/dnares/dsv028] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 06/15/2015] [Accepted: 10/07/2015] [Indexed: 01/26/2023] Open
Abstract
We have developed GeneBase, a full parser of the National Center for Biotechnology Information (NCBI) Gene database, which generates a fully structured local database with an intuitive user-friendly graphic interface for personal computers. Features of all the annotated eukaryotic genes are accessible through three main software tables, including for each entry details such as the gene summary, the gene exon/intron structure and the specific Gene Ontology attributions. The structuring of the data, the creation of additional calculation fields and the integration with nucleotide sequences allow users to make many types of comparisons and calculations that are useful for data retrieval and analysis. We provide an original example analysis of the existing introns across all the available species, through which the classic biological problem of the ‘minimal intron’ may find a solution using available data. Based on all currently available data, we can define the shortest known eukaryotic GT-AG intron length, setting the physical limit at the 30 base pair intron belonging to the human MST1L gene. This ‘model intron’ will shed light on the minimal requirement elements of recognition used for conventional splicing functioning. Remarkably, this size is indeed consistent with the sum of the splicing consensus sequence lengths.
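The 'minimal intron' search described in the abstract amounts to scanning annotated introns for the shortest one with canonical GT-AG boundaries. The sketch below illustrates that idea only; it does not reproduce GeneBase, and the function names, the `(name, sequence)` record layout and the toy sequences are all hypothetical.

```python
def is_gt_ag(intron):
    """Return True if the intron starts with GT and ends with AG (case-insensitive)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def shortest_gt_ag_intron(introns):
    """Return the (name, sequence) pair of the shortest GT-AG intron, or None.

    introns: iterable of (name, sequence) pairs.
    """
    candidates = [(name, seq) for name, seq in introns if is_gt_ag(seq)]
    if not candidates:
        return None
    return min(candidates, key=lambda item: len(item[1]))


# Made-up toy records, not real annotation data.
toy_introns = [
    ("toy_a", "GTAAGTTTTCAG"),  # GT-AG, 12 bases
    ("toy_b", "GCAAG"),         # starts with GC, so it is excluded
    ("toy_c", "gtacag"),        # GT-AG, 6 bases
]
shortest = shortest_gt_ag_intron(toy_introns)
```

Filtering on the boundary dinucleotides before taking the minimum keeps non-canonical or mis-annotated introns from being reported as the shortest one.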
Affiliation(s)
- Allison Piovesan
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Bologna, BO 40126, Italy
- Maria Caracausi
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Bologna, BO 40126, Italy
- Marco Ricci
  - Department of Biological, Geological and Environmental Sciences (BIGeA), University of Bologna, Bologna, BO 40126, Italy
- Pierluigi Strippoli
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Bologna, BO 40126, Italy
- Lorenza Vitale
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Bologna, BO 40126, Italy
- Maria Chiara Pelleri
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), Unit of Histology, Embryology and Applied Biology, University of Bologna, Bologna, BO 40126, Italy
4
Zhang J, Lou X, Shen H, Zellmer L, Sun Y, Liu S, Xu N, Liao DJ. Isoforms of wild type proteins often appear as low molecular weight bands on SDS-PAGE. Biotechnol J 2014; 9:1044-54. [PMID: 24906056 DOI: 10.1002/biot.201400072] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 02/14/2014] [Revised: 04/03/2014] [Accepted: 06/04/2014] [Indexed: 11/08/2022]
Abstract
Immunoblotting, after polyacrylamide gel electrophoresis with sodium dodecyl sulfate (SDS-PAGE), is a technique commonly used to detect specific proteins. SDS-PAGE often results in the visualization of protein band(s) in addition to the one expected based on the theoretical molecular mass (TMM) of the protein of interest. To determine the likelihood of additional band(s) being nonspecific, we used liquid chromatography - mass spectrometry to identify proteins that were extracted from bands with the apparent molecular mass (MM) of 40 and 26 kD, originating from protein extracts derived from non-malignant HEK293 and cancerous MDA-MB231 (MB231) cells separated using SDS-PAGE. In total, approximately 57% and 21% of the MS/MS spectra were annotated as peptides in the two cell samples, respectively. Moreover, approximately 24% and 36.2% of the identified proteins from HEK293 and MB231 cells matched their TMMs. Of the identified proteins, 8% from HEK293 and 26% from MB231 had apparent MMs that were larger than predicted, and 67% from HEK293 and 37% from MB231 exhibited smaller MM values than predicted. These revelations suggest that interpretation of the positive bands of immunoblots should be conducted with caution. This study also shows that protein identification performed by mass spectrometry on bands excised from SDS-PAGE gels could make valuable contributions to the identification of cancer biomarkers, and to cancer-therapy studies.
Affiliation(s)
- Ju Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, P. R. China