1
Barreiro E, Munteanu CR, Cruz-Monteagudo M, Pazos A, González-Díaz H. Net-Net Auto Machine Learning (AutoML) Prediction of Complex Ecosystems. Sci Rep 2018; 8:12340. [PMID: 30120369 PMCID: PMC6098100 DOI: 10.1038/s41598-018-30637-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/09/2018] [Accepted: 07/24/2018] [Indexed: 11/09/2022] [Open Access]
Abstract
Biological Ecosystem Networks (BENs) are webs of biological species (nodes) connected by trophic relationships (links). Experimental confirmation of all possible links is difficult and generates a huge volume of information; consequently, computational prediction of these links has become an important goal. Artificial Neural Networks (ANNs) are Machine Learning (ML) algorithms that can be used to predict BENs when trained on Shannon entropy information measures (Shk) of known ecosystems. However, it is difficult to select a priori the ANN topology that will achieve the highest accuracy. Auto Machine Learning (AutoML) methods address this problem by automatically selecting the most efficient ML algorithms for a specific task. In this work, a preliminary study of a new AutoML approach to selecting ANNs for the prediction of BENs is proposed. We call it the Net-Net AutoML approach because, for the first time, it uses Shk values of both kinds of network involved: BENs (the networks to be predicted) and ANN topologies (the networks to be tested). Twelve types of classifiers were tested for the Net-Net model, including linear, Bayesian and tree-based methods, multilayer perceptrons and deep neural networks. The best Net-Net AutoML model for the 338,050 outputs of 10 ANN topologies predicting links of 69 BENs was obtained with a deep fully connected neural network, with a test accuracy of 0.866 and a test AUROC of 0.935. This work paves the way for applying Net-Net AutoML to other systems and ML algorithms.
Affiliation(s)
- Enrique Barreiro
- Department of Computation, Computer Science Faculty, University of A Coruña (UDC), 15071, A Coruña, Spain; Center for Computational Science (CCS), University of Miami (UM), Miami, FL 33136, USA; West Coast University, Miami Campus, FL 33178, USA
- Cristian R Munteanu
- Department of Computation, Computer Science Faculty, University of A Coruña (UDC), 15071, A Coruña, Spain
- Maykel Cruz-Monteagudo
- Center for Computational Science (CCS), University of Miami (UM), Miami, FL 33136, USA; West Coast University, Miami Campus, FL 33178, USA
- Alejandro Pazos
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña, 15006, Spain
- Humbert González-Díaz
- Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940, Biscay, Spain; IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
2
Graham DJ. A new bioinformatics approach to natural protein collections: permutation structure contrasts of viral and cellular systems. Protein J 2013; 32:275-87. [PMID: 23605224 DOI: 10.1007/s10930-013-9485-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
Biological cells and viruses operate by different replication and symmetry paradigms. Cells are able to replicate independently and express little spatial symmetry; viruses require cells for replication while manifesting high symmetry. The author asked whether these different paradigms are reflected in the permutations of amino acid sequences. The hypothesis was that the level of permutation structure and symmetry within viral protein collections exceeds that of living cells. The rationale was that one aspect of symmetry generally accompanies and promotes others in a system. The question could be readily answered given the abundant sequence archives for proteins, and the analysis of collections from diverse viral and cellular sources lends strong support to the hypothesis. Additional insights into protein primary structure, the design of collections, and the role of information are also provided.
Affiliation(s)
- Daniel J Graham
- Department of Chemistry, Loyola University Chicago, 6525 North Sheridan Road, Chicago, IL 60626, USA.
3
Information properties of naturally-occurring proteins: Fourier analysis and complexity phase plots. Protein J 2012; 31:550-63. [PMID: 22814572 DOI: 10.1007/s10930-012-9432-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
In previous work from this lab, the information in natural proteins was investigated using Ribonuclease A (RNase A) as the source. Its signature traits were examined at three levels of structure, from primary through tertiary. The present paper goes further by charting the primary structure information of about half a million molecules, which was feasible given the abundant sequence archives for both living and viral systems. Notably, a method is presented for evaluating primary structure information based on Fourier analysis and spectral complexity. The results show certain complexity traits to be universal across living sources. Viruses, by contrast, encode protein collections that are case-specific and divergent in complexity. These results have ramifications for discriminating between protein collections on the basis of sequence information, and this discrimination offers new strategies for selecting drug targets.