1
Zhao T, Hu Y, Zang T. DRACP: a novel method for identification of anticancer peptides. BMC Bioinformatics 2020; 21:559. [PMID: 33323099 PMCID: PMC7739480 DOI: 10.1186/s12859-020-03812-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/01/2020] [Accepted: 10/13/2020] [Indexed: 12/25/2022]
Abstract
Background Millions of people suffer from cancer, yet accurate early diagnosis and effective treatment remain difficult. Common treatments include surgery, radiotherapy and chemotherapy, all of which can be very harmful to patients. Recently, anticancer peptides (ACPs) have been identified as a potential way to treat cancer. Because ACPs are natural biologics, they are safer than other treatments. However, finding ACPs experimentally is expensive, so we propose a new machine learning method to identify them. Results First, we extracted ACP features from two aspects: the sequence and the chemical characteristics of amino acids. For the sequence, the average composition of the 20 amino acids was extracted. For chemical characteristics, we classified amino acids into six groups based on the patterns of hydrophobic and hydrophilic residues. Then, a deep belief network was used to encode the features of ACPs. Finally, we proposed Random Relevance Vector Machines to identify true ACPs. We call this method ‘DRACP’ and tested its performance on two independent datasets; its AUC and AUPR are higher than 0.9 on both. Conclusion We developed a novel method named ‘DRACP’ and compared it with several traditional methods. The cross-validation results showed its effectiveness in identifying ACPs.
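The sequence feature named above, amino acid composition (AAC), can be sketched in standard-library Python. This is a minimal sketch, assuming AAC means the fraction of each of the 20 standard residues in a peptide (a dataset-level "average" would be the mean of these vectors); the function name and toy sequence are hypothetical, and the six hydrophobicity groups are omitted because the abstract does not list them.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a peptide.

    Each entry is the fraction of residues equal to that amino acid.
    Non-standard letters are ignored; an empty result gives all zeros.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy peptide, not taken from any ACP dataset.
    vec = amino_acid_composition("ACDKKLLA")
    print({aa: round(f, 3) for aa, f in zip(AMINO_ACIDS, vec) if f > 0})
```

Vectors like this one would then be fed, together with group-based chemical features, to the encoder; how DRACP combines them is not specified in the abstract.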
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
- Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
- Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
2
Zhang T, Wang R, Jiang Q, Wang Y. An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191120141032] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Background:
Enhancers are cis-regulatory elements in DNA sequences that enhance gene expression. Because most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers contain a variety of features that can help in enhancer recognition.
Objective:
The classification power of features differs significantly, so the performance of existing methods that use one or a few features to identify enhancers varies greatly. Therefore, evaluating the classification power of each feature can improve enhancer prediction.
Methods:
We present an evaluation method based on Information Gain (IG) that captures the change in entropy of enhancer recognition given each feature. To validate the method, Single Feature Prediction Accuracy (SFPA) experiments were conducted on each feature.
Results:
The average IG values of the sequence, transcriptional and epigenetic features are 0.068, 0.213 and 0.299, respectively. Through SFPA, the average AUC values of the sequence, transcriptional and epigenetic features are 0.534, 0.605 and 0.647, respectively. These verification results are consistent with the IG-based evaluation.
Conclusion:
This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.
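The IG criterion can be illustrated with a short standard-library Python sketch. It assumes each feature has already been discretized (for example, a signal binned into "high" and "low"); the binning, the function names and the toy data are assumptions for illustration, not the authors' implementation. IG is the drop in label entropy once the feature value is known: IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) for a discrete feature X and class labels Y.

    Computes H(Y) minus the entropy of Y within each feature value,
    weighted by how often that value occurs.
    """
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Hypothetical toy data: 1 = enhancer, 0 = non-enhancer.
    labels = [1, 1, 0, 0]
    informative = ["high", "high", "low", "low"]  # separates the classes
    uninformative = ["a", "b", "a", "b"]          # independent of the label
    print(information_gain(informative, labels))
    print(information_gain(uninformative, labels))
```

A feature that perfectly separates enhancers from non-enhancers in a balanced set scores 1 bit, while one independent of the label scores 0; the averages reported above lie between these extremes.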
Affiliation(s)
- Tianjiao Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Rongjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
3
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is a phenotype positively associated with type 2 diabetes mellitus (T2DM), but the causal relationship between them is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and T2DM prevention in young at-risk populations. MATERIALS AND METHODS A two-sample MR using genetic instrumental variables (IVs) was applied to test the causal effect of IL on the risk of T2DM. MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted (IVW) method to assess the risk that shorter IL brings to T2DM. A sensitivity analysis was conducted to identify the effect of individual SNPs, and MR-Egger regression was used to detect pleiotropic bias of the IVs. RESULTS The pooled OR from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), the MR-Egger intercept was low (-0.477, P = 0.252), and in leave-one-out validation the ORs fluctuated only slightly, with relative changes ranging from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03). CONCLUSION We found that shorter IL confers no additional risk of T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that this result was not due to heterogeneity or pleiotropic effects of the IVs.
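The IVW, MR-Egger and leave-one-out steps described above can be sketched in standard-library Python. This is a minimal sketch, assuming harmonized per-SNP summary statistics (SNP-exposure effect, SNP-outcome log-odds effect and its standard error); the function names are hypothetical, and the code shows the textbook fixed-effect IVW estimator and a weighted MR-Egger regression rather than the authors' own code. The last helper reproduces the relative-change arithmetic quoted in the RESULTS.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate (log-OR scale)."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of SNP-outcome on SNP-exposure effects, with intercept.

    SNPs are first oriented so every exposure effect is non-negative; an
    intercept far from zero is read as evidence of directional pleiotropy.
    Returns (intercept, slope).
    """
    bx, by = [], []
    for x, y in zip(beta_exposure, beta_outcome):
        sign = -1.0 if x < 0 else 1.0
        bx.append(sign * x)
        by.append(sign * y)
    w = [1.0 / se ** 2 for se in se_outcome]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, bx))
    swy = sum(wi * y for wi, y in zip(w, by))
    swxx = sum(wi * x * x for wi, x in zip(w, bx))
    swxy = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def leave_one_out_ors(beta_exposure, beta_outcome, se_outcome):
    """IVW odds ratio recomputed with each SNP removed in turn."""
    n = len(beta_exposure)
    ors = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        beta = ivw_estimate([beta_exposure[j] for j in keep],
                            [beta_outcome[j] for j in keep],
                            [se_outcome[j] for j in keep])
        ors.append(math.exp(beta))
    return ors


def relative_change(or_value, or_pooled):
    """Relative deviation of a leave-one-out OR from the pooled OR."""
    return (or_value - or_pooled) / or_pooled
```

For example, `relative_change(0.966, 1.03)` is about -0.062 and `relative_change(1.081, 1.03)` is about 0.0495, which the abstract rounds to 0.05. Standard errors, confidence intervals and P values are omitted from this sketch.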
Affiliation(s)
- He Zhuang
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
- Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
- Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada; Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China; Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
4
Wang J, Su X, Zhao L, Zhang J. Deep Reinforcement Learning for Data Association in Cell Tracking. Front Bioeng Biotechnol 2020; 8:298. [PMID: 32328484 PMCID: PMC7161216 DOI: 10.3389/fbioe.2020.00298] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/15/2020] [Accepted: 03/20/2020] [Indexed: 01/27/2023]
Abstract
Accurate target detection and association are vital for reliable target tracking, especially for cell tracking in microscopy images, where cells look similar to one another. We propose a deep reinforcement learning method to associate detected targets between frames. Based on the dynamic model of each target, a cost matrix is produced by jointly considering various target features and is then used as the input of a neural network. The network is trained with reinforcement learning to predict a distribution over association solutions. Furthermore, we design a residual convolutional neural network that enables more efficient learning. We validate our method on two applications: a multiple target tracking simulation and ISBI cell tracking. The results demonstrate that our reinforcement learning approach can effectively track targets following different motion patterns and achieves competitive results.
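To make the cost-matrix input concrete, the sketch below builds a cost matrix from target features and solves the association with a classical minimum-cost matching, the kind of step the learned network replaces. This is a minimal sketch under stated assumptions: each object is a hypothetical `(x, y, area)` tuple, the cost weights are made up, and the exhaustive search only suits tiny examples (in practice the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, would be used). It is not the authors' reinforcement learning method.

```python
import itertools
import math


def cost_matrix(prev_targets, detections, w_pos=1.0, w_size=0.5):
    """Cost of linking each previous target to each new detection.

    Each object is a hypothetical (x, y, area) tuple. The cost mixes
    Euclidean distance with relative area change; weights are assumptions.
    """
    costs = []
    for (px, py, pa) in prev_targets:
        row = []
        for (dx, dy, da) in detections:
            dist = math.hypot(px - dx, py - dy)
            size = abs(pa - da) / max(pa, da, 1e-9)
            row.append(w_pos * dist + w_size * size)
        costs.append(row)
    return costs


def best_assignment(costs):
    """Exhaustively find the one-to-one assignment with minimum total cost.

    Returns a tuple whose entry i is the detection matched to target i.
    Runs in O(n!) time, so it is only for small, square cost matrices.
    """
    n = len(costs)
    return min(itertools.permutations(range(n)),
               key=lambda perm: sum(costs[i][perm[i]] for i in range(n)))


if __name__ == "__main__":
    prev = [(0, 0, 10), (10, 10, 10)]
    dets = [(10, 9, 10), (1, 0, 10)]  # detections arrive in swapped order
    print(best_assignment(cost_matrix(prev, dets)))
```

Here the two detections arrive in swapped order, and the minimum-cost assignment links target 0 to detection 1 and target 1 to detection 0.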
Affiliation(s)
- Junjie Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaohong Su
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Lingling Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jun Zhang
- Department of Rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
5
Abstract
BACKGROUND With the development of e-Health, predicting whether a patient will accept a doctor's answer in an online healthcare community is becoming increasingly important. Unlike previous work, which focuses mainly on numerical features, our framework combines numerical and textual information to predict the acceptance of answers. The textual information consists of questions posted by patients and answers posted by doctors. To extract textual features from them, we first trained a sentence encoder on a held-out dataset to encode each question-answer pair into a co-dependent representation. We then use it to predict whether doctors' answers will be accepted. RESULTS Our experimental results on a real-world dataset show that our model can extract additional features from text and make predictions more accurate: the model that takes both textual and numerical features as input performs significantly better than the model that takes numerical features only, on all four metrics (Accuracy, AUC, F1-score and Recall). CONCLUSIONS This work proposes a generic framework that combines numerical and textual features for acceptance prediction, in which textual features are first extracted from text using deep learning methods and then used to achieve better prediction results.
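The comparison above is reported on four metrics. The sketch below computes them in standard-library Python from true acceptance labels and predicted acceptance probabilities, as one might do when comparing the numerical-only and combined models. It is a minimal sketch: the 0.5 decision threshold and the function name are assumptions, and AUC is computed as the fraction of positive/negative pairs ranked correctly (ties count half).

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Accuracy, AUC, F1-score and Recall for binary acceptance prediction.

    labels: 1 if the answer was accepted, else 0.
    scores: predicted acceptance probabilities (same order as labels).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Rank-based AUC: probability a random positive outscores a random negative.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg]
    auc = sum(pairs) / len(pairs) if pairs else float("nan")

    return {"accuracy": accuracy, "auc": auc, "f1": f1, "recall": recall}


if __name__ == "__main__":
    print(classification_metrics([1, 1, 0, 0], [0.6, 0.4, 0.5, 0.2]))
```

Running the same function on the outputs of both models gives a side-by-side comparison of the kind the RESULTS describe.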
Affiliation(s)
- Qianlong Liu
- School of Data Science, Fudan University, Handan Road, Shanghai, China
- Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Kangenbei Liao
- School of Data Science, Fudan University, Handan Road, Shanghai, China
- Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Kelvin Kam-fai Tsoi
- Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Zhongyu Wei
- School of Data Science, Fudan University, Handan Road, Shanghai, China
6
Peng J, Lu G, Xue H, Wang T, Shang X. TS-GOEA: a web tool for tissue-specific gene set enrichment analysis based on gene ontology. BMC Bioinformatics 2019; 20:572. [PMID: 31760951 PMCID: PMC6876092 DOI: 10.1186/s12859-019-3125-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/15/2022]
Abstract
BACKGROUND The Gene Ontology (GO) knowledgebase is the world's largest source of information on the functions of genes. Since the beginning of the GO project, various tools have been developed to perform GO enrichment analysis, which has become a commonly used method of gene function analysis. However, existing GO enrichment analysis tools do not consider tissue-specific information, even though this information is very important to current research. RESULTS In this paper, we built an easy-to-use web tool called TS-GOEA that allows users to perform tissue-specific GO enrichment analysis. TS-GOEA uses a strict-threshold statistical method for GO enrichment analysis and provides statistical tests to improve the reliability of the results. TS-GOEA also provides tools that make it convenient to compare results across experiments. To evaluate its performance, we applied TS-GOEA to genes associated with platelet disease. CONCLUSIONS TS-GOEA is an effective GO analysis tool with unique features. The experimental results show that our method performs better and provides a useful supplement to existing GO enrichment analysis tools. TS-GOEA is available at http://120.77.47.2:5678.
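The abstract does not state which test TS-GOEA uses, so the sketch below shows the classical upper-tail hypergeometric test that underlies many GO enrichment tools, plus a Bonferroni correction as one example of a strict threshold. Both choices, the function names and the idea of making the test tissue-specific by restricting the background counts to genes expressed in the tissue of interest are assumptions for illustration, not a description of TS-GOEA's procedure.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """Upper-tail hypergeometric P(X >= k) for GO term enrichment.

    N: annotated genes in the background (e.g. genes expressed in a tissue)
    K: background genes annotated to the GO term
    n: genes in the query set
    k: query genes annotated to the GO term
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical counts: 5 of 5 query genes carry a term held by 5 of 10 genes.
    p = enrichment_p_value(N=10, K=5, n=5, k=5)
    print(p, bonferroni([p, 0.5]))
```

Restricting N and K to tissue-expressed genes changes both the background and the term sizes, which is why the same query set can yield different enrichment results in different tissues.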
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Guilin Lu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
- Tao Wang
- School of Computer Science, Harbin Institute of Technology, Harbin, 150001 China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
7
Zhu X, Fu B, Yang Y, Ma Y, Hao J, Chen S, Liu S, Li T, Liu S, Guo W, Liao Z. Attention-based recurrent neural network for influenza epidemic prediction. BMC Bioinformatics 2019; 20:575. [PMID: 31760945 PMCID: PMC6876090 DOI: 10.1186/s12859-019-3131-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 01/04/2023]
Abstract
BACKGROUND Influenza is an infectious respiratory disease that can pose a serious public health hazard. Because of its major threat to society, precise real-time forecasting of influenza outbreaks is of great public value. RESULTS In this paper, we propose a new deep neural network structure that forecasts the real-time influenza-like illness rate (ILI%) in Guangzhou, China. Long short-term memory (LSTM) neural networks are applied to achieve precise forecasts, given the long-term nature and diversity of influenza epidemic data. We devise a multi-channel LSTM neural network that can draw information from different types of inputs, and we add an attention mechanism to improve forecasting accuracy. This structure lets us handle the relationships between multiple inputs more appropriately. Our model fully considers the information in the dataset and specifically targets practical problems in forecasting the Guangzhou influenza epidemic. CONCLUSION We assess the performance of our model by comparing it with different neural network structures and other state-of-the-art methods. The experimental results indicate that our model is highly competitive and can provide effective real-time influenza epidemic forecasting.
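The attention step mentioned above can be illustrated with generic dot-product attention pooling over a sequence of hidden states, written in standard-library Python. This is a minimal sketch: the abstract does not say which attention variant the authors use, so the dot-product scoring, the query vector and the function names are assumptions, and the LSTM that would produce the hidden states is not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention over T hidden states (e.g. LSTM outputs).

    Scores each time step by its dot product with the query, normalizes the
    scores with softmax, and returns (weights, context), where context is the
    weighted sum of the hidden states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical hidden states, T = 2
    print(attention_pool(states, [0.0, 0.0]))   # uniform weights
    print(attention_pool(states, [10.0, 0.0]))  # attends to the first step
```

In a multi-channel design, one plausible arrangement is to pool each channel's LSTM outputs this way and concatenate the context vectors before the forecasting layer; this arrangement is an assumption, not a detail given in the abstract.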
Affiliation(s)
- Xianglei Zhu
- College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Bofeng Fu
- College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Yaodong Yang
- College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Yu Ma
- Guangzhou Center for Disease Control and Prevention, Guangzhou, 510440 China
- Jianye Hao
- College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Siqi Chen
- College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Shuang Liu
- College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
- Tiegang Li
- Guangzhou Center for Disease Control and Prevention, Guangzhou, 510440 China
- Sen Liu
- Automotive Data Center, China Automotive Technology & Research, Tianjin, 300300 China
- Weiming Guo
- Automotive Data Center, China Automotive Technology & Research, Tianjin, 300300 China
- Zhenyu Liao
- Pony Testing International Group, Tianjin, 300051 China
- Tianjin FoodSafety Inspection Technology Institute, Tianjin, 300300 China
8
Wang Y, Juan L, Peng J, Zang T, Wang Y. Prioritizing candidate diseases-related metabolites based on literature and functional similarity. BMC Bioinformatics 2019; 20:574. [PMID: 31760947 PMCID: PMC6876110 DOI: 10.1186/s12859-019-3127-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
Background As the terminal products of cellular regulatory processes, functionally related metabolites are closely linked to complex diseases and are often associated with the same or similar diseases. Therefore, identifying disease-related metabolites plays a critical role in comprehensively understanding disease pathogenesis, with the aim of improving clinical medicine. Because a large number of disease metabolic markers remain to be explored, we propose a computational model that identifies potential disease-related metabolites based on functional relationships between metabolites and on literature scores. First, after obtaining associations between metabolites and diseases from the Human Metabolome Database, we calculate metabolite similarities with a modified collaborative-filtering recommendation strategy that uses the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with the similarities between metabolites as edge weights. To improve the identification of disease-related metabolites, we introduce text-mining scores from an existing database of chemicals and proteins into the DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and literature scores. Finally, we apply random walk with restart (RWR) on this network to predict candidate disease-related metabolites. Results We construct the DMN and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. We then extract training and testing sets from two different versions of the Human Metabolome Database and assess the performance of DMN and FLDMN on 19 diseases. The average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%, while the improved FLDMN predicts potential metabolic signatures for the 19 diseases with an average AUC of 76.03%. Conclusion In this paper, we propose a computational model for exploring metabolite-disease pairs that, under the validation described, performs well in predicting potential disease-related metabolites. This result suggests that integrating literature and functional associations can be an effective way to construct a disease-associated metabolite network for prioritizing candidate disease-related metabolites.
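The final ranking step, random walk with restart (RWR), can be sketched in standard-library Python. This is a minimal sketch under stated assumptions: the network is given as a weighted adjacency matrix (here imagined as metabolite-metabolite similarities), the seed nodes would be the known metabolites of a disease, and the restart probability of 0.7 and the convergence tolerance are arbitrary defaults, not values from the paper. Each iteration applies p(t+1) = (1 - r) W p(t) + r p0, where W is the column-normalized adjacency matrix.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores for every node of a weighted network.

    adjacency: square list-of-lists of non-negative edge weights.
    seeds: set of node indices where the walk restarts (e.g. known
           disease-related metabolites; a hypothetical choice here).
    restart: restart probability r (assumed default).
    """
    n = len(adjacency)
    # Column-normalize so each column of W sums to 1 (isolated nodes stay 0).
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        p_next = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
                  + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    # Hypothetical 3-node similarity network; node 0 is the known seed.
    A = [[0.0, 0.9, 0.1],
         [0.9, 0.0, 0.5],
         [0.1, 0.5, 0.0]]
    scores = random_walk_with_restart(A, {0})
    print(sorted(range(3), key=lambda i: -scores[i]))
```

Candidate metabolites would then be ranked by their steady-state scores, with non-seed nodes near the top treated as the most promising predictions.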
Affiliation(s)
- Yongtian Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, People's Republic of China
- Tianyi Zang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
9
Zhao T, Wang D, Hu Y, Zhang N, Zang T, Wang Y. Identifying Alzheimer’s Disease-related miRNA Based on Semi-clustering. Curr Gene Ther 2019; 19:216-223. [DOI: 10.2174/1566523219666190924113737] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/30/2019] [Revised: 06/05/2019] [Accepted: 06/12/2019] [Indexed: 01/14/2023]
Abstract
Background:
More and more researchers are trying to use miRNAs as specific biomarkers for Alzheimer’s Disease (AD) and mild cognitive impairment (MCI). Multiple studies have indicated that miRNAs are associated with poor axonal growth and loss of synaptic structures, both of which are early events in AD. The overall loss of miRNA may be associated with aging, increasing the incidence of AD, and may also be involved in the disease through specific molecular mechanisms.
Objective:
Identifying Alzheimer’s disease-related miRNAs can help us find new drug targets and support early diagnosis.
Materials and Methods:
We used genes as a bridge to connect AD and miRNAs. First, a protein-protein interaction network is used to find more AD-related genes starting from known AD-related genes. Then, each miRNA’s correlation with these genes is obtained from miRNA-gene interactions. Finally, each miRNA obtains a feature vector representing its correlation with AD. Unlike other studies, which randomly generate negative samples and then use a classification method to identify AD-related miRNAs, we use a semi-clustering method, ‘one-class SVM’. AD-related miRNAs are considered outliers, and our aim is to identify the miRNAs that are similar to known AD-related miRNAs (outliers).
Results and Conclusion:
We identified 257 novel AD-related miRNAs and compared our method with an SVM trained on generated negative samples. The AUC of our method is much higher than that of the SVM, and we conducted case studies to show that our results are reliable.
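The gene-bridge feature construction in the Methods can be sketched in standard-library Python. This is a minimal sketch under stated assumptions: the PPI expansion is taken to be one hop from the known AD genes, a miRNA's correlation with each gene is reduced to a binary "targets / does not target" indicator, and all gene and miRNA names are hypothetical placeholders. The one-class step could then fit a model such as scikit-learn's `OneClassSVM` on these vectors; how the authors configure it is not described in the abstract, so it is not shown.

```python
def expand_genes(seed_genes, ppi_edges):
    """Known AD genes plus their direct PPI neighbours (one-hop expansion).

    seed_genes: set of known AD-related gene names.
    ppi_edges: iterable of (gene_a, gene_b) undirected interactions.
    Returns a sorted list so feature positions are deterministic.
    """
    expanded = set(seed_genes)
    for a, b in ppi_edges:
        if a in seed_genes:
            expanded.add(b)
        if b in seed_genes:
            expanded.add(a)
    return sorted(expanded)


def mirna_feature_vectors(mirna_targets, genes):
    """Binary feature vector per miRNA over the AD-related gene list.

    mirna_targets: dict mapping miRNA name -> set of target gene names.
    Entry k is 1 if the miRNA targets genes[k], else 0.
    """
    return {m: [1 if g in targets else 0 for g in genes]
            for m, targets in mirna_targets.items()}


if __name__ == "__main__":
    genes = expand_genes({"G1"}, [("G1", "G2"), ("G3", "G4"), ("G5", "G1")])
    print(genes)
    print(mirna_feature_vectors({"miR-a": {"G2"}, "miR-b": {"G9"}}, genes))
```

A weighted interaction score could replace the binary indicator without changing the overall structure.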
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Donghua Wang
- Department of General Surgery, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ningyi Zhang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
10
Chen X, Shi W, Deng L. Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks. Curr Gene Ther 2019; 19:232-241. [DOI: 10.2174/1566523219666190917155959] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/30/2019] [Revised: 06/14/2019] [Accepted: 06/16/2019] [Indexed: 12/25/2022]
Abstract
Background:
Accumulating experimental studies have indicated that, compared with patients who have a single disease, disease comorbidity causes patients additional suffering and leads to the failure of standard treatments. Therefore, accurate prediction of potential comorbidity is essential for designing more effective treatment strategies. However, only a few disease comorbidities have been discovered in the clinic.
Objective:
In this work, we propose PCHS, an effective computational method for predicting disease comorbidity.
Materials and Methods:
We used the HeteSim measure to calculate relatedness scores for disease pairs in a global heterogeneous network that integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and the associations among them. We built the prediction model using a Support Vector Machine (SVM) based on the HeteSim scores.
Results and Conclusion:
The results showed that PCHS performed significantly better than previous state-of-the-art approaches, achieving an AUC of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
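The HeteSim score used above can be sketched for even-length meta-paths in standard-library Python. This is a minimal sketch, assuming the usual HeteSim construction: each half of the path is walked with row-normalized relation matrices, and the score is the cosine of the two reachable-probability vectors that meet in the middle. Odd-length paths, which need an extra edge-splitting step, are not covered, and the relation matrices in the example (disease-protein `U`, protein-drug `V`) are hypothetical; the abstract does not list the meta-paths PCHS uses or how the scores feed the SVM.

```python
import math


def row_normalize(matrix):
    """Turn a relation matrix into row-wise transition probabilities."""
    out = []
    for row in matrix:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for i in range(len(a))]


def reach_probabilities(relations):
    """Product of row-normalized relation matrices along one half of a path."""
    result = row_normalize(relations[0])
    for rel in relations[1:]:
        result = matmul(result, row_normalize(rel))
    return result


def hetesim(left_relations, right_relations, s, t):
    """HeteSim(s, t | P) for an even-length meta-path P split at its middle.

    left_relations walk from the source type to the middle type;
    right_relations walk from the target type to the middle type
    (i.e. the right half of P, reversed).
    """
    pl = reach_probabilities(left_relations)[s]
    pr = reach_probabilities(right_relations)[t]
    dot = sum(a * b for a, b in zip(pl, pr))
    norm = math.sqrt(sum(a * a for a in pl)) * math.sqrt(sum(b * b for b in pr))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    U = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]  # hypothetical disease x protein
    V = [[1, 0], [1, 1], [0, 1]]           # hypothetical protein x drug
    print(hetesim([U], [U], 0, 1))          # path disease-protein-disease
    print(hetesim([U, V], [U, V], 0, 1))    # disease-protein-drug-protein-disease
```

For a symmetric path such as disease-protein-disease, both halves use the same relation matrix, so the score is symmetric in the two diseases; per-pair scores over several meta-paths could then form the feature vector for a classifier.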
Affiliation(s)
- Xuegong Chen
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
- Wanwan Shi
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China