1
CoCoNet: an efficient deep learning tool for viral metagenome binning. Bioinformatics 2021; 37:2803-2810. [PMID: 33822891 DOI: 10.1093/bioinformatics/btab213] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/18/2020] [Revised: 03/24/2021] [Accepted: 04/02/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION Metagenomic approaches hold the potential to characterize microbial communities and unravel the intricate link between the microbiome and biological processes. Assembly is one of the most critical steps in metagenomics experiments. It consists of transforming overlapping DNA sequencing reads into sufficiently accurate representations of the community's genomes. This process is computationally difficult and commonly results in genomes fragmented across many contigs. Computational binning methods are used to mitigate fragmentation by partitioning contigs, based on their sequence composition, abundance or chromosome organization, into bins representing the community's genomes. Existing binning methods have been principally tuned for bacterial genomes and do not perform well on viral metagenomes. RESULTS We propose Composition and Coverage Network (CoCoNet), a new binning method for viral metagenomes that leverages the flexibility and effectiveness of deep learning to model the co-occurrence of contigs belonging to the same viral genome and provides a rigorous framework for binning viral contigs. Our results show that CoCoNet substantially outperforms existing binning methods on viral datasets. AVAILABILITY AND IMPLEMENTATION CoCoNet was implemented in Python and is available for download on PyPI (https://pypi.org/). The source code is hosted on GitHub at https://github.com/Puumanamana/CoCoNet and the documentation is available at https://coconet.readthedocs.io/en/latest/index.html. CoCoNet does not require extensive resources to run. For example, binning 100k contigs took about 4 h on 10 Intel CPU cores (2.4 GHz), with a memory peak of 27 GB (see Supplementary Fig. S9). To process a large dataset, CoCoNet may need to be run on a server with high RAM capacity. Such servers are typically available in high-performance or cloud computing settings. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
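CoCoNet's network is not described in this abstract. As a hedged illustration of the two signals named in its title, the sketch below computes a tetranucleotide frequency profile for each of two contig fragments and compares both the profiles and the per-sample coverage vectors. The function names, k = 4, and the fixed L1 and log-Euclidean distances are assumptions for illustration only; CoCoNet itself learns how to compare fragments with a neural network rather than using fixed distances.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalized k-mer frequencies over ACGT (non-canonical k-mers)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # k-mers containing N or other ambiguity codes are skipped
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def l1_distance(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences between two profiles (0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coverage_distance(cov_a: list[float], cov_b: list[float]) -> float:
    """Euclidean distance between log(1 + coverage) vectors, one entry per sample."""
    return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(cov_a, cov_b)))


def pair_features(seq_a, seq_b, cov_a, cov_b, k=4):
    """Composition and coverage dissimilarities for one pair of contig fragments."""
    return (
        l1_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k)),
        coverage_distance(cov_a, cov_b),
    )


# Hypothetical fragments with mean coverage in two samples.
comp, cov = pair_features("ACGTACGTTGCA", "ACGTACGTTGCC", [12.0, 3.5], [11.0, 4.0])
```

Fragments from the same genome would be expected to score low on both features; a learned model can weigh the two signals jointly instead of thresholding each one separately.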
2
Generalized Framework for Control of Redundant Manipulators in Robot-Assisted Minimally Invasive Surgery. Ing Rech Biomed 2018. [DOI: 10.1016/j.irbm.2018.04.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
3
Abstract
This paper surveys the kinematic structures adapted to specific classes of medical robots: minimally invasive surgery (MIS) and tele-echography robots. The large diversity of kinematic architectures found in medical robotics led us to perform a statistical analysis to inform and guide the design of medical robots. Safety constraints and some considerations on the evolution of medical robot design are also presented. First, we describe the spectrum of medical robots in minimally invasive surgery and tele-echography applications, and in particular the variety of kinematic architectures they use. We present the robots and their kinematic architectures and highlight the differences that arise in each medical application. We perform a statistical analysis that can serve as a resource for topological synthesis in each specific medical application. Safety is an important specification in medical robotics, and for that reason we show the means used to take this constraint into account. This study demonstrates that the nature of medical robots implies specific requirements leading to different kinematic structures. The statistical analysis informs the choice of kinematic structures for medical applications (minimally invasive surgery and echography). The safety constraint, as well as the interaction between doctor and robot, leads us to investigate new mechanical solutions to enhance medical robot safety and compliance. We expect this paper to serve as a significant resource and to help the design of future medical robots.
4
5
Method of dimensional optimization of spherical robots for medical applications using specialized indices. Adv Robot 2013. [DOI: 10.1080/01691864.2013.861368] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
6
Abstract
BACKGROUND Classification is the problem of assigning each input object to one of a finite number of classes. This problem has been extensively studied in machine learning and statistics, and it has numerous applications in bioinformatics as well as many other fields. Building a multiclass classifier has been a challenge, because the direct approach of altering a binary classification algorithm to accommodate more than two classes can be computationally too expensive. Hence the indirect approach of binary decomposition has been commonly used, in which retrieving the class posterior probabilities from the set of binary posterior probabilities given by the individual binary classifiers has been a major issue. METHODS In this work, we present an extension of a recently introduced probabilistic kernel-based learning algorithm called the Classification Relevance Units Machine (CRUM) to the multiclass setting to increase its applicability. The extension is achieved under the error-correcting output codes framework. The probabilistic outputs of the binary CRUMs are preserved using a proposed linear-time decoding algorithm, an alternative to the generalized Bradley-Terry (GBT) algorithm, whose computational complexity prohibits its application to large-scale prediction settings. The resulting classifier is called the Multiclass Relevance Units Machine (McRUM). RESULTS The evaluation of McRUM on a variety of real small-scale benchmark datasets shows that our proposed Naïve decoding algorithm is computationally more efficient than the GBT algorithm while maintaining a similar level of predictive accuracy. A set of experiments on a larger-scale dataset for small ncRNA classification was then conducted with Naïve McRUM and compared with the Gaussian and linear SVM. Although McRUM's predictive performance is slightly lower than that of the Gaussian SVM, the results show that a similar true positive rate can be achieved at the cost of a slightly higher false positive rate. Furthermore, McRUM is computationally more efficient than the SVM, which is an important factor for large-scale analysis. CONCLUSIONS We have proposed McRUM, a multiclass extension of binary CRUM. McRUM with the Naïve decoding algorithm is computationally efficient at run time, and its predictive performance is comparable to that of the well-known SVM, showing its potential for solving large-scale multiclass problems in bioinformatics and other fields of study.
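The abstract does not spell out the Naïve decoding rule. The sketch below shows one generic, linear-time way to turn binary posteriors into multiclass posteriors under an error-correcting output code: it treats the binary outputs as independent and scores each class by the product of the probabilities its codeword predicts, then normalizes. This independence-based rule is an assumption chosen for illustration and is not necessarily McRUM's decoder; the code matrix and probabilities are hypothetical.

```python
def decode_posteriors(code_matrix, binary_probs):
    """
    code_matrix[c][j] is +1 if class c is the positive class of binary problem j,
    -1 if it is the negative class, and 0 if problem j ignores class c.
    binary_probs[j] is P(positive | x) from binary classifier j.
    Returns normalized class posteriors in O(classes * columns) time.
    """
    scores = []
    for row in code_matrix:
        score = 1.0
        for m, p in zip(row, binary_probs):
            if m == 1:
                score *= p
            elif m == -1:
                score *= 1.0 - p
            # m == 0: this binary problem carries no evidence about the class
        scores.append(score)
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)  # fall back to a uniform posterior
    return [s / total for s in scores]


# One-vs-rest code for three classes (hypothetical classifier outputs).
code = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
posterior = decode_posteriors(code, [0.8, 0.3, 0.1])
```

Unlike iterative coupling methods such as GBT, this rule needs a single pass over the code matrix, which is the property the abstract emphasizes for large-scale prediction.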
7
Probabilistic Prediction of Protein Phosphorylation Sites Using Classification Relevance Units Machines. APPLIED COMPUTING REVIEW 2012; 12:8-20. [PMID: 24163645 DOI: 10.1145/2432546.2432547] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Abstract
Phosphorylation is an important post-translational modification of proteins that is essential to the regulation of many cellular processes. Although most of the phosphorylation sites discovered in protein sequences have been identified experimentally, in vivo and in vitro discovery of the sites is an expensive, time-consuming and laborious task. Therefore, the development of computational methods for predicting protein phosphorylation sites has drawn considerable attention. In this work, we present a kernel-based probabilistic Classification Relevance Units Machine (CRUM) for in silico phosphorylation site prediction. Compared with the popular Support Vector Machine (SVM), CRUM shows comparable predictive performance yet provides a more parsimonious model. This is desirable because it reduces prediction run time, which is important for predictions on large-scale data. Furthermore, the CRUM training algorithm has lower run-time and memory complexity and a simpler parameter selection scheme than the Relevance Vector Machine (RVM) learning algorithm. To further investigate the viability of using CRUM for phosphorylation site prediction, we construct multiple CRUM predictors using different combinations of three phosphorylation site features: BLOSUM encoding, disorder, and amino acid composition. The predictors are evaluated through cross-validation, and the results show that CRUM with the BLOSUM feature is among the best-performing CRUM predictors in both cross-validation and benchmark experiments. A comparative study with existing prediction tools in an independent benchmark experiment suggests possible directions for further improving the predictive performance of CRUM predictors.
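Of the three features listed, amino acid composition is the simplest to illustrate. The sketch below finds candidate S, T and Y residues, extracts a window centred on one of them, and returns the fraction of each standard amino acid in that window. The half-width of 7, the "X" padding at the termini, and the function names are assumptions; the abstract does not state the window size used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def candidate_sites(protein: str) -> list[int]:
    """0-based positions of serine, threonine and tyrosine residues."""
    return [i for i, r in enumerate(protein) if r in "STY"]


def site_window(protein: str, pos: int, half_width: int = 7, pad: str = "X") -> str:
    """Window of 2 * half_width + 1 residues centred on pos, padded past the termini."""
    padded = pad * half_width + protein + pad * half_width
    # pos in the original sequence is pos + half_width in the padded one,
    # so the window starts at (pos + half_width) - half_width = pos.
    return padded[pos:pos + 2 * half_width + 1]


def aa_composition(window: str) -> list[float]:
    """Fraction of each standard amino acid; padding characters are ignored."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


protein = "MKSATYPLSR"  # hypothetical sequence
features = [aa_composition(site_window(protein, i)) for i in candidate_sites(protein)]
```

Each feature vector has 20 entries, so it could be concatenated with other encodings (such as a BLOSUM-based one) before being passed to a kernel classifier.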
8
Abstract
BACKGROUND A large family of viruses that infect bacteria, called phages, is characterized by long tails used to inject DNA into their victims' cells. The tape measure protein got its name because the length of the corresponding gene is proportional to the length of the phage's tail, a fact shown experimentally by copying or splicing out parts of the DNA in exemplar species. A natural question is whether these tape measures have units, and whether different tape measures have different units and lengths. Such units would allow us to retrace the evolution of tape measure proteins through their duplication/loss history. The vast number of sequenced phage genomes allows us to attack this problem with a comparative genomics approach. RESULTS Here we describe a subset of phages whose tape measure proteins contain variable numbers of an 11-amino-acid sequence repeat, aligned with sequence similarity, structural properties, and simple arithmetic. This subset provides a unique opportunity for the combinatorial study of phage evolution, without the added uncertainties of multiple alignments, which are trivial in this case, or of protein functions, which are well established. We give a heuristic that reconstructs the duplication history of these sequences, using divergent strains to discriminate between mutations that occurred before and after speciation, or lineage divergence. The heuristic is based on an efficient algorithm that gives an exhaustive enumeration of all possible parsimonious reconstructions of the duplication/speciation history of a single nucleotide. Finally, we present a method that allows us, when possible, to discriminate between duplication and loss events. CONCLUSIONS Establishing the evolutionary history of viruses is difficult, in part due to extensive recombination and gene transfer, and to high mutation rates that often erase detectable similarity between homologous genes. In this paper, we introduce new tools to address this problem.
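Because the repeats are contiguous and each is exactly 11 residues long, repeat units can be compared position by position without gaps, which is why multiple alignment is trivial in this case. The sketch below splits a repeat region into consecutive 11-residue units and computes pairwise Hamming distances, the kind of input a duplication-history reconstruction could start from. It assumes the region begins exactly at a unit boundary; the sequence used is hypothetical, and this is not the paper's heuristic.

```python
from itertools import combinations

UNIT = 11  # repeat unit length in amino acids, as stated in the abstract


def split_units(region: str, unit: int = UNIT) -> list[str]:
    """Cut a repeat region into consecutive, gap-free units."""
    if len(region) % unit:
        raise ValueError("region length must be a multiple of the unit length")
    return [region[i:i + unit] for i in range(0, len(region), unit)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length units differ."""
    if len(a) != len(b):
        raise ValueError("units must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(units: list[str]) -> dict[tuple[int, int], int]:
    """Hamming distance for every pair of units, keyed by unit indices."""
    return {(i, j): hamming(units[i], units[j]) for i, j in combinations(range(len(units)), 2)}


units = split_units("A" * 11 + "A" * 10 + "C" + "A" * 9 + "CC")  # hypothetical region
distances = pairwise_distances(units)
```

Under a simple parsimony reading, the least divergent adjacent units are natural candidates for the most recent duplication; comparing the same units across divergent strains is what lets the paper separate mutations that predate speciation from those that follow it.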
9
Miniemulsion polymerizations of n-butyl cyanoacrylate via two routes: towards a control of particle degradation. Colloids Surf B Biointerfaces 2011; 88:332-8. [PMID: 21802908 DOI: 10.1016/j.colsurfb.2011.07.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/14/2011] [Revised: 06/15/2011] [Accepted: 07/05/2011] [Indexed: 11/17/2022]
Abstract
This study aimed to determine the influence of the polymerization mechanism on the molar mass and degradation of poly(n-butyl cyanoacrylate) (PBCA) nanoparticles obtained by miniemulsion polymerization. To this end, PBCA nanoparticles were synthesized via radical and/or anionic miniemulsion polymerization stabilized by Brij®78, a POE-based surfactant. Polymerization conditions had little influence on the final particle diameter but strongly affected the final molar masses of PBCA. Increasing the temperature and the pH of the continuous phase led to higher molar masses. A further increase was observed when a radical initiator was added to the monomer. The evolution of the molar mass of the synthesized PBCA was followed as a function of time at pH 7.4 by size exclusion chromatography. As expected, the degradation kinetics strongly depended on the polymerization mechanism (anionic or radical).
10
Abstract
Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present a comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications for phage metagenomics assembly, for the horizontal gene transfer paradigm, and more generally for the understanding of the composition and evolutionary dynamics of virus communities.
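The abstract does not detail how mosaic graphs are built. As a hedged sketch of the underlying idea, the code below records, for each position of one focal genome, which other genomes contain the exact k-mer starting there, merges consecutive positions with the same partner set into segments, reports breakpoints where the partner set changes, and emits one edge per (segment, partner). The exact-k-mer criterion, the value of k, and the toy genomes are assumptions; real analyses would use alignments and much longer identical stretches.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All exact k-mers present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partner_segments(genomes: dict[str, str], focal: str, k: int):
    """
    Merge runs of focal-genome k-mer start positions that share the same set of
    partner genomes. Returns (start, end, partners) tuples with end exclusive;
    coordinates are k-mer start positions in the focal genome.
    """
    others = {name: kmer_set(seq, k) for name, seq in genomes.items() if name != focal}
    seq = genomes[focal]
    segments = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        partners = frozenset(name for name, kmers in others.items() if kmer in kmers)
        if segments and segments[-1][2] == partners:
            start = segments[-1][0]
            segments[-1] = (start, i + 1, partners)  # extend the current segment
        else:
            segments.append((i, i + 1, partners))  # partner set changed: new segment
    return segments


def breakpoints(segments) -> list[int]:
    """Positions where the set of sharing partners changes."""
    return [start for start, _, _ in segments[1:]]


def mosaic_edges(segments, focal: str):
    """Edges (focal, partner, start, end) of a simple mosaic graph for one genome."""
    return [
        (focal, partner, start, end)
        for start, end, partners in segments
        for partner in sorted(partners)
    ]


toy = {"P1": "AAAACCCC", "P2": "AAAAGGGG", "P3": "TTTTCCCC"}  # hypothetical genomes
segments = partner_segments(toy, "P1", k=4)
```

Repeating this with each genome as the focal one yields the "different partners along their genomes" pattern the abstract describes, with breakpoints marking candidate recombination junctions.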
11
FragAnchor: a large-scale predictor of glycosylphosphatidylinositol anchors in eukaryote protein sequences by qualitative scoring. GENOMICS PROTEOMICS & BIOINFORMATICS 2007; 5:121-30. [PMID: 17893077 PMCID: PMC5054108 DOI: 10.1016/s1672-0229(07)60022-9] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.0] [Indexed: 11/18/2022]
Abstract
A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins among the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). First, the NN selects potential GPI-anchored proteins in a dataset; then the HMM parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins is between 0.21% and 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, making it less sensitive to the background noise found in databases and to possibly incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available at http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html.
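FragAnchor's trained NN and HMM are not reproduced here. The sketch below only illustrates the tandem structure the abstract describes: restrict attention to the C-terminus, apply a cheap first-stage filter, then attach a qualitative label to the sequences that pass. Both stages are hypothetical stand-ins (a hydrophobic-tail fraction replaces the NN, and a search for a small omega-site candidate residue replaces HMM parsing), and the fragment length, thresholds, residue sets and label names are all assumptions.

```python
HYDROPHOBIC = set("AILMFVW")
OMEGA_CANDIDATES = set("SNGADC")  # small residues, assumed here as omega-site candidates


def c_terminal_fragment(protein: str, length: int = 50) -> str:
    """Keep only the C-terminal end, where a GPI signal is located."""
    return protein[-length:]


def hydrophobic_tail_fraction(fragment: str, tail: int = 20) -> float:
    """Fraction of hydrophobic residues among the last `tail` residues."""
    tail_seq = fragment[-tail:]
    if not tail_seq:
        return 0.0
    return sum(r in HYDROPHOBIC for r in tail_seq) / len(tail_seq)


def stage1_passes(fragment: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for the NN: a cheap filter on the hydrophobic tail."""
    return hydrophobic_tail_fraction(fragment) >= threshold


def stage2_label(fragment: str) -> str:
    """Hypothetical stand-in for HMM scoring: look for an omega-site candidate upstream."""
    spacer = fragment[-40:-20]  # assumed region searched for an omega site
    has_omega = any(r in OMEGA_CANDIDATES for r in spacer)
    if has_omega and hydrophobic_tail_fraction(fragment) >= 0.7:
        return "highly probable"
    if has_omega:
        return "probable"
    return "weakly probable"


def predict(proteins: dict[str, str]) -> dict[str, str]:
    """Run the two stages in tandem; only stage-1 survivors get a qualitative label."""
    results = {}
    for name, seq in proteins.items():
        fragment = c_terminal_fragment(seq)
        if stage1_passes(fragment):
            results[name] = stage2_label(fragment)
        else:
            results[name] = "not GPI-anchored"
    return results
```

The design point the sketch preserves is that the expensive, more informative stage only runs on sequences the cheap filter lets through, and that it returns a graded label rather than a bare yes/no.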
12
Abstract
BACKGROUND In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one possible goal being to reconstruct partial or complete genomes of members of the community. In communities with high species diversity, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem arises not only when many species are relatively evenly distributed, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case the few dominant species will always be sequenced at high coverage, to the detriment of coverage of the greater number of sparse species. METHODS AND RESULTS Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate the relative diversity of the two sub-communities quite accurately. CONCLUSION Given that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing, and expecting high diversity, consider splitting their communities in order to maximize the information content of their sequencing effort.
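The Lander-Waterman argument rests on a standard result: if N reads of length L start uniformly at random on a genome of length G, the coverage is c = NL/G, and with a minimum detectable overlap fraction σ, a read is a singleton when no other read starts close enough to overlap it on either side, which under the Poisson approximation happens with probability about exp(-2c(1 - σ)). The sketch below evaluates this for a perfectly even community. The even split of reads across species and the default σ = 0 are simplifying assumptions, edge effects are ignored, and the code does not reproduce the paper's derivation of the 25% threshold.

```python
import math


def coverage(n_reads: float, read_length: float, genome_length: float) -> float:
    """Lander-Waterman coverage c = N * L / G."""
    return n_reads * read_length / genome_length


def singleton_fraction(c: float, sigma: float = 0.0) -> float:
    """
    Expected fraction of reads overlapping no other read (Poisson approximation).
    No read may start within L * (1 - sigma) bases on either side, each with
    probability exp(-c * (1 - sigma)), hence exp(-2 * c * (1 - sigma)).
    """
    return math.exp(-2.0 * c * (1.0 - sigma))


def community_singleton_fraction(total_reads, read_length, genome_length, n_species):
    """Even community (assumption): every species receives total_reads / n_species reads."""
    c = coverage(total_reads / n_species, read_length, genome_length)
    return singleton_fraction(c)
```

For example, with σ = 0 the singleton fraction equals 25% exactly when c = ln 2 ≈ 0.69, since exp(-2 ln 2) = 1/4; any community whose per-species coverage falls below that value is in the regime the abstract discusses.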
13
Abdominal and fetal echography tele-operated in several medical centres sites, from an expert center, using a robotic arm & telephone or satellite link. JOURNAL OF GRAVITATIONAL PHYSIOLOGY : A JOURNAL OF THE INTERNATIONAL SOCIETY FOR GRAVITATIONAL PHYSIOLOGY 2007; 14:P139-P140. [PMID: 18372738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
OBJECTIVE To design and validate a method for tele-operating (from an expert site) an echographic examination at an isolated site where the patient is located. METHOD A dedicated robotic arm (ESTELE) holding a real ultrasound probe is remotely controlled from the expert site with a fictive probe, and reproduces on the real probe all the movements of the expert's hand. The isolated sites are areas with reduced medical facilities (secondary hospitals 20 to 100 km from the main hospital in Europe, dispensaries in Africa and Amazonia, rescue vehicles, etc.). RESULTS ESTELE was tested on 87 adults and 29 pregnant women using ISDN or satellite lines. During fetal tele-operated echography, the expert was able to obtain appropriate views of the fetal structures in 95% of the cases. During exploration of the adult abdomen, the expert visualized the main organs in 87% of the cases. The ESTELE system is currently installed in 4 secondary hospitals, 40 to 100 km from our University Hospital, and is tele-operated daily by our staff. CONCLUSION Robotized tele-echography provides information similar to that of direct examination. No misdiagnosis was reported. Moreover, the patients were examined by an expert from the University Hospital while remaining at the medical center near their home.
14
Abstract
Freezing tolerance in plants is a complex trait that is acquired in many plant species during growth at low, nonfreezing temperatures, a process known as cold acclimation. This process is regulated by a multigenic system, and the degree of freezing tolerance varies broadly among wheat cultivars. Microarray analysis is a powerful and rapid approach to gene discovery. In species such as wheat, for which large-scale mutant screening and transgenic studies are not currently practical, genotype comparison by this methodology is an essential approach to identifying key genes in the acquisition of freezing tolerance. A microarray was constructed with PCR-amplified cDNA inserts from 1184 wheat expressed sequence tags (ESTs) that represent 947 genes. Gene expression during cold acclimation was compared in 2 cultivars with marked differences in freezing tolerance. Transcript levels of more than 300 genes were altered by cold. Among these, 65 genes were regulated differently between the 2 cultivars for at least 1 time point. These include genes that encode potential regulatory proteins and proteins that act in plant metabolism, including protein kinases, putative transcription factors, Ca2+-binding proteins, a Golgi-localized protein, an inorganic pyrophosphatase, a cell-wall-associated hydrolase, and proteins involved in photosynthesis.
Key words: wheat microarray, expression profile, plant transcription, cold-regulated genes, freezing tolerance, cold acclimation, winter hardiness, stress genes, gene regulation, wheat transcriptome.
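The abstract does not state how "altered by cold" or "regulated differently" were defined. A common convention, assumed here purely for illustration, is the log2 ratio of cold-treated to control signal with a two-fold (|log2| ≥ 1) cutoff. The sketch below flags cold-regulated genes and, among those, genes whose response differs between the two cultivars at any time point; the thresholds, data layout, gene names and cultivar names are hypothetical.

```python
import math


def classify(expr, threshold=1.0):
    """
    expr[gene][cultivar] -> list of (cold_signal, control_signal) pairs, one per time point.
    A gene is cold-regulated if |log2(cold / control)| >= threshold at any time point in
    any cultivar. A cold-regulated gene is differentially regulated if the two cultivars'
    log2 ratios differ by >= threshold at any shared time point.
    """
    cold_regulated, differential = set(), set()
    for gene, by_cultivar in expr.items():
        ratios = {
            cultivar: [math.log2(cold / control) for cold, control in points]
            for cultivar, points in by_cultivar.items()
        }
        if not any(abs(r) >= threshold for rs in ratios.values() for r in rs):
            continue
        cold_regulated.add(gene)
        if len(ratios) == 2:
            first, second = ratios.values()
            if any(abs(x - y) >= threshold for x, y in zip(first, second)):
                differential.add(gene)
    return cold_regulated, differential


expr = {  # hypothetical signals for two cultivars, cvA and cvB
    "g1": {"cvA": [(400, 100), (200, 100)], "cvB": [(400, 100), (200, 100)]},
    "g2": {"cvA": [(800, 100)], "cvB": [(200, 100)]},
    "g3": {"cvA": [(110, 100)], "cvB": [(120, 100)]},
}
regulated, different = classify(expr)
```

Restricting the between-cultivar comparison to cold-regulated genes mirrors the abstract's framing, in which the 65 differently regulated genes are a subset of the genes altered by cold.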
15
Fetal tele-echography using a robotic arm and a satellite link. ULTRASOUND IN OBSTETRICS & GYNECOLOGY : THE OFFICIAL JOURNAL OF THE INTERNATIONAL SOCIETY OF ULTRASOUND IN OBSTETRICS AND GYNECOLOGY 2005; 26:221-6. [PMID: 16116561 DOI: 10.1002/uog.1987] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 05/04/2023]
Abstract
OBJECTIVE To design a method for conducting fetal ultrasound examinations in isolated hospital sites using a dedicated remotely controlled robotic arm (tele-echography). METHODS Tele-echography was performed from our hospital (expert center) on 29 pregnant women in an isolated maternity hospital (patient site) 1700 km away, and findings were compared with those of conventional ultrasound examinations. At the patient site, a robotic arm holding the real ultrasound probe was placed on the patient's abdomen by an assistant with no experience in performing ultrasound. The robotic arm, remotely controlled with a fictive (expert) probe, reproduced the exact movements (tilting and rotating) of the expert's hand on the real ultrasound probe. RESULTS In 93.1% of the cases, all biometric parameters, placental location and amniotic fluid volume were correctly assessed using the teleoperated robotic arm. In two cases, femur length could not be correctly measured. The mean duration of fetal ultrasound examination was 14 min (range, 10-18) by the conventional method and 18 min (range, 13-23) by tele-echography. The mean number of times the robotic arm was repositioned on the patient's abdomen was seven (range, 5-9). CONCLUSION Tele-echography using a robotic arm provides the main information needed to assess fetal growth and the intrauterine environment within a limited period of time.
16
The robot and the satellite for tele-operating echographic examination in Earth isolated sites, or onboard ISS. JOURNAL OF GRAVITATIONAL PHYSIOLOGY : A JOURNAL OF THE INTERNATIONAL SOCIETY FOR GRAVITATIONAL PHYSIOLOGY 2004; 11:P233-4. [PMID: 16240525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The objective was to design and validate a method for tele-operating (from an expert site) an echographic examination at an isolated site. METHOD The isolated places, defined as areas with reduced medical facilities, could be secondary hospitals 20 to 50 km from the university hospital, dispensaries in Africa or Amazonia, or a moving structure such as a rescue vehicle or the International Space Station (ISS). At the expert center, the ultrasound medical expert moves a fictive probe connected to a computer (no. 1), which sends the coordinate changes of this probe via an ISDN or satellite line to a second computer (no. 2) located at the isolated site, which applies them to the robotic arm holding the real echographic probe. RESULTS The system was tested at Tours Hospital on 105 patients. A complete investigation (visualization) of all the organs requested for the different clinical cases was obtained in 76% of the cases with the robot and in 87% with reference echography. In 11% of the cases, at least one of the organs visualized at reference echography could not be investigated by the robot, so no diagnosis could be made. The number of repositionings was higher for the robot (6.5 ± 2) than for reference echography (5.1 ± 2), 24% more with the robot. The examination lasted longer with the robot (16 ± 10 min) than with reference echography (11 ± 4 min), +43% with the robot compared to reference echography. The system was also tested successfully using satellite links in a limited number of cases (approximately 30).
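The abstract describes computer no. 1 sending the fictive probe's coordinate changes to computer no. 2, which applies them to the robotic arm. The sketch below illustrates such an incremental scheme with a fixed-size binary message (a sequence number plus three angular increments) and a receiver that accumulates the increments and clamps tilts to joint limits. The message layout, the choice of three angles, the ±45° limit and all names are hypothetical; the actual system's protocol is not given in the source.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire format: uint32 sequence number + three float32 increments (degrees),
# network byte order ("!"), 16 bytes per message.
MESSAGE = struct.Struct("!Ifff")


def encode(seq: int, d_tilt_x: float, d_tilt_y: float, d_rotation: float) -> bytes:
    """Expert side (computer no. 1): package the fictive probe's change since the last message."""
    return MESSAGE.pack(seq, d_tilt_x, d_tilt_y, d_rotation)


@dataclass
class RemoteArm:
    """Patient side (computer no. 2): accumulates increments for the real probe holder."""
    tilt_limit_deg: float = 45.0  # hypothetical mechanical limit
    tilt_x: float = 0.0
    tilt_y: float = 0.0
    rotation: float = 0.0
    last_seq: int = -1

    def _clamp(self, value: float) -> float:
        return max(-self.tilt_limit_deg, min(self.tilt_limit_deg, value))

    def apply(self, packet: bytes) -> bool:
        """Apply one message; return False if it was a duplicate or arrived out of order."""
        seq, dx, dy, dr = MESSAGE.unpack(packet)
        if seq <= self.last_seq:
            return False
        self.last_seq = seq
        self.tilt_x = self._clamp(self.tilt_x + dx)
        self.tilt_y = self._clamp(self.tilt_y + dy)
        self.rotation = (self.rotation + dr) % 360.0  # rotation about the probe axis wraps
        return True


arm = RemoteArm()
arm.apply(encode(0, 10.0, -5.0, 30.0))
```

Sending increments keeps messages small, which suits low-bandwidth ISDN or satellite links, but a lost message makes the remote pose drift from the expert's; periodically sending an absolute pose is one way to bound that drift.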
17
Functional isolation of the Candida albicans FCR3 gene encoding a bZip transcription factor homologous to Saccharomyces cerevisiae Yap3p. Yeast 2001; 18:1217-25. [PMID: 11561289 DOI: 10.1002/yea.770] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
We have isolated a C. albicans gene, named FCR3 (for fluconazole resistance 3), based on its ability to suppress the fluconazole (FCZ) hypersusceptibility of a Saccharomyces cerevisiae mutant strain (JY312) lacking the transcription factors Pdr1p and Pdr3p. The FCR3 ORF (1200 bp) encodes a 399-amino-acid protein containing a basic leucine zipper (bZip) domain. Fcr3p displays the highest level of sequence homology with the S. cerevisiae Yap3p protein (34% identity, 45% similarity). We had previously shown that deletion of the PDR5 gene, which encodes a multidrug transporter, completely abolished the ability of FCR3 to suppress the FCZ hypersusceptibility of JY312, suggesting that FCR3 confers FCZ resistance by activating PDR5 expression. We show here that the beta-galactosidase activity of a PDR5 promoter-lacZ construct in JY312 is increased two-fold upon FCR3 overexpression, demonstrating that FCR3 regulates PDR5 at the transcriptional level. We also show that FCR3 overexpression not only suppresses the hypersusceptibility of JY312 to 4-nitroquinoline-N-oxide (4-NQO) but also confers higher levels of resistance to this compound than those of the wild-type KY320 strain. Since PDR5 is not involved in 4-NQO resistance, this result indicates that FCR3 can also activate the transcription of other genes that can confer 4-NQO resistance. Finally, Northern blot analysis indicates that FCR3 encodes a single 2.4 kb RNA transcript in C. albicans, suggesting that the FCR3 mRNA contains long 5' and/or 3' untranslated regions. The nucleotide sequence of the FCR3 gene has been deposited in GenBank under Accession No. AF342983.
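Two figures in this abstract can be cross-checked with simple arithmetic, assuming the 1200 bp ORF length includes the stop codon: the ORF holds 400 codons, one of which is the stop, giving the stated 399 residues; and roughly 1.2 kb of the 2.4 kb transcript lies outside the ORF. That remainder is shared between the 5' UTR, the 3' UTR and the poly(A) tail, which is why the authors infer long untranslated regions.

```latex
\[
\frac{1200\ \text{nt}}{3\ \text{nt/codon}} - 1\ \text{(stop codon)} = 399\ \text{amino acids},
\qquad
2400\ \text{nt} - 1200\ \text{nt} = 1200\ \text{nt outside the ORF}.
\]
```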
18
Pseudoknots in prion protein mRNAs confirmed by comparative sequence analysis and pattern searching. Nucleic Acids Res 2001; 29:753-8. [PMID: 11160898 PMCID: PMC30388 DOI: 10.1093/nar/29.3.753] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
The human prion gene contains five copies of a 24 nt repeat that is highly conserved among species. An analysis of folding free energies of the human prion mRNA, in particular in the repeat region, suggested biased codon selection and the presence of RNA patterns. In particular, pseudoknots similar to the one predicted by Wills in the human prion mRNA were identified in the repeat region of all prion mRNAs available in GenBank, except those of birds and the red slider turtle. An alignment of these mRNAs, which share low sequence homology, shows several co-variations that maintain the pseudoknot pattern. The presence of pseudoknots in yeast Sup35p and Rnq1 suggests acquisition in the prokaryotic era. Computer-generated three-dimensional structures of the human prion pseudoknot highlight protein and RNA interaction domains, which suggest a possible effect on prion protein translation. The role of pseudoknots in prion diseases is discussed, since individuals with extra copies of the 24 nt repeat develop the familial form of Creutzfeldt-Jakob disease.
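Co-variation evidence of the kind cited here can be checked mechanically. For each proposed base pair (i, j), both aligned sequences should form a canonical pair (Watson-Crick or G-U wobble), and a pair counts as a compensatory co-variation when both positions differ between the sequences yet pairing is preserved. A set of pairs forms a pseudoknot when two pairs cross (i < k < j < l). The sketch below implements these checks; the toy sequences and pair coordinates are hypothetical and do not correspond to the prion pseudoknot.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Accept DNA or RNA input; compare everything in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def pairs_conserved(seq: str, pairs) -> bool:
    """True if every proposed pair (i, j) forms a canonical base pair in seq."""
    s = to_rna(seq)
    return all((s[i], s[j]) in CANONICAL for i, j in pairs)


def covariations(seq_a: str, seq_b: str, pairs):
    """Pairs where both positions changed between the aligned sequences but still pair."""
    a, b = to_rna(seq_a), to_rna(seq_b)
    return [
        (i, j) for i, j in pairs
        if a[i] != b[i] and a[j] != b[j]
        and (a[i], a[j]) in CANONICAL and (b[i], b[j]) in CANONICAL
    ]


def is_pseudoknot(pairs) -> bool:
    """True if any two pairs cross, i.e. i < k < j < l."""
    norm = [tuple(sorted(p)) for p in pairs]
    return any(i < k < j < l for (i, j) in norm for (k, l) in norm)


stems = [(0, 6), (3, 9)]  # hypothetical crossing pairs from two interleaved stems
human_like, other_species = "GAAAAACAAU", "CAAAAAGAAT"  # hypothetical aligned sequences
compensatory = covariations(human_like, other_species, stems)
```

In this toy alignment, the pair (0, 6) changes from G-C to C-G while remaining paired, which is exactly the kind of compensatory change that supports a structure despite low primary sequence similarity.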