Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

90
(from Reference Citation Analysis)

Article PDFs (37)

Cited by > 0 (78)

Searched Name

Joshua W K Ho

Number	Citation Analysis
1	MetaQuad: shared informative variants discovery in metagenomic samples. BIOINFORMATICS ADVANCES 2024;4:vbae030. [PMID: 38476299 PMCID: PMC10932609 DOI: 10.1093/bioadv/vbae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/08/2024] [Accepted: 02/23/2024] [Indexed: 03/14/2024] Abstract Motivation Strain-level analysis of metagenomic data has garnered significant interest in recent years. Microbial single nucleotide polymorphisms (SNPs) are genomic variants that can reflect strain-level differences within a microbial species. The diversity and emergence of SNPs in microbial genomes may reveal evolutionary history and environmental adaptation in microbial populations. However, efficient discovery of shared polymorphic variants in a large collection of metagenomic samples remains a computational challenge. Results MetaQuad utilizes a density-based clustering technique to effectively distinguish between shared variants and non-polymorphic sites using shotgun metagenomic data. Empirical comparisons with other state-of-the-art methods show that MetaQuad significantly reduces the number of false positive SNPs without greatly affecting the true positive rate. We used MetaQuad to identify antibiotic-associated variants in patients who underwent Helicobacter pylori eradication therapy. MetaQuad detected 7591 variants across 529 antibiotic resistance genes. The nucleotide diversity of some genes is increased 6 weeks after antibiotic treatment, potentially indicating the role of these genes in specific antibiotic treatments. Availability and implementation MetaQuad is an open-source Python package available via https://github.com/holab-hku/MetaQuad. Grants: Innovation and Technology Commission of Hong Kong
2	scDecouple: decoupling cellular response from infected proportion bias in scCRISPR-seq. Brief Bioinform 2024;25:bbae011. [PMID: 38324621 PMCID: PMC10849189 DOI: 10.1093/bib/bbae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/18/2023] [Accepted: 01/05/2024] [Indexed: 02/09/2024] Open Abstract Single-cell clustered regularly interspaced short palindromic repeats-sequencing (scCRISPR-seq) is an emerging high-throughput CRISPR screening technology where the true cellular response to perturbation is coupled with infected proportion bias of guide RNAs (gRNAs) across different cell clusters. The mixing of these effects introduces noise into scCRISPR-seq data analysis and thus creates obstacles for relevant studies. We developed scDecouple to decouple the true cellular response to perturbation from the influence of infected proportion bias. scDecouple first models the distribution of gene expression profiles in perturbed cells and then iteratively finds the maximum likelihood of cell cluster proportions as well as the cellular response for each gRNA. We demonstrated its performance in a series of simulation experiments. By applying scDecouple to real scCRISPR-seq data, we found that scDecouple enhances the identification of biologically perturbation-related genes. scDecouple can benefit scCRISPR-seq data analysis, especially in the case of heterogeneous samples or complex gRNA libraries.
Key Words: perturbation effects; pooled CRISPR-screening; single cell; statistical model. MeSH Headings: RNA, Guide, CRISPR-Cas Systems; High-Throughput Screening Assays. Grants: 2019YFA0906700 National Key R&D Program of China; 62373210 National Natural Science Foundation of China; 2021Z11JCQ020 Tsinghua University Initiative Scientific Research Program; Z210010 Beijing Natural Science Foundation; 2020Z99CFG006 Tsinghua University Spring Breeze Fund; Innovation and Technology Commission of Hong Kong; National Key R&D Program of China
3	Vulture: cloud-enabled scalable mining of microbial reads in public scRNA-seq data. Gigascience 2024;13:giad117. [PMID: 38195165 PMCID: PMC10776309 DOI: 10.1093/gigascience/giad117] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2023] [Revised: 08/17/2023] [Accepted: 12/16/2023] [Indexed: 01/11/2024] Open Abstract The rapidly growing collection of public single-cell sequencing data has become a valuable resource for molecular, cellular, and microbial discovery. Previous studies mostly overlooked detecting pathogens in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to deal with big public data. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our benchmarking experiments, Vulture is 66% to 88% faster than local tools (PathogenTrack and Venus) and 41% faster than the state-of-the-art cloud-based tool Cumulus, while achieving comparable microbial read identification. In terms of the cost on cloud computing systems, Vulture also shows a cost reduction of 83% ($12 vs. $70). We applied Vulture to 2 coronavirus disease 2019, 3 hepatocellular carcinoma (HCC), and 2 gastric cancer human patient cohorts with public sequencing reads data from scRNA-seq experiments and discovered cell type-specific enrichment of severe acute respiratory syndrome coronavirus 2, hepatitis B virus (HBV), and Helicobacter pylori-positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell subtype-associated HBV enrichment based on inferred copy number variations.
In summary, Vulture presents a scalable and economical framework to mine unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available via an open-source license at https://github.com/holab-hku/Vulture. Key Words: COVID-19; HCC; cloud computing; single cell; virus. MeSH Headings: Humans; Benchmarking; Carcinoma, Hepatocellular/genetics; DNA Copy Number Variations; Hepatitis B virus; Liver Neoplasms; Single-Cell Gene Expression Analysis. Grants: Innovation and Technology Commission - Hong Kong
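The 83% cost reduction quoted in the Vulture abstract can be checked from the two dollar figures it gives. The sketch below is only that arithmetic; the function name is illustrative, and it assumes (the abstract does not say so explicitly) that $70 is the comparison cost and $12 is the Vulture cost.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Return the relative reduction of `new` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


# Figures quoted in the abstract; which tool cost $70 is an assumption here.
cost_reduction = percent_reduction(70.0, 12.0)
print(f"{cost_reduction:.0f}%")  # prints "83%"
```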
4	An edge-device-compatible algorithm for valvular heart diseases screening using phonocardiogram signals with a lightweight convolutional neural network and self-supervised learning. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024;243:107906. [PMID: 37950925 DOI: 10.1016/j.cmpb.2023.107906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Revised: 02/24/2023] [Accepted: 10/27/2023] [Indexed: 11/13/2023] Abstract BACKGROUND AND OBJECTIVES Detection and classification of heart murmur using mobile-phone-collected sound is an emerging approach to the scale-up screening of valvular heart disease at a population level. Nonetheless, the widespread adoption of artificial intelligence (AI) methods for this type of mobile health (mHealth) application requires highly accurate and lightweight AI models that can be deployed in consumer-grade mobile devices. This study presents a lightweight deep learning model and a self-supervised learning (SSL) method to utilise unlabelled data to improve the accuracy of valvular heart disease classification using phonocardiogram data. METHODS This study proposes a lightweight convolutional neural network (CNN) that consists of ten times fewer parameters than other deep learning models to classify phonocardiogram data. SSL is applied to harness a large collection of unlabelled data as pre-training to enhance the accuracy and robustness of the model and reduce the number of epochs required to converge. A mobile application prototype that encapsulates the model is developed to perform in-device inference and fine-tuning. RESULTS The proposed lightweight model achieves an average accuracy of 98.65% in 10-fold cross-validation. When coupled with SSL using unlabelled data, the pre-trained model can reach an average accuracy higher than 99.4% in 10-fold cross-validation.
Furthermore, SSL-trained models have a 4-20% improvement in classification accuracy over non-SSL-trained models when tested with perturbed or noisy data, suggesting that SSL improves robustness of the model. When deployed on common smartphones, in-device fine-tuning and inference of the model can be completed within 0.03-0.37 s, which is considerably faster than 0.22-5.7 s by a standard CNN model that has ten times the number of parameters. Our lightweight model also consumes only a third of the power compared to the larger standard model. CONCLUSION This work presents a lightweight and accurate phonocardiogram classifier that supports near real-time performance on standard mobile devices. Key Words: Convolutional neural network; Deep learning; Digital health; Mobile health; Self-supervised learning; Valvular heart diseases screening. MeSH Headings: Humans; Artificial Intelligence; Neural Networks, Computer; Algorithms; Heart Valve Diseases/diagnostic imaging; Supervised Machine Learning
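The accuracies in the abstract above are reported as averages over 10-fold cross-validation. As a reminder of what that protocol involves, here is a minimal sketch of how sample indices can be partitioned into k train/test folds. It is a generic illustration, not the authors' evaluation code; the function name is hypothetical, and it assumes the samples were already shuffled.

```python
def k_fold_indices(n_samples: int, k: int):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs; each sample
    appears in exactly one test fold. Assumes data were shuffled beforehand.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(25, 10)
```

The reported figure would then be the mean of the per-fold test accuracies, with the model retrained on each fold's training indices.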
5	Discovery of regulatory motifs in 5' untranslated regions using interpretable multi-task learning models. Cell Syst 2023;14:1103-1112.e6. [PMID: 38016465 DOI: 10.1016/j.cels.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 09/18/2023] [Accepted: 10/31/2023] [Indexed: 11/30/2023] Abstract The sequence in the 5' untranslated regions (UTRs) is known to affect mRNA translation rates. However, the underlying regulatory grammar remains elusive. Here, we propose MTtrans, a multi-task translation rate predictor capable of learning common sequence patterns from datasets across various experimental techniques. The core premise is that common motifs are more likely to be genuinely involved in translation control. MTtrans outperforms existing methods in both accuracy and the ability to capture transferable motifs across species, highlighting its strength in identifying evolutionarily conserved sequence motifs. Our independent fluorescence-activated cell sorting coupled with deep sequencing (FACS-seq) experiment validates the impact of most motifs identified by MTtrans. Additionally, we introduce "GRU-rewiring," a technique to interpret the hidden states of the recurrent units. Gated recurrent unit (GRU)-rewiring allows us to identify regulatory element-enriched positions and examine the local effects of 5' UTR mutations. MTtrans is a powerful tool for deciphering the translation regulatory motifs. Key Words: eukaryotic translation; explainable AI; motif discovery; multi-task learning; sequence modeling. MeSH Headings: 5' Untranslated Regions/genetics; Regulatory Sequences, Nucleic Acid; Conserved Sequence
6	DCATS: differential composition analysis for flexible single-cell experimental designs. Genome Biol 2023;24:151. [PMID: 37365636 PMCID: PMC10294334 DOI: 10.1186/s13059-023-02980-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 06/07/2023] [Indexed: 06/28/2023] Open Abstract Differential composition analysis - the identification of cell types that have statistically significant changes in abundance between multiple experimental conditions - is one of the most common tasks in single cell omic data analysis. However, it remains challenging to perform differential composition analysis in the presence of flexible experimental designs and uncertainty in cell type assignment. Here, we introduce a statistical model and an open source R package, DCATS, for differential composition analysis based on a beta-binomial regression framework that addresses these challenges. Our empirical evaluation shows that DCATS consistently maintains high sensitivity and specificity compared to state-of-the-art methods. MeSH Headings: Software; Research Design; Models, Statistical; Single-Cell Analysis/methods. Grants: Innovation and Technology Commission - Hong Kong
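The DCATS abstract names a beta-binomial regression framework. For readers unfamiliar with that distribution, the sketch below evaluates a beta-binomial log-probability for a cell-type count k out of n cells. It is a generic textbook form in Python, not the DCATS R package or its API; the mean/precision parameterisation (`mu`, `phi`) and all function names are assumptions chosen for illustration.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n: int, k: int) -> float:
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_logpmf(k: int, n: int, mu: float, phi: float) -> float:
    """Log-probability of k successes in n trials under a beta-binomial.

    Assumed parameterisation: mean proportion mu in (0, 1) and precision
    phi > 0, so that alpha = mu * phi and beta = (1 - mu) * phi.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    alpha = mu * phi
    beta = (1.0 - mu) * phi
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


# Sanity check: probabilities over k = 0..n should sum to one.
total = sum(exp(beta_binomial_logpmf(k, 50, 0.2, 10.0)) for k in range(51))
```

In a regression setting, `mu` would typically be linked to covariates (for example through a logit link), while a smaller `phi` allows more sample-to-sample variability than a plain binomial.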
7	Altered human gut virome in patients undergoing antibiotics therapy for Helicobacter pylori. Nat Commun 2023;14:2196. [PMID: 37069161 PMCID: PMC10110541 DOI: 10.1038/s41467-023-37975-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Accepted: 04/04/2023] [Indexed: 04/19/2023] Open Abstract Transient gut microbiota alterations have been reported after antibiotic therapy for Helicobacter pylori. However, alteration in the gut virome after H. pylori eradication remains uncertain. Here, we apply metagenomic sequencing to fecal samples of 44 H. pylori-infected patients at baseline, 6-week (N = 44), and 6-month (N = 33) after treatment. Following H. pylori eradication, we discover contraction of the gut virome diversity, separation of virome community with increased community difference, and shifting towards a higher proportion of core virus. While the gut microbiota is altered at 6-week and restored at 6-month, the virome community shows contraction till 6-month after the treatment with enhanced phage-bacteria interactions at 6-week. Multiple courses of antibiotic treatments further lead to lower virus community diversity when compared with treatment naive patients. Our results demonstrate that H. pylori eradication therapies not only result in transient alteration in gut microbiota but also significantly alter the previously less known gut virome community.
8	Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads. F1000Res 2022;8:1587. [PMID: 32913631 PMCID: PMC7459848 DOI: 10.12688/f1000research.19426.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/08/2022] [Indexed: 11/25/2022] Open Abstract Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for downstream analyses. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align all the reads which should have been aligned, a problem we term the false-negative non-alignment problem. Here we present Scavenger, a Python-based bioinformatics pipeline for recovering unaligned reads using a novel mechanism in which a putative alignment location is discovered based on sequence similarity between aligned and unaligned reads. We showed that Scavenger could recover unaligned reads in a range of simulated and real RNA-seq datasets, including single-cell RNA-seq data. We found that recovered reads tend to contain more genetic variants with respect to the reference genome compared to previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. Even when the number of recovered reads is relatively small compared to the total number of reads, the addition of these recovered reads can impact downstream analyses, especially in terms of estimating the expression and differential expression of lowly expressed genes, such as pseudogenes.
9	Incidence of Emergency Department Visits for Sexual Abuse Among Youth in Hong Kong Before and During the COVID-19 Pandemic. JAMA Netw Open 2022;5:e2236278. [PMID: 36264581 PMCID: PMC9585429 DOI: 10.1001/jamanetworkopen.2022.36278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open Abstract This cohort study assesses the incidence of emergency department (ED) visits in Hong Kong, China, for sexual abuse among youth before and during the COVID-19 pandemic. MeSH Headings: Humans; Adolescent; COVID-19/epidemiology; Pandemics; Incidence; Hong Kong/epidemiology; Emergency Service, Hospital; Sex Offenses
10	Biophysical Reviews special issue call: quantitative methods to decipher cellular heterogeneity - from single-cell to spatial omic methods. Biophys Rev 2022;14:1079-1080. [PMID: 36345277 PMCID: PMC9636346 DOI: 10.1007/s12551-022-00994-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 08/07/2022] [Indexed: 10/15/2022] Open
11	Molecular determinants of intrinsic cellular stiffness in health and disease. Biophys Rev 2022;14:1197-1209. [PMID: 36345276 PMCID: PMC9636357 DOI: 10.1007/s12551-022-00997-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 09/11/2022] [Indexed: 10/14/2022] Open Abstract In recent years, the role of intrinsic biophysical features, especially cellular stiffness, in diverse cellular and disease processes has been increasingly recognized. New high throughput techniques for the quantification of cellular stiffness facilitate the study of their roles in health and diseases. In this review, we summarize recent discoveries about how cellular stiffness is involved in cell stemness, tumorigenesis, and blood diseases. In addition, we review the molecular mechanisms underlying the gene regulation of cellular stiffness in health and disease progression. Finally, we discuss the current understanding of how the cytoskeleton structure and the regulation of these genes contribute to cellular stiffness, highlighting where the field of cellular stiffness is headed. Key Words: Biomechanics; Biophysical profiling; Cellular stiffness; Cytoskeleton; Elasticity
12	Machine learning-coupled combinatorial mutagenesis enables resource-efficient engineering of CRISPR-Cas9 genome editor activities. Nat Commun 2022;13:2219. [PMID: 35468907 PMCID: PMC9039034 DOI: 10.1038/s41467-022-29874-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/08/2021] [Accepted: 04/04/2022] [Indexed: 12/12/2022] Open Abstract The genome-editing Cas9 protein uses multiple amino-acid residues to bind the target DNA. Considering only the residues in proximity to the target DNA as potential sites to optimise Cas9’s activity, the number of combinatorial variants to screen through is too massive for a wet-lab experiment. Here we generate and cross-validate ten in silico and experimental datasets of multi-domain combinatorial mutagenesis libraries for Cas9 engineering, and demonstrate that a machine learning-coupled engineering approach reduces the experimental screening burden by as much as 95% while enriching top-performing variants by ∼7.5-fold in comparison to the null model. Using this approach, followed by structure-guided engineering, we identify the N888R/A889Q variant conferring increased editing activity on the protospacer adjacent motif-relaxed KKH variant of Cas9 nuclease from Staphylococcus aureus (KKH-SaCas9) and its derived base editor in human cells. Our work validates a readily applicable workflow to enable resource-efficient high-throughput engineering of genome editor’s activity. Screening combinatorial mutants is too massive for wet-lab experiment alone. Here the authors present a machine learning-coupled combinatorial mutagenesis approach to vastly reduce experimental burden for engineering Cas9 genome editing enzymes.
13	Dynamic changes in antibiotic resistance genes and gut microbiota after Helicobacter pylori eradication therapies. Helicobacter 2022;27:e12871. [PMID: 34969161 DOI: 10.1111/hel.12871] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/11/2021] [Revised: 12/08/2021] [Accepted: 12/14/2021] [Indexed: 12/19/2022] Abstract BACKGROUND Short-term antibiotics exposure is associated with alterations in microbiota and antibiotic resistance genes (ARGs) in the human gut. While antibiotics are critical in the successful eradication of Helicobacter pylori, the short-term and long-term impacts on the composition and quantity of antibiotic resistance genes after H. pylori eradication are unclear. This study used whole-genome shotgun metagenomic sequencing of stool samples to characterize the gut microbiota and ARGs, before and after H. pylori eradication therapy. RESULTS Forty-four H. pylori-infected patients were recruited, including 21 treatment naïve patients who received clarithromycin-based triple therapy (CLA group) and 23 patients who failed previous therapies, of whom 10 received levofloxacin-based quadruple therapy (LEVO group) and 13 received other combinations (OTHER group). Stool samples were collected at baseline (before current treatment), 6 weeks and 6 months after eradication therapy. At baseline, there was only a slight difference among the three groups on ARGs and gut microbiota. After eradication therapy, there was a transient but significant increase in gut ARGs 6 weeks post-therapy, among which the LEVO group had the most significant ARGs alteration compared to the other two groups. For treatment naïve patients, those with higher ErmF abundance were prone to fail CLA eradication and gain more ARGs after treatment.
For gut microbiota, the bacteria richness decreased at 6 weeks and there was a significant difference in microbiota community among the three groups at 6 weeks. CONCLUSIONS Our findings demonstrated the dynamic alterations in gut microbiota and ARGs induced by different eradication therapies, which could influence the choices of antibiotics in eradication therapy. Key Words: Helicobacter pylori eradication; antibiotic resistance genes; gut microbiota; metagenomic sequencing; short-chain fatty acid
14	FlowGrid enables fast clustering of very large single-cell RNA-seq data. Bioinformatics 2021;38:282-283. [PMID: 34289014 DOI: 10.1093/bioinformatics/btab521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Revised: 07/06/2021] [Accepted: 07/18/2021] [Indexed: 02/03/2023] Open Abstract MOTIVATION Scalable clustering algorithms are needed to analyze millions of cells in single cell RNA-seq (scRNA-seq) data. RESULTS Here, we present an open source Python package called FlowGrid that can integrate into the Scanpy workflow to perform clustering on very large scRNA-seq datasets. FlowGrid implements a fast density-based clustering algorithm originally designed for flow cytometry data analysis. We introduce a new automated parameter tuning procedure, and show that FlowGrid can achieve clustering accuracy comparable to state-of-the-art clustering algorithms but at a substantially reduced run time for very large single cell RNA-seq datasets. For example, FlowGrid can complete a one-hour clustering task for one million cells in about five minutes. AVAILABILITY AND IMPLEMENTATION https://github.com/holab-hku/FlowGrid. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
15	Automatic flow delay through passive wax valves for paper-based analytical devices. LAB ON A CHIP 2021;21:4166-4176. [PMID: 34541589 DOI: 10.1039/d1lc00638j] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023] Abstract Microfluidic paper-based analytical devices (μPADs) have been widely explored for point-of-care testing due to their simplicity, low cost, and portability. μPADs with multiple-step reactions usually require precise flow control, especially flow-delay. This paper reports the numerical, mathematical, and experimental studies of flow delay through wax valves surrounded by PDMS walls on paper microfluidics. The predried surfactant in the sample zone diffuses into the liquid sample which can therefore flow through the wax valves. The delay time is automatically regulated by the diffusion of the surfactant after sample loading. The numerical study suggested that both the elevated contact angle and the reduced porosity and pore size in the wax printed region could effectively prevent water but allow liquids with lower contact angles (e.g., surfactant solutions) to flow through. The PDMS walls fabricated using a low-cost liquid dispenser effectively prevented the leakage of surfactant solutions. By controlling the quantity, diffusion distance, and type of the surfactant predried on the chip, the system successfully achieved a delay time ranging from 1.6 to 20 minutes. A mathematical model involving the above parameters was developed based on Fick's second law to predict the delay time. Finally, the flow-delay systems were applied in sequential mixing and distance-based detection of either glucose or alcohol. Linear ranges of 1-100 mg dL^-1 and 1-40 mg dL^-1 were achieved for glucose and alcohol, respectively. The lower limit of detection (LOD) of glucose and alcohol was 1 mg dL^-1.
The LOD of glucose was only 1/11 of that detected using μPADs without flow control, indicating the advantage of controlling fluid flow. The systematic findings in this study provide critical guidelines for the development and applications of wax valves in automatic flow delay for point-of-care testing. MeSH Headings: Glucose; Lab-On-A-Chip Devices; Microfluidic Analytical Techniques; Microfluidics; Paper; Point-of-Care Testing
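The wax-valve abstract says its delay-time model is based on Fick's second law but does not reproduce it. For orientation only, the standard one-dimensional form of that law is given below, together with the generic order-of-magnitude scaling for diffusion over a distance. The symbols (c for surfactant concentration, D for a constant diffusion coefficient, L for diffusion distance) are assumptions made here, and the scaling relation is a textbook estimate, not the paper's fitted model.

```latex
\frac{\partial c}{\partial t} = D \,\frac{\partial^{2} c}{\partial x^{2}},
\qquad
t_{\text{delay}} \sim \frac{L^{2}}{D}
```

Under this scaling, doubling the diffusion distance would roughly quadruple the delay, which is in line with the abstract's statement that diffusion distance is one of the controlling parameters.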
16	Deep Learning for Clinical Image Analyses in Oral Squamous Cell Carcinoma: A Review. JAMA Otolaryngol Head Neck Surg 2021;147:893-900. [PMID: 34410314 DOI: 10.1001/jamaoto.2021.2028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022] Abstract Importance Oral squamous cell carcinoma (SCC) is a lethal malignant neoplasm with a high rate of tumor metastasis and recurrence. Accurate diagnosis, prognosis prediction, and metastasis detection can improve patient outcomes. Deep learning for clinical image analysis can be used for diagnosis and prognosis in cancers, including oral SCC; its use in these areas can improve patient care and outcome. Observations This review is a summary of the use of deep learning models for diagnosis, prognosis, and metastasis detection for oral SCC by analyzing information from pathological and radiographic images. Specifically, deep learning has been used to classify different cell types, to differentiate cancer cells from nonmalignant cells, and to identify oral SCC from other cancer types. It can also be used to predict survival, to differentiate between tumor grades, and to detect lymph node metastasis. In general, the performance of these deep learning models has an accuracy ranging from 77.89% to 97.51% and 76% to 94.2% with the use of pathological and radiographic images, respectively. The review also discusses the importance of using good-quality clinical images in sufficient quantity on model performance. Conclusions and Relevance Applying pathological and radiographic images in deep learning models for diagnosis and prognosis of oral SCC has been explored, and most studies report results showing good classification accuracy. The successful use of deep learning in these areas has a high clinical translatability in the improvement of patient care.
17	Generalized and scalable trajectory inference in single-cell omics data with VIA. Nat Commun 2021;12:5528. [PMID: 34545085 PMCID: PMC8452770 DOI: 10.1038/s41467-021-25773-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/10/2021] [Accepted: 08/31/2021] [Indexed: 11/08/2022] Open Abstract Inferring cellular trajectories using a variety of omic data is a critical task in single-cell data science. However, accurate prediction of cell fates, and thereby biologically meaningful discovery, is challenged by the sheer size of single-cell data, the diversity of omic data types, and the complexity of their topologies. We present VIA, a scalable trajectory inference algorithm that overcomes these limitations by using lazy-teleporting random walks to accurately reconstruct complex cellular trajectories beyond tree-like pathways (e.g., cyclic or disconnected structures). We show that VIA robustly and efficiently unravels the fine-grained sub-trajectories in a 1.3-million-cell transcriptomic mouse atlas without losing the global connectivity at such a high cell count. We further apply VIA to discovering elusive lineages and less populous cell fates missed by other methods across a variety of data types, including single-cell proteomic, epigenomic, multi-omics datasets, and a new in-house single-cell morphological dataset.
Key Words: computational models; statistical methods. MeSH Headings: Algorithms; Animals; Cell Cycle; Cell Differentiation; Cell Line, Tumor; Cell Shape; Genomics; Hematopoiesis; Humans; Islets of Langerhans/cytology; LIM-Homeodomain Proteins/metabolism; Mesoderm/cytology; Mice; Mouse Embryonic Stem Cells/cytology; Organogenesis; Single-Cell Analysis; Transcription Factors/metabolism. Grants: Research Grants Council, University Grants Committee (RGC, UGC)
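The VIA abstract mentions lazy-teleporting random walks without defining them. The sketch below shows one common way such a walk can be built from a weighted graph: with some probability the walker stays put (laziness), and with some probability it jumps to a uniformly random node (teleportation). This is a generic illustration only; the function names, the parameter names `laziness` and `teleport`, and the uniform teleport target are assumptions, and the VIA implementation may define these differently.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into a row-stochastic matrix."""
    normalized = []
    for row in weights:
        total = sum(row)
        if total <= 0:
            raise ValueError("every node needs at least one outgoing edge")
        normalized.append([w / total for w in row])
    return normalized


def lazy_teleporting(transition, laziness, teleport):
    """Combine a transition matrix with self-loops and uniform teleportation.

    Each entry becomes
        (1 - teleport) * (laziness * I[i][j] + (1 - laziness) * P[i][j])
        + teleport / n
    so every row still sums to one.
    """
    if not (0.0 <= laziness <= 1.0 and 0.0 <= teleport <= 1.0):
        raise ValueError("laziness and teleport must lie in [0, 1]")
    n = len(transition)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            stay = 1.0 if i == j else 0.0
            lazy_step = laziness * stay + (1.0 - laziness) * transition[i][j]
            row.append((1.0 - teleport) * lazy_step + teleport / n)
        result.append(row)
    return result


# Toy three-node graph: node 0 is connected to nodes 1 and 2.
P = row_normalize([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
P_lt = lazy_teleporting(P, laziness=0.5, teleport=0.1)
```

Teleportation makes every node reachable from every other node, which is one reason walks of this kind remain well defined on disconnected graphs such as those the abstract describes.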
18	Genetic screening reveals phospholipid metabolism as a key regulator of the biosynthesis of the redox-active lipid coenzyme Q. Redox Biol 2021;46:102127. [PMID: 34521065 PMCID: PMC8435697 DOI: 10.1016/j.redox.2021.102127] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Revised: 08/27/2021] [Accepted: 09/04/2021] [Indexed: 11/30/2022] Open Abstract Mitochondrial energy production and function rely on optimal concentrations of the essential redox-active lipid, coenzyme Q (CoQ). CoQ deficiency results in mitochondrial dysfunction associated with increased mitochondrial oxidative stress and a range of pathologies. What drives CoQ deficiency in many of these pathologies is unknown, just as there currently is no effective therapeutic strategy to overcome CoQ deficiency in humans. To date, large-scale studies aimed at systematically interrogating endogenous systems that control CoQ biosynthesis and their potential utility to treat disease have not been carried out. Therefore, we developed a quantitative high-throughput method to determine CoQ concentrations in yeast cells. Applying this method to the Yeast Deletion Collection as a genome-wide screen, 30 genes not known previously to regulate cellular concentrations of CoQ were discovered. In combination with untargeted lipidomics and metabolomics, phosphatidylethanolamine N-methyltransferase (PEMT) deficiency was confirmed as a positive regulator of CoQ synthesis, the first identified to date. Mechanistically, PEMT deficiency alters mitochondrial concentrations of one-carbon metabolites, characterized by an increase in the S-adenosylmethionine to S-adenosylhomocysteine (SAM-to-SAH) ratio that reflects mitochondrial methylation capacity, drives CoQ synthesis, and is associated with a decrease in mitochondrial oxidative stress. 
The newly described regulatory pathway appears evolutionarily conserved, as ablation of PEMT using antisense oligonucleotides increases mitochondrial CoQ in mouse-derived adipocytes, which translates to improved glucose utilization by these cells and protection of mice from high-fat diet-induced insulin resistance. Our studies reveal a previously unrecognized relationship between two spatially distinct lipid pathways with potential implications for the treatment of CoQ deficiencies, mitochondrial oxidative stress/dysfunction, and associated diseases. • Mitochondrial CoQ deficiency results in oxidative stress and a range of pathologies • The drivers of mitochondrial CoQ deficiency remain largely unknown • PEMT deficiency is the first identified positive regulator of mitochondrial CoQ • PEMT deficiency increases CoQ by increasing the mitochondrial SAM-to-SAH ratio • PEMT deficiency prevents insulin resistance by increasing mitochondrial CoQ. Key Words: Coenzyme Q; Insulin resistance; Mitochondria; PEMT; Reactive oxygen species; S-adenosylhomocysteine; S-adenosylmethionine.
19	The method to quantify cell elasticity based on the precise measurement of pressure inducing cell deformation in microfluidic channels. MethodsX 2021;8:101247. [PMID: 34434770 PMCID: PMC8374187 DOI: 10.1016/j.mex.2021.101247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 01/01/2023] Abstract Cell elasticity has attracted extensive research interest because it not only provides new insights into cell biology but is also an emerging mechanical marker for the diagnosis of some diseases. This paper reports a method for the precise measurement of the mechanical properties of single cells deformed to a large extent, using a novel microfluidic system integrated with a pressure feedback system and a small-particle separation unit. The particle separation system was employed to avoid blockage of the cell deformation channel and thereby enhance the measurement throughput. This system has remarkable application potential in the precise evaluation of cell mechanical properties. In brief, this paper reports: • The manufacturing of the chip using standard soft lithography; • The methods to deform single cells in a microchannel and measure the relevant pressure drop using a pressure sensor connected to the microfluidic chip; • Calculation of the mechanical properties, including stiffness and fluidity, of each cell based on a power-law rheology model describing the viscoelastic behaviors of cells; • Automatic and real-time measurement of the mechanical properties using video processing software. Key Words: Cell deformation; Cell elasticity; Measurement; Microfluidic; Pressure drop; Video processing.
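The power-law rheology calculation mentioned in entry 19 can be illustrated with a short sketch. The abstract does not give the model equation, so this assumes the commonly used creep form strain(t) = (stress / E) * (t / t0)^beta, with E standing for stiffness and beta for fluidity; the function name, the reference time t0, and the synthetic data are all hypothetical. Taking logarithms turns the fit into ordinary linear regression.

```python
import math


def fit_power_law(times, strains, stress, t0=1.0):
    """Fit ln(strain) = ln(stress / E) + beta * ln(t / t0) by least squares.

    Returns (E, beta). The model form is an assumption for illustration,
    not the equation used in the cited paper.
    """
    xs = [math.log(t / t0) for t in times]
    ys = [math.log(s) for s in strains]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope = fluidity
    intercept = mean_y - beta * mean_x     # intercept = ln(stress / E)
    stiffness = stress / math.exp(intercept)
    return stiffness, beta


# Synthetic, noise-free data generated from known parameters (illustrative only).
true_E, true_beta, applied_stress = 500.0, 0.3, 100.0
times = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
strains = [(applied_stress / true_E) * t ** true_beta for t in times]

E, beta = fit_power_law(times, strains, applied_stress)
print(f"stiffness E = {E:.1f}, fluidity beta = {beta:.3f}")
```

With real measurements the strains would come from the video-tracked cell deformation and the stress from the recorded pressure drop; noise would make the recovered parameters approximate.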
20	Machine learning application for the prediction of SARS-CoV-2 infection using blood tests and chest radiograph. Sci Rep 2021;11:14250. [PMID: 34244563 PMCID: PMC8270945 DOI: 10.1038/s41598-021-93719-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/10/2020] [Accepted: 06/21/2021] [Indexed: 01/08/2023] Abstract Triaging and prioritising patients for RT-PCR testing has been essential in the management of COVID-19 in resource-scarce countries. In this study, we applied machine learning (ML) to the task of detecting SARS-CoV-2 infection using basic laboratory markers. We performed the statistical analysis and trained an ML model on a retrospective cohort of 5148 patients from 24 hospitals in Hong Kong to distinguish COVID-19 from pneumonia of other aetiologies. We validated the model on three temporal validation sets from different waves of infection in Hong Kong. For predicting SARS-CoV-2 infection, the ML model achieved high AUCs and specificity but low sensitivity in all three validation sets (AUC: 89.9-95.8%; Sensitivity: 55.5-77.8%; Specificity: 91.5-98.3%). When used in conjunction with radiologist interpretations of chest radiographs, the sensitivity was over 90% while keeping moderate specificity. Our study showed that a machine learning model based on readily available laboratory markers could achieve high accuracy in predicting SARS-CoV-2 infection. Key Words: computational biology and bioinformatics; biomarkers; health care; mathematics and computing. MESH Headings: Adolescent; Adult; Biomarkers/blood; COVID-19/blood; COVID-19/diagnostic imaging; COVID-19 Testing; Female; Humans; Machine Learning; Male; Middle Aged; Models, Biological; Predictive Value of Tests; Retrospective Studies; SARS-CoV-2/metabolism; Thorax/diagnostic imaging.
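The trade-off reported in entry 20 (higher sensitivity at the cost of specificity when the model is paired with radiologist readings) can be shown with a small sketch. The abstract does not say how the two sources were combined, so the "positive if either is positive" rule below is an assumption, and the labels are invented toy data, not study results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def combine_or(pred_a, pred_b):
    """Flag a case as positive when either predictor flags it (assumed rule)."""
    return [1 if a == 1 or b == 1 else 0 for a, b in zip(pred_a, pred_b)]


# Toy data: 4 infected and 6 uninfected patients (hypothetical).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
ml_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
radiologist_pred = [0, 1, 1, 1, 0, 1, 0, 0, 0, 0]

ml_sens, ml_spec = sensitivity_specificity(y_true, ml_pred)
combined = combine_or(ml_pred, radiologist_pred)
comb_sens, comb_spec = sensitivity_specificity(y_true, combined)

print(f"ML alone: sensitivity={ml_sens:.2f}, specificity={ml_spec:.2f}")
print(f"Combined: sensitivity={comb_sens:.2f}, specificity={comb_spec:.2f}")
```

An OR rule can only add positive calls, so sensitivity never falls and specificity never rises, which matches the direction of the change described in the abstract.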
21	Introduction to the Special Issue on GIW/ABACBS 2019. J Bioinform Comput Biol 2021;18:2002001. [PMID: 32336250 DOI: 10.1142/s0219720020020011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
22	The method to dynamically screen and print single cells using microfluidics with pneumatic microvalves. MethodsX 2021;8:101190. [PMID: 33425688 PMCID: PMC7779779 DOI: 10.1016/j.mex.2020.101190] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022] Abstract Printing single cells into individual chambers is of critical importance for single-cell analysis using traditional equipment, for instance, single-cell clonal expansion or sequencing. The size of cells can usually be a reflection of their types, functions, and even cell cycle phases. Therefore, printing individual cells within the desired size range has important application potential in single-cell analysis. This paper presents a method for the development of a microfluidic chip integrating pneumatic microvalves to print single cells of appropriate size into standard well plates. The reported method provides essential guidelines for the fabrication of multi-layer microfluidic chips, control of the membrane deflection to screen cell size, and printing of single cells. In brief, this paper reports: • the manufacturing of the chip using standard soft lithography; • the protocol to dynamically screen both the lower and the upper size limit of cells passing through the valves by deflection of the valve membrane; • the screening and dispensing of suspended human umbilical vein endothelial cells (HUVECs) into 384-well plates with high viability. Key Words: Dynamic screening; Microfluidics; Pneumatic microvalves; Printing; Single cells.
23	Computed tomography-based deep-learning prediction of neoadjuvant chemoradiotherapy treatment response in esophageal squamous cell carcinoma. Radiother Oncol 2021;154:6-13. [PMID: 32941954 DOI: 10.1016/j.radonc.2020.09.014] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 05/09/2020] [Revised: 08/20/2020] [Accepted: 09/06/2020] [Indexed: 02/06/2023] Abstract BACKGROUND Deep learning shows promise for predicting treatment response. We aimed to evaluate and validate the predictive performance of the CT-based model using deep learning features for predicting pathologic complete response to neoadjuvant chemoradiotherapy (nCRT) in esophageal squamous cell carcinoma (ESCC). MATERIALS AND METHODS Patients were retrospectively enrolled between April 2007 and December 2018 from two institutions. We extracted deep learning features of six pre-trained convolutional neural networks, respectively, from pretreatment CT images in the training cohort (n = 161). A support vector machine was adopted as the classifier. Validation was performed in an external testing cohort (n = 70). We assessed the performance using the area under the receiver operating characteristics curve (AUC) and selected an optimal model, which was compared with a radiomics model developed from the training cohort. A clinical model consisting of clinical factors only was also built for baseline comparison. We further conducted a radiogenomics analysis using gene expression profiles to reveal underlying biology associated with radiological prediction. RESULTS The optimal model with features extracted from ResNet50 achieved an AUC and accuracy of 0.805 (95% CI, 0.696-0.913) and 77.1% (65.6%-86.3%) in the testing cohort, compared with 0.725 (0.605-0.846) and 67.1% (54.9%-77.9%) for the radiomics model.
All the radiological models showed better predictive performance than the clinical model. Radiogenomics analysis suggested a potential association mainly with the WNT signaling pathway and the tumor microenvironment. CONCLUSIONS The novel and noninvasive deep learning approach could provide efficient and accurate prediction of treatment response to nCRT in ESCC, and benefit clinical decision-making on therapeutic strategy. Key Words: Computed tomography; Deep learning; Esophageal squamous cell carcinoma; Neoadjuvant chemoradiotherapy; Radiomics. MESH Headings: Chemoradiotherapy; Deep Learning; Esophageal Neoplasms/diagnostic imaging; Esophageal Neoplasms/therapy; Esophageal Squamous Cell Carcinoma/diagnostic imaging; Esophageal Squamous Cell Carcinoma/therapy; Head and Neck Neoplasms; Humans; Neoadjuvant Therapy; Retrospective Studies; Tomography, X-Ray Computed; Tumor Microenvironment.
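Entry 23 compares models by the area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, counting ties as half. The scores and labels are invented; this is not the study's evaluation code, and the confidence intervals quoted in the abstract are not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for positive cases, 0 for negative cases.
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy classifier scores (hypothetical).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.4]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently on large datasets.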
24	dv-trio: a family-based variant calling pipeline using DeepVariant. Bioinformatics 2020;36:3549-3551. [PMID: 32315409 DOI: 10.1093/bioinformatics/btaa116] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/18/2019] [Revised: 01/31/2020] [Accepted: 04/17/2020] [Indexed: 12/17/2022] Abstract MOTIVATION In 2018, Google published an innovative variant caller, DeepVariant, which converts pileups of sequence reads into images and uses a deep neural network to identify single-nucleotide variants and small insertion/deletions from next-generation sequencing data. This approach outperforms existing state-of-the-art tools. However, DeepVariant was designed to call variants within a single sample. In disease sequencing studies, the ability to examine a family trio (father-mother-affected child) provides greater power for disease mutation discovery. RESULTS To further improve DeepVariant's variant calling accuracy in family-based sequencing studies, we have developed a family-based variant calling pipeline, dv-trio, which incorporates the trio information from the Mendelian genetic model into variant calling based on DeepVariant. AVAILABILITY AND IMPLEMENTATION dv-trio is available via an open source BSD3 license at GitHub (https://github.com/VCCRI/dv-trio/). CONTACT e.giannoulatou@victorchang.edu.au. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
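The Mendelian genetic model referred to in entry 24 implies a simple constraint on trio genotypes: at an autosomal diploid site, the child must carry one allele from each parent. The sketch below checks only that constraint. It is not dv-trio's implementation, which the abstract does not describe in detail; the function name, site labels, and genotypes are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Check whether a child's diploid genotype can arise from the parents.

    Genotypes are pairs of allele indices, e.g. (0, 1) for a heterozygote.
    Assumes an autosomal site and ignores phasing.
    """
    child_sorted = tuple(sorted(child))
    for paternal, maternal in product(father, mother):
        if tuple(sorted((paternal, maternal))) == child_sorted:
            return True
    return False


# Hypothetical trio calls: (site, father, mother, child).
trio_calls = [
    ("site_A", (0, 0), (0, 1), (0, 1)),  # consistent: child inherits 1 from mother
    ("site_B", (0, 0), (0, 0), (0, 1)),  # inconsistent: calling error or de novo event
    ("site_C", (1, 1), (0, 1), (1, 1)),  # consistent
]

violations = [
    site for site, f, m, c in trio_calls
    if not is_mendelian_consistent(f, m, c)
]
print("Mendelian violations:", violations)
```

A flagged site is not necessarily an error, since genuine de novo mutations also break the rule, which is why trio-aware callers typically treat the constraint probabilistically rather than as a hard filter.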
25	A High-Throughput Genome-Integrated Assay Reveals Spatial Dependencies Governing Tcf7l2 Binding. Cell Syst 2020;11:315-327.e5. [PMID: 32910904 PMCID: PMC7530048 DOI: 10.1016/j.cels.2020.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2020] [Revised: 06/03/2020] [Accepted: 08/04/2020] [Indexed: 12/17/2022] Abstract Predicting where transcription factors bind in the genome from their in vitro DNA-binding affinity is confounded by the large number of possible interactions with nearby transcription factors. To characterize the in vivo binding logic for the Wnt effector Tcf7l2, we developed a high-throughput screening platform in which thousands of synthesized DNA phrases are inserted into a specific genomic locus, followed by measurement of Tcf7l2 binding by DamID. Using this platform at two genomic loci in mouse embryonic stem cells, we show that while the binding of Tcf7l2 closely follows the in vitro motif-binding strength and is influenced by local chromatin accessibility, it is also strongly affected by the surrounding 99 bp of sequence. Through controlled sequence perturbation, we show that Oct4 and Klf4 motifs promote Tcf7l2 binding, particularly in the adjacent ∼50 bp and oscillating with a 10.8-bp phasing relative to these cofactor motifs, which matches the turn of a DNA helix. Key Words: CRISPR-Cas9; DamID; Gaussian process; Tcf7l2; transcription factor. MESH Headings: Binding Sites; High-Throughput Screening Assays/methods; Humans; Kruppel-Like Factor 4; Transcription Factor 7-Like 2 Protein/metabolism; Transcription Factors/genetics. Grants: K01 DK101684 (NIDDK NIH HHS); R01 HG008363 (NHGRI NIH HHS); R01 HG008754 (NHGRI NIH HHS); R21 OD025309 (NIH HHS).
26	Assessment of Intratumoral and Peritumoral Computed Tomography Radiomics for Predicting Pathological Complete Response to Neoadjuvant Chemoradiation in Patients With Esophageal Squamous Cell Carcinoma. JAMA Netw Open 2020;3:e2015927. [PMID: 32910196 PMCID: PMC7489831 DOI: 10.1001/jamanetworkopen.2020.15927] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Indexed: 12/24/2022] Abstract IMPORTANCE For patients with locally advanced esophageal squamous cell carcinoma, neoadjuvant chemoradiation has been shown to improve long-term outcomes, but the treatment response varies among patients. Accurate pretreatment prediction of response remains an urgent need. OBJECTIVE To determine whether peritumoral radiomics features derived from baseline computed tomography images could provide valuable information about neoadjuvant chemoradiation response and enhance the ability of intratumoral radiomics to estimate pathological complete response. DESIGN, SETTING, AND PARTICIPANTS A total of 231 patients with esophageal squamous cell carcinoma, who underwent baseline contrast-enhanced computed tomography and received neoadjuvant chemoradiation followed by surgery at 2 institutions in China, were consecutively included. This diagnostic study used single-institution data between April 2007 and December 2018 to extract radiomics features from intratumoral and peritumoral regions and established intratumoral, peritumoral, and combined radiomics models using different classifiers. External validation was conducted using independent data collected from another hospital during the same period. Radiogenomics analysis using gene expression profile was done in a subgroup of the training set for pathophysiological explanation. Data were analyzed from June to December 2019. EXPOSURES Computed tomography-based radiomics.
MAIN OUTCOMES AND MEASURES The discriminative performances of radiomics models were measured by area under the receiver operating characteristic curve. RESULTS Among the 231 patients included (192 men [83.1%]; mean [SD] age, 59.8 [8.7] years), the optimal intratumoral and peritumoral radiomics models yielded similar areas under the receiver operating characteristic curve of 0.730 (95% CI, 0.609-0.850) and 0.734 (0.613-0.854), respectively. The combined model was composed of 7 intratumoral and 6 peritumoral features and achieved better discriminative performance, with an area under the receiver operating characteristic curve of 0.852 (95% CI, 0.753-0.951), accuracy of 84.3%, sensitivity of 90.3%, and specificity of 79.5% in the test set. Gene sets associated with the combined model mainly involved lymphocyte-mediated immunity. The association of peritumoral area with response identification might be partially attributed to type I interferon-related biological process. CONCLUSIONS AND RELEVANCE A combination of peritumoral radiomics features appears to improve the predictive performance of intratumoral radiomics to estimate pathological complete response after neoadjuvant chemoradiation in patients with esophageal squamous cell carcinoma. This study underlines the significant application of peritumoral radiomics to assess treatment response in clinical practice. MESH Headings: Adult; Area Under Curve; Esophageal Neoplasms/complications; Esophageal Neoplasms/therapy; Female; Hong Kong; Humans; Male; Middle Aged; Neoadjuvant Therapy/methods; Neoadjuvant Therapy/standards; Neoadjuvant Therapy/statistics & numerical data; Neoplasms, Squamous Cell/complications; Neoplasms, Squamous Cell/therapy; Polymerase Chain Reaction/methods; ROC Curve; Tomography, X-Ray Computed.
27	Challenges and emerging systems biology approaches to discover how the human gut microbiome impact host physiology. Biophys Rev 2020;12:851-863. [PMID: 32638331 PMCID: PMC7429608 DOI: 10.1007/s12551-020-00724-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/29/2020] [Accepted: 07/02/2020] [Indexed: 02/07/2023] Abstract Research in the human gut microbiome has bloomed with advances in next generation sequencing (NGS) and other high-throughput molecular profiling technologies. This has enabled the generation of multi-omics datasets, which hold promise for big data-enabled knowledge acquisition in the form of understanding the normal physiological and pathological involvement of gut microbiomes. Ample evidence suggests that distinct microbial compositions in the human gut are associated with different diseases. However, the biological mechanisms underlying these associations are often unclear. There is a need to move beyond statistical associations to discover how changes in the gut microbiota mechanistically affect host physiology and disease development. This review summarises state-of-the-art big data and systems biology approaches for mechanism discovery. Key Words: Big data; Ecological modelling; Metagenomic; Microbiome; Systems biology. Grants: Hong Kong PhD Fellowship, Research Grant Council of Hong Kong; Hong Kong Jockey Club Charity Trust.
28	Biophysical Review's 'meet the editors series'-a profile of Joshua W. K. Ho. Biophys Rev 2020;12:745-748. [PMID: 32725478 DOI: 10.1007/s12551-020-00744-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 07/03/2020] [Indexed: 01/19/2023] Abstract It is my pleasure to write a few words to introduce myself to the readers of Biophysical Reviews as part of the 'meet the editors' series. A portrait of Dr. Joshua Ho.
29	Hactive: a smartphone application for heart rate profiling. Biophys Rev 2020;12:777-779. [PMID: 32666466 DOI: 10.1007/s12551-020-00731-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/27/2020] [Accepted: 07/06/2020] [Indexed: 11/30/2022] Abstract With advancements in popular modern wearable devices, such as Apple Watch and Fitbit, it is now possible to harness these technologies for continuous monitoring and recording of heart rate data, which can then be used for medical research and ultimately e-health applications. In this paper, we report the development of a new mobile smartphone application (app) that enables heart rate profiles to be extracted and analysed from continuous heart rate monitoring time series. The new iOS app, called Hactive, extracts heart rate data from Apple's smartwatches to construct heart rate profiles. A key innovation is Hactive's ability to detect and analyse exercise-associated heart rate changes from continuous heart rate data, which enables heart rate profiles to be constructed based on free-living conditions. We believe this tool advances the use of wearable technology to collect physiologically relevant big data for healthcare and medical research. The source code of Hactive is available via an MIT open source licence at https://github.com/VCCRI/hactive. Key Words: Big data; Cardiovascular health; Internet of things; mHealth.
30	Ularcirc: visualization and enhanced analysis of circular RNAs via back and canonical forward splicing. Nucleic Acids Res 2020;47:e123. [PMID: 31435647 PMCID: PMC6846653 DOI: 10.1093/nar/gkz718] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/16/2019] [Revised: 07/23/2019] [Accepted: 08/08/2019] [Indexed: 01/22/2023] Abstract Circular RNAs (circRNA) are a unique class of transcripts that can only be identified from sequence alignments spanning discordant junctions, commonly referred to as backsplice junctions (BSJ). Canonical splicing is also linked with circRNA biogenesis either from the parental transcript or internal to the circRNA, and is not fully utilized in circRNA software. Here we present Ularcirc, a software tool that integrates the visualization of both BSJ and forward splicing junctions and provides downstream analysis of selected circRNA candidates. Ularcirc utilizes the output of CIRI, circExplorer, or raw chimeric output of the STAR aligner and assembles a BSJ count table to allow multi-sample analysis. We used Ularcirc to identify and characterize circRNA from public and in-house generated data sets and demonstrate how it can be used to (i) discover novel splicing patterns of parental transcripts, (ii) detect internal splicing patterns of circRNA, and (iii) reveal the complexity of BSJ formation. Furthermore, we identify circRNA that have potential open reading frames longer than their linear sequence. Finally, we detected and validated the presence of a novel class of circRNA generated from ApoA4 transcripts whose BSJ derive from multiple non-canonical splicing sites within coding exons. Ularcirc is accessed via https://github.com/VCCRI/Ularcirc.
31	PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells. Bioinformatics 2020;36:2778-2786. [PMID: 31971583 PMCID: PMC7203756 DOI: 10.1093/bioinformatics/btaa042] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 09/03/2019] [Revised: 11/24/2019] [Accepted: 01/16/2020] [Indexed: 12/13/2022] Abstract MOTIVATION New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. RESULTS We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
MESH Headings: Algorithms; Cluster Analysis; RNA-Seq; Single-Cell Analysis; Software; Exome Sequencing. Grants: Research Grants Council Hong Kong Special Administrative Region of China Collaborative Research Fund General Research Fund Innovation and Technology Support Programme.
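Entry 31 describes PARC as a graph-based clustering algorithm. To show the general idea of clustering on a neighbour graph, here is a deliberately simplified toy: it builds a k-nearest-neighbour graph, prunes long edges, and labels connected components. This is not PARC itself, which the abstract says uses community partitioning; connected components are swapped in as a stand-in, and the function names, the pruning threshold, and the data points are all hypothetical.

```python
import math


def knn_edges(points, k):
    """Map each undirected edge (i, j), i < j, to its Euclidean length."""
    edges = {}
    for i, p in enumerate(points):
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in neighbours[:k]:
            edges[(min(i, j), max(i, j))] = d
    return edges


def cluster_by_pruned_graph(points, k, max_edge_length):
    """Prune edges longer than max_edge_length, then label connected components."""
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in knn_edges(points, k).items():
        if d <= max_edge_length:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]


# Two well-separated toy groups of "cells" in a 2-D feature space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = cluster_by_pruned_graph(points, k=3, max_edge_length=3.0)
print(labels)
```

With k=3 every point is forced to link to one point in the other group, and the pruning step is what removes those long edges. The brute-force neighbour search here is quadratic, so it illustrates the concept only and would not scale to the million-cell datasets the abstract mentions.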
32	Multi-omic profiling reveals associations between the gut mucosal microbiome, the metabolome, and host DNA methylation associated gene expression in patients with colorectal cancer. BMC Microbiol 2020;20:83. [PMID: 32321427 PMCID: PMC7178946 DOI: 10.1186/s12866-020-01762-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/24/2020] [Accepted: 03/23/2020] [Indexed: 12/24/2022] Abstract Background The human gut microbiome plays a critical role in the carcinogenesis of colorectal cancer (CRC). However, a comprehensive analysis of the interaction between the host and microbiome is still lacking. Results We found correlations between the change in abundance of microbial taxa, butyrate-related colonic metabolites, and methylation-associated host gene expression in colonic tumour mucosa tissues compared with the adjacent normal mucosa tissues. The increase of genus Fusobacterium abundance was correlated with a decrease in the level of 4-hydroxybutyric acid (4-HB) and expression of immune-related peptidase inhibitor 16 (PI16), Fc Receptor Like A (FCRLA) and Lymphocyte Specific Protein 1 (LSP1). The decrease in the abundance of another potentially 4-HB-associated genus, Prevotella 2, was also found to be correlated with the down-regulated expression of metallothionein 1M (MT1M). Additionally, the increase of glutamic acid-related family Halomonadaceae was correlated with the decreased expression of reelin (RELN). The decreased abundances of genus Paeniclostridium and genus Enterococcus were correlated with increased lactic acid level, and were also linked to the expression change of Phospholipase C Beta 1 (PLCB1) and Immunoglobulin Superfamily Member 9 (IGSF9), respectively.
Interestingly, 4-HB, glutamic acid and lactic acid are all butyrate precursors, which may modify gene expression by epigenetic regulation such as DNA methylation. Conclusions Our study identified associations between previously reported CRC-related microbial taxa, butyrate-related metabolites and DNA methylation-associated gene expression in tumour and normal colonic mucosa tissues from CRC patients, which uncovered a possible mechanism of the role of microbiome in the carcinogenesis of CRC. In addition, these findings offer insight into potential new biomarkers, therapeutic and/or prevention strategies for CRC. Key Words: Butyrate; Colorectal cancer; DNA methylation; Metabolome; Mucosal microbiome; Transcriptome.
33	Cellular diversity and lineage trajectory: insights from mouse single cell transcriptomes. Development 2020;147:147/2/dev179788. [PMID: 31980483 DOI: 10.1242/dev.179788] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/07/2023] Abstract Single cell RNA-sequencing (scRNA-seq) technology has matured to the point that it is possible to generate large single cell atlases of developing mouse embryos. These atlases allow the dissection of developmental cell lineages and molecular changes during embryogenesis. When coupled with single cell technologies for profiling the chromatin landscape, epigenome, proteome and metabolome, and spatial tissue organisation, these scRNA-seq approaches can now collect a large volume of multi-omic data about mouse embryogenesis. In addition, advances in computational techniques have enabled the inference of developmental lineages of differentiating cells, even without explicitly introduced genetic markers. This Spotlight discusses the recent advent of single cell experimental and computational methods, and key insights from applying these methods to the study of mouse embryonic development. We highlight challenges in analysing and interpreting these data to complement and expand our knowledge from traditional developmental biology studies in relation to cell identity, diversity and lineage differentiation. Key Words: Bioinformatics; Cell lineages; Developmental trajectory; Embryo cell atlas; Single cell analytics.
34	Cloud accelerated alignment and assembly of full-length single-cell RNA-seq data using Falco. BMC Genomics 2019;20:927. [PMID: 31888474 PMCID: PMC6936136 DOI: 10.1186/s12864-019-6341-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/17/2019] [Accepted: 11/26/2019] [Indexed: 12/18/2022] Abstract BACKGROUND Read alignment and transcript assembly are the core of RNA-seq analysis for transcript isoform discovery. Nonetheless, current tools are not designed to be scalable for analysis of full-length bulk or single cell RNA-seq (scRNA-seq) data. The previous version of our cloud-based tool Falco focuses only on RNA-seq read counting and does not allow for more flexible steps such as alignment and read assembly. RESULTS The Falco framework can harness the parallel and distributed computing environment in modern cloud platforms to accelerate read alignment and transcript assembly of full-length bulk RNA-seq and scRNA-seq data. There are two new modes in Falco: alignment-only and transcript assembly. In the alignment-only mode, Falco can speed up the alignment process by 2.5-16.4x based on two public scRNA-seq datasets when compared to alignment on a highly optimised standalone computer. Furthermore, it also provides a 10x average speed-up compared to alignment using a published cloud-enabled tool for read alignment, Rail-RNA. In the transcript assembly mode, Falco can speed up the transcript assembly process by 1.7-16.5x compared to performing transcript assembly on a highly optimised computer. CONCLUSION Falco is a significantly updated open source big data processing framework that enables scalable and accelerated alignment and assembly of full-length scRNA-seq data on the cloud. The source code can be found at https://github.com/VCCRI/Falco.
Key Words: Alignment; Cloud computing; Falco; Single-cell RNA-seq; Transcript assembly.
35	Comparison of somatic variant detection algorithms using Ion Torrent targeted deep sequencing data. BMC Med Genomics 2019;12:181. [PMID: 31874647 PMCID: PMC6929331 DOI: 10.1186/s12920-019-0636-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/11/2019] [Accepted: 11/25/2019] [Indexed: 12/20/2022] Abstract Background The application of next-generation sequencing in cancer has revealed the genomic landscape of many tumour types and is nowadays routinely used in research and clinical settings. Multiple algorithms have been developed to detect somatic variation from sequencing data using either paired tumour-blood or tumour-only samples. Most of these methods have been developed and evaluated for the identification of somatic variation using Illumina sequencing datasets of moderate coverage. However, a comprehensive evaluation of somatic variant detection algorithms on Ion Torrent targeted deep sequencing data has not been performed. Methods We have applied three somatic detection algorithms, Torrent Variant Caller, MuTect2 and VarScan2, on a large cohort of ovarian cancer patients comprising 208 paired tumour-blood samples and 253 tumour-only samples sequenced deeply on the Ion Torrent Proton platform across 330 amplicons. Subsequently, the concordance and performance of the three somatic variant callers were assessed. Results We have observed low concordance across the algorithms with only 0.5% of SNV and 0.02% of INDEL calls in common across all three methods. The intersection of all methods showed better performance when assessed using correlation with known mutational signatures, overlap with COSMIC variation and by examining the variant characteristics.
The Torrent Variant Caller also performed well with the advantage of not eliminating a high number of variants that could lead to high type II error. Conclusions: Our results suggest that caution should be taken when applying state-of-the-art somatic variant algorithms to Ion Torrent targeted deep sequencing data. Better quality control procedures and strategies that combine results from multiple methods should ensure that higher accuracy is achieved. This is essential to ensure that results from bioinformatics pipelines using Ion Torrent deep sequencing can be robustly applied in cancer research and in the clinic. Key Words: Cancer genome; Ion torrent deep sequencing; Methods evaluation; Mutational signature; Read depth; Somatic variant calling
36	iSyTE 2.0: a database for expression-based gene discovery in the eye. Nucleic Acids Res 2019;46:D875-D885. [PMID: 29036527 PMCID: PMC5753381 DOI: 10.1093/nar/gkx837] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 08/10/2017] [Accepted: 09/11/2017] [Indexed: 12/20/2022] Abstract. Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches.
37	Dam mutants provide improved sensitivity and spatial resolution for profiling transcription factor binding. Epigenetics Chromatin 2019;12:36. [PMID: 31196130 PMCID: PMC6567924 DOI: 10.1186/s13072-019-0273-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/27/2019] [Accepted: 04/23/2019] [Indexed: 11/17/2022] Abstract. DamID, in which a protein of interest is fused to Dam methylase, enables mapping of protein-DNA binding through readout of adenine methylation in genomic DNA. DamID offers a compelling alternative to chromatin immunoprecipitation sequencing (ChIP-Seq), particularly in cases where cell number or antibody availability is limiting. This comes at a cost, however, of high non-specific signal and a lowered spatial resolution of several kb, limiting its application to transcription factor-DNA binding. Here we show that mutations in Dam, when fused to the transcription factor Tcf7l2, greatly reduce non-specific methylation. Combined with a simplified DamID sequencing protocol, we find that these Dam mutants allow for accurate detection of transcription factor binding at a sensitivity and spatial resolution closely matching that seen in ChIP-seq. Key Words: Dam; DamID; Tcf7l2; Transcription factor
38	Ultrafast clustering of single-cell flow cytometry data using FlowGrid. BMC SYSTEMS BIOLOGY 2019;13:35. [PMID: 30953498 PMCID: PMC6449887 DOI: 10.1186/s12918-019-0690-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/23/2022] Abstract. Background: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It enables expression measurement of tens of cell surface protein markers in millions of single cells. It is a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them are not scalable to very large data sets with more than ten million cells. Results: Here, we present a new clustering algorithm that combines the advantages of the density-based clustering algorithm DBSCAN with the scalability of grid-based clustering. This new clustering algorithm is implemented in Python as an open source package, FlowGrid. FlowGrid is memory efficient and scales linearly with respect to the number of cells. We have evaluated the performance of FlowGrid against other state-of-the-art clustering programs and found that FlowGrid produces similar clustering results in substantially less time. For example, FlowGrid is able to complete a clustering task on a data set of 23.6 million cells in less than 12 seconds, while other algorithms take more than 500 seconds or fail with an error. Conclusions: FlowGrid is an ultrafast clustering algorithm for large single-cell flow cytometry data. The source code is available at https://github.com/VCCRI/FlowGrid. 
Key Words: Clustering; DBSCAN; Flow cytometry; Single cell
39	Discovery of perturbation gene targets via free text metadata mining in Gene Expression Omnibus. Comput Biol Chem 2019;80:152-158. [PMID: 30959271 DOI: 10.1016/j.compbiolchem.2019.03.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/08/2019] [Accepted: 03/23/2019] [Indexed: 10/27/2022] Abstract. There exist over 2.5 million publicly available gene expression samples across 101,000 data series in NCBI's Gene Expression Omnibus (GEO) database. Because standardised ontology terms are not used in GEO's free text metadata to annotate the experimental type and sample type, this database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that utilises text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples and perform differential expression. We present applications of GEOracle to discover conserved signalling pathway target genes and identify an organ specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free text metadata. Its effectiveness and applicability have been demonstrated by cross validation and two real-life case studies. It opens up new avenues to unlock the gene regulatory information embedded inside large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle. Key Words: Free text metadata; Gene perturbation; Machine learning; R/shiny
40	Host and microbiome multi-omics integration: applications and methodologies. Biophys Rev 2019;11:55-65. [PMID: 30627872 PMCID: PMC6381360 DOI: 10.1007/s12551-018-0491-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 11/04/2018] [Accepted: 12/06/2018] [Indexed: 12/13/2022] Abstract. The study of the microbial community (the microbiome) associated with a human host is a maturing research field. It is increasingly clear that the composition of the human microbiome is associated with various diseases such as gastrointestinal diseases, liver diseases and metabolic diseases. Using high-throughput technologies such as next-generation sequencing and mass spectrometry-based metabolomics, we are able to comprehensively sequence the microbiome (the metagenome) and associate these data with the genomic, epigenomic, transcriptomic and metabolic profile of the host. Our review summarises the application of integrating host omics with the microbiome as well as the analytical methods and related tools applied in these studies. In addition, potential future directions are discussed. Key Words: Big data; Epigenome; Genome; Metabolome; Microbiome; Network analysis; Transcriptome. Grants: 81330011, 81330014, 81790631, 81790633, 81570512, and 81121002 (National Natural Science Foundation of China); 81721091 (Science Fund for Creative Research Groups); 100848 and 101204 (National Heart Foundation of Australia)
41	CardiacProfileR: an R package for extraction and visualisation of heart rate profiles from wearable fitness trackers. Biophys Rev 2019;11:119-121. [PMID: 30666509 DOI: 10.1007/s12551-019-00498-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/14/2018] [Accepted: 01/07/2019] [Indexed: 11/28/2022]
42	Identification of active signaling pathways by integrating gene expression and protein interaction data. BMC SYSTEMS BIOLOGY 2018;12:120. [PMID: 30598083 PMCID: PMC6311899 DOI: 10.1186/s12918-018-0655-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/21/2023] Abstract. Background: Signaling pathways are the key biological mechanisms that transduce extracellular signals to affect transcription factor mediated gene regulation within cells. A number of computational methods have been developed to identify the topological structure of a specific signaling pathway using protein-protein interaction data, but they are not designed for identifying active signaling pathways in an unbiased manner. On the other hand, there are statistical methods based on gene sets or pathway data that can prioritize likely active signaling pathways, but they do not make full use of the active pathway structure that links receptors, kinases and downstream transcription factors. Results: Here, we present a method to simultaneously predict the set of active signaling pathways, together with their pathway structure, by integrating protein-protein interaction network and gene expression data. We evaluated the capacity of our method to predict active signaling pathways for dental epithelial cells, ocular lens epithelial cells, human pluripotent stem cell-derived lens epithelial cells, and lens fiber cells. This analysis showed our approach could identify all the known active pathways that are associated with tooth formation and lens development. Conclusions: The results suggest that SPAGI can be a useful approach to identify the potential active signaling pathways given a gene expression profile. Our method is implemented as an open source R package, available via https://github.com/VCCRI/SPAGI/. 
Electronic supplementary material: The online version of this article (10.1186/s12918-018-0655-x) contains supplementary material, which is available to authorized users. Key Words: Dental epithelial cells; Gene expression; Lens epithelial cells; Lens fiber cells; Pluripotent stem cells; Protein-protein interaction; ROR1+ cells; Signaling pathway
43	C3: An R package for cross-species compendium-based cell-type identification. Comput Biol Chem 2018;77:187-192. [PMID: 30340080 DOI: 10.1016/j.compbiolchem.2018.10.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/08/2018] [Revised: 08/29/2018] [Accepted: 10/04/2018] [Indexed: 01/19/2023] Abstract. Cell type identification from an unknown sample can often be done by comparing its gene expression profile against a gene expression database containing profiles of a large number of cell-types. This type of compendium-based cell-type identification strategy is particularly successful for human and mouse samples because a large volume of data exists for these organisms. However, such rich data repositories often do not exist for most non-model organisms. This makes transcriptome-based sample classification in these species challenging. We propose to overcome this challenge by performing a cross-species compendium comparison. The key is to utilise a recently published cross-species gene set analysis (XGSA) framework to correct for biases that may arise due to potentially complex homologous gene mapping between two species. The framework is implemented as an open source R package called C3. We have evaluated the performance of C3 using a variety of public data in NCBI Gene Expression Omnibus. We also compared the functionality and performance of C3 against some similar gene expression profile matching tools. Our evaluation shows that C3 is a simple and effective method for cell type identification. C3 is available at https://github.com/VCCRI/C3. Key Words: Bioinformatics; Cell type identification; Cross-species; Gene set analysis; Transcriptomics
44	Identification of clinically actionable variants from genome sequencing of families with congenital heart disease. Genet Med 2018;21:1111-1120. [DOI: 10.1038/s41436-018-0296-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 06/23/2018] [Accepted: 08/28/2018] [Indexed: 12/20/2022]
45	Integrative analysis identifies co-dependent gene expression regulation of BRG1 and CHD7 at distal regulatory sites in embryonic stem cells. Bioinformatics 2018;33:1916-1920. [PMID: 28203701 DOI: 10.1093/bioinformatics/btx092] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/14/2016] [Accepted: 02/08/2017] [Indexed: 12/11/2022] Abstract. Motivation: DNA binding proteins such as chromatin remodellers, transcription factors (TFs), histone modifiers and co-factors often bind cooperatively to activate or repress their target genes in a cell type-specific manner. Nonetheless, the precise role of cooperative binding in defining cell-type identity is still largely uncharacterized. Results: Here, we collected and analyzed 214 public datasets representing chromatin immunoprecipitation followed by sequencing (ChIP-Seq) of 104 DNA binding proteins in embryonic stem cell (ESC) lines. We classified their binding sites into those proximal to gene promoters and those in distal regions, and developed a web resource called Proximal And Distal (PAD) clustering to identify their co-localization at these respective regions. Using this extensive dataset, we discovered an extensive co-localization of BRG1 and CHD7 at distal but not proximal regions. The comparison of co-localization sites to those bound by either BRG1 or CHD7 alone showed an enrichment of ESC master TFs binding and active chromatin architecture at co-localization sites. Most notably, our analysis reveals the co-dependency of BRG1 and CHD7 at distal regions on regulating expression of their common target genes in ESC. This work sheds light on cooperative binding of TF binding proteins in regulating gene expression in ESC, and demonstrates the utility of integrative analysis of a manually curated compendium of genome-wide protein binding profiles in our online resource PAD. 
Availability and Implementation: PAD is freely available at http://pad.victorchang.edu.au/ and its source code is available via an open source GPL 3.0 license at https://github.com/VCCRI/PAD/. Contact: pengyi.yang@sydney.edu.au or j.ho@victorchang.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
46	A Screening Approach to Identify Clinically Actionable Variants Causing Congenital Heart Disease in Exome Data. CIRCULATION. GENOMIC AND PRECISION MEDICINE 2018;11:e001978. [PMID: 29555671 DOI: 10.1161/circgen.117.001978] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 10/05/2017] [Accepted: 01/18/2018] [Indexed: 01/19/2023] Abstract. BACKGROUND: Congenital heart disease (CHD), structural abnormalities of the heart that arise during embryonic development, is the most common inborn malformation, affecting ≤1% of the population. However, currently, only a minority of cases can be explained by genetic abnormalities. The goal of this study was to identify disease-causal genetic variants in 30 families affected by CHD. METHODS: Whole-exome sequencing was performed with the DNA of multiple family members. We utilized a 2-tiered whole-exome variant screening and interpretation procedure. First, we manually curated a high-confidence list of 90 genes known to cause CHD in humans, identified predicted damaging variants in genes on this list, and rated their pathogenicity using American College of Medical Genetics and Genomics-Association for Molecular Pathology guidelines. RESULTS: In 3 families (10%), we found pathogenic variants in known CHD genes TBX5, TFAP2B, and PTPN11, explaining the cardiac lesions. Second, exomes were comprehensively analyzed to identify additional predicted damaging variants that segregate with disease in CHD candidate genes. In 10 additional families (33%), likely disease-causal variants were uncovered in PBX1, CNOT1, ZFP36L2, TEK, USP34, UPF2, KDM5A, KMT2C, TIE1, TEAD2, and FLT4. CONCLUSIONS: The pathogenesis of CHD could be explained using our high-confidence CHD gene list for variant filtering in a subset of cases. 
Furthermore, our unbiased screening procedure of family exomes implicates additional genes and variants in the pathogenesis of CHD, which suggest themselves for functional validation. This 2-tiered approach provides a means of (1) identifying clinically actionable variants and (2) identifying additional disease-causal genes, both of which are essential for improving the molecular diagnosis of CHD. Key Words: exome; genetic testing; genetic variation; heart diseases; humans. MESH Headings: Exome/genetics; Female; Genetic Testing; Genetic Variation; Heart Defects, Congenital/diagnosis; Heart Defects, Congenital/genetics; Humans; Male; Polymorphism, Single Nucleotide; Pre-B-Cell Leukemia Transcription Factor 1/genetics; Protein Tyrosine Phosphatase, Non-Receptor Type 11/genetics; T-Box Domain Proteins/genetics; Transcription Factor AP-2/genetics; Exome Sequencing
47	Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud. Bioinformatics 2018;33:767-769. [PMID: 28025200 DOI: 10.1093/bioinformatics/btw732] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/15/2016] [Accepted: 11/16/2016] [Indexed: 11/13/2022] Abstract. Summary: Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6-145.4 times faster using Falco than running on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in cost of analysis. Availability and Implementation: Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/. Contact: j.ho@victorchang.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
48	Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest. BMC Genomics 2018;19:929. [PMID: 29363433 PMCID: PMC5780765 DOI: 10.1186/s12864-017-4340-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/05/2023] Abstract. Background: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which a TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs – a motif grammar – located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model. Results: We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TF (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. 
Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific. Electronic supplementary material: The online version of this article (10.1186/s12864-017-4340-z) contains supplementary material, which is available to authorized users. Key Words: Cell-type specificity; Cis-regulatory element; DNA motif; Random Forest; Transcription factor
49	Light-focusing human micro-lenses generated from pluripotent stem cells model lens development and drug-induced cataract in vitro. Development 2018;145:dev.155838. [PMID: 29217756 PMCID: PMC5825866 DOI: 10.1242/dev.155838] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 06/08/2017] [Accepted: 11/15/2017] [Indexed: 12/14/2022] Abstract. Cataracts cause vision loss and blindness by impairing the ability of the ocular lens to focus light onto the retina. Various cataract risk factors have been identified, including drug treatments, age, smoking and diabetes. However, the molecular events responsible for these different forms of cataract are ill-defined, and the advent of modern cataract surgery in the 1960s virtually eliminated access to human lenses for research. Here, we demonstrate large-scale production of light-focusing human micro-lenses from spheroidal masses of human lens epithelial cells purified from differentiating pluripotent stem cells. The purified lens cells and micro-lenses display similar morphology, cellular arrangement, mRNA expression and protein expression to human lens cells and lenses. Exposing the micro-lenses to the emergent cystic fibrosis drug Vx-770 reduces micro-lens transparency and focusing ability. These human micro-lenses provide a powerful and large-scale platform for defining molecular disease mechanisms caused by cataract risk factors, for anti-cataract drug screening and for clinically relevant toxicity assays. Key Words: Cataract; Focus; Lens development; Organoid; Stem cell; Vx-770
50	Identification of satellite cells from anole lizard skeletal muscle and demonstration of expanded musculoskeletal potential. Dev Biol 2017;433:344-356. [PMID: 29291980 DOI: 10.1016/j.ydbio.2017.08.037] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/01/2017] [Revised: 08/22/2017] [Accepted: 08/29/2017] [Indexed: 10/18/2022] Abstract. The lizards are evolutionarily the closest vertebrates to humans that demonstrate the ability to regenerate entire appendages containing cartilage, muscle, skin, and nervous tissue. We previously isolated PAX7-positive cells from muscle of the green anole lizard, Anolis carolinensis, that can differentiate into multinucleated myotubes and express the muscle structural protein, myosin heavy chain. Studying gene expression in these satellite/progenitor cell populations from A. carolinensis can provide insight into the mechanisms regulating tissue regeneration. We generated a transcriptome from proliferating lizard myoprogenitor cells and compared them to transcriptomes from the mouse and human tissues from the ENCODE project using XGSA, a statistical method for cross-species gene set analysis. These analyses determined that the lizard progenitor cell transcriptome was most similar to mammalian satellite cells. Further examination of specific GO categories of genes demonstrated that among genes with the highest level of expression in lizard satellite cells were an increased number of genetic regulators of chondrogenesis, as compared to mouse satellite cells. In micromass culture, lizard PAX7-positive cells formed Alcian blue and collagen 2a1 positive nodules, without the addition of exogenous morphogens, unlike their mouse counterparts. 
Subsequent quantitative RT-PCR confirmed up-regulation of expression of chondrogenic regulatory genes in lizard cells, including bmp2, sox9, runx2, and cartilage specific structural genes, aggrecan and collagen 2a1. Taken together, these data suggest that tail regeneration in lizards involves significant alterations in gene regulation with expanded musculoskeletal potency.