1
Ko G, Kim PG, Yoon BH, Kim J, Song W, Byeon I, Yoon J, Lee B, Kim YK. Closha 2.0: a bio-workflow design system for massive genome data analysis on high performance cluster infrastructure. BMC Bioinformatics 2024; 25:353. [PMID: 39533201 PMCID: PMC11558834 DOI: 10.1186/s12859-024-05963-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND: The explosive growth of next-generation sequencing (NGS) data has resulted in ultra-large-scale datasets and significant computational challenges. As the cost of NGS has decreased, the amount of genomic data has surged globally. However, the cost and complexity of the computational resources required continue to be substantial barriers to leveraging big data. A promising solution to these computational challenges is cloud computing, which provides researchers with the necessary CPUs, memory, storage, and software tools.
RESULTS: Here, we present Closha 2.0, a cloud computing service that offers a user-friendly platform for analyzing massive genomic datasets. Closha 2.0 is designed to provide a cloud-based environment that enables all genomic researchers, including those with limited or no programming experience, to easily analyze their genomic data. The new 2.0 version of Closha has more user-friendly features than the previous 1.0 version. First, the workbench features a script editor that supports Python, R, and shell script programming, enabling users to write scripts and integrate them into their pipelines. This functionality is particularly useful for downstream analysis. Second, Closha 2.0 runs on containers, which execute each tool in an independent environment. This provides a stable environment and prevents dependency issues and version conflicts among tools. Additionally, users can execute each step of a pipeline individually, allowing them to test applications at each stage and adjust parameters to achieve the desired results. We also updated a high-speed data transmission tool called GBox that facilitates the rapid transfer of large datasets.
CONCLUSIONS: The analysis pipelines on Closha 2.0 are reproducible, with all analysis parameters and inputs being permanently recorded. Closha 2.0 simplifies multi-step analysis with drag-and-drop functionality and provides a user-friendly interface for genomic scientists to obtain accurate results from NGS data. Closha 2.0 is freely available at https://www.kobic.re.kr/closha2 .
Affiliation(s)
- Gunhwan Ko
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- Pan-Gyu Kim
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- Byung-Ha Yoon
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- JaeHee Kim
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- Wangho Song
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- IkSu Byeon
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- JongCheol Yoon
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- Byungwook Lee
  - Korean Bioinformation Center (KOBIC), KRIBB, 125 Gwahangno, Yuseong-gu, Daejeon, 34141, Korea
- Young-Kuk Kim
  - Department of Bio-AI Convergence, Chungnam National University, Daejeon, 34134, Korea
2
Ko G, Lee JH, Sim YM, Song W, Yoon BH, Byeon I, Lee BH, Kim SO, Choi J, Jang I, Kim H, Yang JO, Jang K, Kim S, Kim JH, Jeon J, Jung J, Hwang S, Park JH, Kim PG, Kim SY, Lee B. KoNA: Korean Nucleotide Archive as A New Data Repository for Nucleotide Sequence Data. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae017. [PMID: 38862433 PMCID: PMC12016568 DOI: 10.1093/gpbjnl/qzae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 11/20/2023] [Accepted: 01/08/2024] [Indexed: 06/13/2024]
Abstract
During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet need for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. KoNA is available at https://www.kobic.re.kr/kona/.
Affiliation(s)
- Gunhwan Ko
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Jae Ho Lee
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Young Mi Sim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Wangho Song
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Byung-Ha Yoon
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Iksu Byeon
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Bang Hyuck Lee
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Sang-Ok Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Jinhyuk Choi
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Insoo Jang
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Hyerin Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Jin Ok Yang
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Kiwon Jang
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Sora Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Jong-Hwan Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Jongbum Jeon
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Jaeeun Jung
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Seungwoo Hwang
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Ji-Hwan Park
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Pan-Gyu Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Seon-Young Kim
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
- Byungwook Lee
  - Korea Bioinformation Center, Korea Research Institute of Bioscience & Biotechnology, Daejeon 34141, Republic of Korea
3
The Innovative Informatics Approaches of High-Throughput Technologies in Livestock: Spearheading the Sustainability and Resiliency of Agrigenomics Research. LIFE (BASEL, SWITZERLAND) 2022; 12:life12111893. [PMID: 36431028 PMCID: PMC9695872 DOI: 10.3390/life12111893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 11/09/2022] [Accepted: 11/14/2022] [Indexed: 11/17/2022]
Abstract
For more than a decade, next-generation sequencing (NGS) has been emerging as the mainstay of agrigenomics research. High-throughput technologies have made it feasible to conduct research at the scale and cost required for using these data in livestock research. Scale frameworks of sequencing for agricultural and livestock improvement, management, and conservation are partly attributable to innovative informatics methodologies and advancements in sequencing practices. Genome-wide sequence-based investigations are often conducted worldwide, and several databases have been created to discover the connections between worldwide scientific accomplishments. Such studies are beginning to provide revolutionary insights into a new era of genomic prediction and selection capabilities of various domesticated livestock species. In this concise review, we provide selected examples of the current state of sequencing methods, many of which are already being used in animal genomic studies, and summarize the positive attributes of genome-based research for cattle (Bos taurus), sheep (Ovis aries), pigs (Sus scrofa domesticus), horses (Equus caballus), chickens (Gallus gallus domesticus), and ducks (Anas platyrhynchos). This review also emphasizes the advantageous features of sequencing technologies in monitoring and detecting infectious zoonotic diseases. In the coming years, the continued advancement of sequencing technologies in livestock agrigenomics will significantly influence the sustained momentum toward regulatory approaches that encourage innovation to ensure continued access to a safe, abundant, and affordable food supply for future generations.
4
Gulfidan G, Beklen H, Arga KY. Artificial Intelligence as Accelerator for Genomic Medicine and Planetary Health. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2021; 25:745-749. [PMID: 34780300 DOI: 10.1089/omi.2021.0170] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022]
Abstract
Genomic medicine has made important strides over the past several decades, but as new insights and technologies emerge, the applications of genomics in medicine and planetary health continue to evolve and expand. An important grand challenge is harnessing and making sense of genomic big data in ways that best serve public and planetary health. Because human health is inextricably intertwined with the health of planetary ecosystems and nonhuman animals, genomic medicine is in need of high-throughput bioinformatics analyses to harness and integrate human and ecological multiomics big data. It is in this overarching context that artificial intelligence (AI), particularly machine learning and deep learning, offers enormous potential to advance genomic medicine in a spirit of One Health. This expert review offers an analysis of the rapidly emerging role of AI in genomic medicine, including its current drivers, levers, opportunities, and challenges. The scope of AI applications in genomic medicine is broad, ranging from efficient and automated data analysis to drug repurposing and precision medicine. So too are its challenges, such as the veracity of the big data that AI sorely depends on, the social biases that AI-driven algorithms can introduce, and how best to incorporate AI with human intelligence. The road ahead for AI in genomic medicine is complex and arduous and yet worthy of cautious optimism as we face future pandemics and ecological crises in the 21st century. Now is a good time to think about the role of AI in genomic medicine and planetary health.
Affiliation(s)
- Gizem Gulfidan
  - Department of Bioengineering, Marmara University, Istanbul, Turkey
- Hande Beklen
  - Department of Bioengineering, Marmara University, Istanbul, Turkey