1
Kumar H, Panigrahi M, Seo D, Cho S, Bhushan B, Dutt T. Machine Learning-Aided Ultra-Low-Density Single Nucleotide Polymorphism Panel Helps to Identify the Tharparkar Cattle Breed: Lessons for Digital Transformation in Livestock Genomics. OMICS: A Journal of Integrative Biology 2024. [PMID: 39302202 DOI: 10.1089/omi.2024.0153] [Indexed: 09/22/2024]
Abstract
Cattle breed identification is crucial for livestock research and sustainable food systems, and advances in genomics and artificial intelligence present new opportunities to address these challenges. This study investigates the identification of the Tharparkar cattle breed using genomics tools combined with machine learning (ML) techniques. By leveraging data from the Bovine SNP 50K chip, we developed a breed-specific panel of single nucleotide polymorphisms (SNPs) for Tharparkar cattle and integrated data from seven other Indian cattle populations to enhance panel robustness. Genome-wide association studies (GWAS) and principal component analysis were employed to identify 500 SNPs, which were then refined using ML models (AdaBoost, bagging tree, gradient boosting machines, and random forest) to determine the minimal number of SNPs needed for accurate breed identification. Panels of 23 and 48 SNPs achieved accuracy rates of 95.2-98.4%. Importantly, the identified SNPs were associated with key productive and adaptive traits, thus attesting to the value and potential of digital transformation in livestock genomics. The ML-aided ultra-low-density SNP panel approach reported here not only facilitates breed identification but also contributes to preserving genetic diversity and guiding future breeding programs.
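The panel-reduction step described in the abstract (rank candidate SNPs, then find the smallest set that still classifies breeds accurately) can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes are synthetic, all function names are hypothetical, and two simple stand-ins replace the methods named above. SNPs are ranked by the between-breed gap in mean allele count instead of GWAS and principal component analysis, and a nearest-centroid classifier takes the place of the ensemble models.

```python
import random

def simulate(n_per_breed, n_snps, n_informative, seed):
    """Toy genotypes (0/1/2 allele counts) for two hypothetical breeds."""
    rng = random.Random(seed)
    X, y = [], []
    for breed in (0, 1):
        for _ in range(n_per_breed):
            row = []
            for j in range(n_snps):
                # The first n_informative SNPs differ strongly in allele
                # frequency between breeds; the rest are uninformative.
                if j < n_informative:
                    p = 0.9 if breed == 0 else 0.1
                else:
                    p = 0.5
                row.append(int(rng.random() < p) + int(rng.random() < p))
            X.append(row)
            y.append(breed)
    return X, y

def frequency_gap(X, y, j):
    """Absolute difference in mean allele count at SNP j between breeds."""
    a = [row[j] for row, breed in zip(X, y) if breed == 0]
    b = [row[j] for row, breed in zip(X, y) if breed == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def centroid_accuracy(X_train, y_train, X_val, y_val, panel):
    """Validation accuracy of a nearest-centroid classifier on the panel."""
    centroids = {}
    for breed in sorted(set(y_train)):
        rows = [r for r, b in zip(X_train, y_train) if b == breed]
        centroids[breed] = [sum(r[j] for r in rows) / len(rows) for j in panel]
    correct = 0
    for row, breed in zip(X_val, y_val):
        def dist(c):
            return sum((row[j] - m) ** 2 for j, m in zip(panel, centroids[c]))
        correct += min(centroids, key=dist) == breed
    return correct / len(y_val)

def minimal_panel(X_train, y_train, X_val, y_val, target=0.95):
    """Grow the panel in ranked order until validation accuracy hits target."""
    ranked = sorted(range(len(X_train[0])),
                    key=lambda j: frequency_gap(X_train, y_train, j),
                    reverse=True)
    acc = 0.0
    for k in range(1, len(ranked) + 1):
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, ranked[:k])
        if acc >= target:
            return ranked[:k], acc
    return ranked, acc

# Separate synthetic training and validation sets (30 SNPs, 5 informative).
X_train, y_train = simulate(40, 30, 5, seed=1)
X_val, y_val = simulate(20, 30, 5, seed=2)
panel, acc = minimal_panel(X_train, y_train, X_val, y_val)
print(len(panel), round(acc, 3))
```

In a real analysis the panel size would be chosen on data kept apart from the final accuracy estimate, since picking the stopping point and reporting accuracy on the same samples gives an optimistic figure.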
Affiliation(s)
- Harshit Kumar
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, India
  - ICAR-National Research Centre on Mithun, Medziphema, India
- Manjit Panigrahi
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, India
- Dongwon Seo
  - Research and Development Center, TNT research Co., Jeonju-si, South Korea
- Sunghyun Cho
  - Research and Development Center, Insilicogen Inc., Yongin-si, South Korea
- Bharat Bhushan
  - Division of Animal Genetics, Indian Veterinary Research Institute, Izatnagar, India
- Triveni Dutt
  - Animal Genetics & Breeding Section, Indian Veterinary Research Institute, Izatnagar, India
2
Schiavo G, Bertolini F, Bovo S, Galimberti G, Muñoz M, Bozzi R, Čandek-Potokar M, Óvilo C, Fontanesi L. Identification of population-informative markers from high-density genotyping data through combined feature selection and machine learning algorithms: Application to European autochthonous and cosmopolitan pig breeds. Anim Genet 2024; 55:193-205. [PMID: 38191264 DOI: 10.1111/age.13396] [Received: 03/29/2023] [Revised: 11/09/2023] [Accepted: 12/27/2023] [Indexed: 01/10/2024]
Abstract
Large genotyping datasets, obtained from high-density single nucleotide polymorphism (SNP) arrays, developed for different livestock species, can be used to describe and differentiate breeds or populations. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this study, we applied the Boruta algorithm, a wrapper of the machine learning random forest algorithm, on a database of 23 European pig breeds (20 autochthonous and three cosmopolitan breeds) genotyped with a 70k SNP chip, to pre-select informative SNPs. To identify different sets of SNPs, these pre-selected markers were then ranked with random forest based on their mean decrease accuracy and mean decrease Gini indexes. We evaluated the efficiency of these subsets for breed classification and the usefulness of this approach to detect candidate genes affecting breed-specific phenotypes and relevant production traits that might differ among breeds. The lowest overall classification error (2.3%) was reached with a subpanel including only 398 SNPs (ranked based on their mean decrease accuracy), with no classification error in seven breeds using up to 49 SNPs. Several SNPs of these selected subpanels were in genomic regions in which previous studies had identified signatures of selection or genes associated with morphological or production traits that distinguish the analysed breeds. Therefore, even if these approaches have not been originally designed to identify signatures of selection, the obtained results showed that they could potentially be useful for this purpose.
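The pre-selection idea behind Boruta (keep a marker only if it scores higher than shuffled "shadow" copies of the markers) can be shown in a heavily simplified, single-pass sketch. This is an illustration under stated assumptions, not the procedure used in the study: the real algorithm repeats the comparison over many random forest fits and applies a statistical test, whereas here each SNP is scored by the Gini impurity decrease of its best single split. The data are synthetic and the names `gini`, `gini_decrease`, and `shadow_filter` are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of breed labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(genotypes, labels):
    """Largest impurity drop from splitting one SNP at genotype <= t."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in (0, 1):  # the two possible thresholds for 0/1/2 genotypes
        left = [b for g, b in zip(genotypes, labels) if g <= t]
        right = [b for g, b in zip(genotypes, labels) if g > t]
        if left and right:
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            best = max(best, parent - child)
    return best

def shadow_filter(X, y, seed=0):
    """Keep SNPs whose score beats the best score among shuffled copies."""
    rng = random.Random(seed)
    n_snps = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_snps)]
    real = [gini_decrease(col, y) for col in columns]
    shadow = []
    for col in columns:
        shuffled = col[:]
        rng.shuffle(shuffled)  # breaks any link between genotype and breed
        shadow.append(gini_decrease(shuffled, y))
    threshold = max(shadow)
    kept = [j for j in range(n_snps) if real[j] > threshold]
    return kept, real

# Synthetic example: SNP 0 separates two breeds perfectly, SNPs 1-9 are noise.
rng = random.Random(0)
y = ["A"] * 20 + ["B"] * 20
X = [[0 if b == "A" else 2] + [rng.randint(0, 2) for _ in range(9)] for b in y]
kept, scores = shadow_filter(X, y, seed=1)
print(kept)
```

A single pass like this can let a noise marker through by chance, which is the reason the full algorithm repeats the shadow comparison before confirming or rejecting a marker.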
Affiliation(s)
- Giuseppina Schiavo
  - Animal and Food Genomics Group, Division of Animal Sciences, Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Francesca Bertolini
  - Animal and Food Genomics Group, Division of Animal Sciences, Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Samuele Bovo
  - Animal and Food Genomics Group, Division of Animal Sciences, Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Giuliano Galimberti
  - Department of Statistical Sciences 'Paolo Fortunati', University of Bologna, Bologna, Italy
- María Muñoz
  - Departamento Mejora Genética Animal, INIA-CSIC, Madrid, Spain
- Riccardo Bozzi
  - Animal Science Division, Dipartimento di Scienze e Tecnologie Agrarie, Alimentari, Ambientali e Forestali, Università di Firenze, Firenze, Italy
- Cristina Óvilo
  - Departamento Mejora Genética Animal, INIA-CSIC, Madrid, Spain
- Luca Fontanesi
  - Animal and Food Genomics Group, Division of Animal Sciences, Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy