Abeykoon V, Kamburugamuve S, Widanage C, Perera N, Uyar A, Kanewala TA, von Laszewski G, Fox G. HPTMT Parallel Operators for High Performance Data Science and Data Engineering. Front Big Data 2022; 4:756041. [PMID: 35198971 PMCID: PMC8860100 DOI: 10.3389/fdata.2021.756041] [Received: 08/09/2021] [Accepted: 11/29/2021] [Indexed: 11/13/2022]
Abstract
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often, the lack of a clear definition of data structures and operators in the field has led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that link all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together. Our analysis shows that the proposed system architecture is better suited for high performance computing environments than current big data processing systems. Furthermore, our proposed system emphasizes the importance of efficient, compact data structures, such as the Apache Arrow tabular data representation, defined for high performance. Thus, the system integration we propose scales a sequential computation to a distributed computation while retaining optimum performance along with a highly usable application programming interface.
Affiliation(s)
- Vibhatha Abeykoon
  - Indiana University Alumni, Bloomington, IN, United States
  - *Correspondence: Vibhatha Abeykoon,
- Supun Kamburugamuve
  - Luddy School of Informatics, Computing and Engineering, Bloomington, IN, United States
- Chathura Widanage
  - Luddy School of Informatics, Computing and Engineering, Bloomington, IN, United States
- Niranda Perera
  - Luddy School of Informatics, Computing and Engineering, Bloomington, IN, United States
- Ahmet Uyar
  - Luddy School of Informatics, Computing and Engineering, Bloomington, IN, United States
- Gregor von Laszewski
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, United States
- Geoffrey Fox
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, United States
  - Computer Science Department, University of Virginia, Charlottesville, VA, United States