1
Adão T, Oliveira J, Shahrabadi S, Jesus H, Fernandes M, Costa Â, Ferreira V, Gonçalves MF, Lopéz MAG, Peres E, Magalhães LG. Empowering Deaf-Hearing Communication: Exploring Synergies between Predictive and Generative AI-Based Strategies towards (Portuguese) Sign Language Interpretation. J Imaging 2023; 9:235. [PMID: 37998082 PMCID: PMC10672430 DOI: 10.3390/jimaging9110235] [Received: 09/16/2023] [Revised: 10/14/2023] [Accepted: 10/19/2023] [Indexed: 11/25/2023]
Abstract
Communication between Deaf and hearing individuals remains a persistent challenge that demands attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address these issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa, or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long short-term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate the tokenization of LGP terms. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences from the inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained on 50 LGP terms with data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuitiveness when using the buffer-based interaction strategy for term/word tokenization. Furthermore, tests with an LLM, specifically ChatGPT, demonstrated promising semantic correlation between generated sentences and the expected sentences.
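As a rough, hypothetical illustration of the buffer-based tokenization strategy described in this abstract, the sketch below accumulates per-frame label predictions in a fixed-size buffer and emits a term once a single label dominates it; the class name, window size, threshold, and sample labels are all assumptions for illustration, not details taken from the paper.

```python
from collections import Counter, deque

class SignBuffer:
    """Hypothetical sketch of buffer-based sign tokenization: per-frame
    label predictions accumulate in a fixed-size buffer; a term is emitted
    once one label dominates the buffer, and the fill ratio doubles as
    real-time feedback on how much sign time remains."""

    def __init__(self, size=30, threshold=0.8):
        self.size = size            # frames needed to complete one sign
        self.threshold = threshold  # dominance ratio required to emit
        self.frames = deque(maxlen=size)

    def push(self, label):
        """Add one per-frame prediction; return an emitted term or None."""
        self.frames.append(label)
        if len(self.frames) < self.size:
            return None
        top, count = Counter(self.frames).most_common(1)[0]
        if count / self.size >= self.threshold:
            self.frames.clear()     # reset the buffer for the next term
            return top
        return None

    def progress(self):
        """Fraction of the buffer filled: the user-facing feedback cue."""
        return len(self.frames) / self.size

buf = SignBuffer(size=5, threshold=0.8)
stream = ["ola"] * 5 + ["obrigado"] * 5   # illustrative per-frame labels
tokens = [t for t in (buf.push(x) for x in stream) if t]
```

The `progress()` value is what a user interface could surface as the "time remaining to complete a sign" indicator mentioned in the abstract.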
Affiliation(s)
- Telmo Adão
- Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
- ALGORITMI Research Centre/LASI, University of Minho, 4800-058 Guimarães, Portugal
- João Oliveira
- Centro de Computação Gráfica-CCG/zgdv, University of Minho, Campus de Azurém, Edifício 14, 4800-058 Guimarães, Portugal
- Somayeh Shahrabadi
- Centro de Computação Gráfica-CCG/zgdv, University of Minho, Campus de Azurém, Edifício 14, 4800-058 Guimarães, Portugal
- Hugo Jesus
- Centro de Computação Gráfica-CCG/zgdv, University of Minho, Campus de Azurém, Edifício 14, 4800-058 Guimarães, Portugal
- Marco Fernandes
- Polytechnic Institute of Bragança, School of Communication, Administration and Tourism, Campus do Cruzeiro, 5370-202 Mirandela, Portugal
- Ângelo Costa
- Associação Portuguesa de Surdos (APS), 1600-796 Lisboa, Portugal
- Vânia Ferreira
- Associação Portuguesa de Surdos (APS), 1600-796 Lisboa, Portugal
- Martinho Fradeira Gonçalves
- Polytechnic Institute of Bragança, School of Communication, Administration and Tourism, Campus do Cruzeiro, 5370-202 Mirandela, Portugal
- Miguel A. Guevara Lopéz
- ALGORITMI Research Centre/LASI, University of Minho, 4800-058 Guimarães, Portugal
- Instituto Politécnico de Setúbal, Escola Superior de Tecnologia de Setúbal, 2914-508 Setúbal, Portugal
- Emanuel Peres
- Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
- Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
- Institute for Innovation, Capacity Building and Sustainability of Agri-Food Production, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
- Luís Gonzaga Magalhães
- ALGORITMI Research Centre/LASI, University of Minho, 4800-058 Guimarães, Portugal
2
Podder KK, Ezeddin M, Chowdhury MEH, Sumon MSI, Tahir AM, Ayari MA, Dutta P, Khandakar A, Mahbub ZB, Kadir MA. Signer-Independent Arabic Sign Language Recognition System Using Deep Learning Model. Sensors (Basel) 2023; 23:7156. [PMID: 37631693 PMCID: PMC10459624 DOI: 10.3390/s23167156] [Received: 06/23/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023]
Abstract
Each of us has a unique manner of communicating with the world, and such communication helps us interpret life. Sign language is the primary language of communication for people with hearing and speech disabilities. When a sign language user interacts with a non-signer, it is difficult for the signer to make themselves understood; a sign language recognition system can bridge this gap by translating the signer's gestures for the non-signer. This study presents a sign language recognition system capable of recognizing Arabic Sign Language from recorded RGB videos. Two datasets were considered: (1) the raw dataset and (2) a face- and hand-region-based segmented dataset produced from the raw dataset. Moreover, an operational-layer-based multilayer perceptron, "SelfMLP", is proposed in this study to build CNN-LSTM-SelfMLP models for Arabic Sign Language recognition. MobileNetV2- and ResNet18-based CNN backbones and three SelfMLPs were used to construct six different models of the CNN-LSTM-SelfMLP architecture for performance comparison. The study examined the signer-independent mode to reflect real-world application circumstances. MobileNetV2-LSTM-SelfMLP on the segmented dataset achieved the best accuracy of 87.69%, with 88.57% precision, 87.69% recall, an 87.72% F1 score, and 99.75% specificity. Overall, face-hand region-based segmentation and the SelfMLP-infused MobileNetV2-LSTM-SelfMLP surpassed previous findings on Arabic Sign Language recognition by 10.97 percentage points in accuracy.
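The signer-independent mode mentioned in this abstract means that the people signing in the test videos never appear in the training data, which is what makes the evaluation realistic for new users. A minimal sketch of such a split, with made-up signer IDs, video names, and labels:

```python
# Hypothetical sketch of a signer-independent split: every video from a
# held-out signer goes to the test set, so the model is never evaluated
# on a person it saw during training. All IDs below are illustrative.
samples = [
    {"signer": "s1", "video": "v01", "label": "peace"},
    {"signer": "s1", "video": "v02", "label": "father"},
    {"signer": "s2", "video": "v03", "label": "peace"},
    {"signer": "s3", "video": "v04", "label": "father"},
]

def signer_independent_split(samples, test_signers):
    train = [s for s in samples if s["signer"] not in test_signers]
    test = [s for s in samples if s["signer"] in test_signers]
    # Sanity check: no signer appears on both sides of the split.
    assert not {s["signer"] for s in train} & {s["signer"] for s in test}
    return train, test

train, test = signer_independent_split(samples, test_signers={"s3"})
```

Contrast this with a random per-video split, where the same signer can leak into both sets and inflate the reported accuracy.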
Affiliation(s)
- Kanchon Kanti Podder
- Department of Biomedical Physics & Technology, University of Dhaka, Dhaka 1000, Bangladesh
- Maymouna Ezeddin
- Department of Computer Science, Hamad Bin Khalifa University, Doha 34110, Qatar
- Md. Shaheenur Islam Sumon
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka 1216, Bangladesh
- Anas M. Tahir
- Department of Electrical Engineering, Qatar University, Doha 2713, Qatar
- Proma Dutta
- Department of Electrical & Electronic Engineering, Chittagong University of Engineering & Technology, Chittagong 4349, Bangladesh
- Amith Khandakar
- Department of Electrical Engineering, Qatar University, Doha 2713, Qatar
- Zaid Bin Mahbub
- Department of Mathematics and Physics, North South University, Dhaka 1229, Bangladesh
- Muhammad Abdul Kadir
- Department of Biomedical Physics & Technology, University of Dhaka, Dhaka 1000, Bangladesh
3
Xia K, Lu W, Fan H, Zhao Q. A Sign Language Recognition System Applied to Deaf-Mute Medical Consultation. Sensors (Basel) 2022; 22:9107. [PMID: 36501809 PMCID: PMC9739223 DOI: 10.3390/s22239107] [Received: 10/18/2022] [Revised: 11/10/2022] [Accepted: 11/20/2022] [Indexed: 06/17/2023]
Abstract
It is an objective reality that deaf-mute people have difficulty seeking medical treatment. Owing to the lack of sign language interpreters, most hospitals in China currently cannot offer sign language interpretation, and normal medical treatment remains a luxury for deaf people. In this paper, we propose a sign language recognition system, Heart-Speaker, applied to the deaf-mute consultation scenario. The system provides a low-cost solution to the difficult problem of treating deaf-mute patients: the doctor simply points the Heart-Speaker at the deaf patient, and the system automatically captures the sign language movements and translates their semantics. When the doctor issues a diagnosis or asks the patient a question, the system displays the corresponding sign language video and subtitles, meeting the need for two-way communication between doctor and patient. The system uses the MobileNet-YOLOv3 model to recognize sign language; it meets the constraints of running on embedded terminals while providing favorable recognition accuracy. We performed experiments to verify the system's accuracy, and the results show that Heart-Speaker achieves a sign language recognition accuracy of 90.77%.
Affiliation(s)
- Weiwei Lu
- Correspondence: ; Tel.: +86-13671637275
4
Backhand-Approach-Based American Sign Language Words Recognition Using Spatial-Temporal Body Parts and Hand Relationship Patterns. Sensors (Basel) 2022; 22:4554. [PMID: 35746330 PMCID: PMC9228298 DOI: 10.3390/s22124554] [Received: 05/15/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 11/17/2022]
Abstract
Most existing methods focus mainly on extracting shape-based, rotation-based, and motion-based features, usually neglecting the relationship between the hands and other body parts, which can provide significant information for disambiguating similar sign words under the backhand approach. Therefore, this paper proposes four feature-based models. The first, and the main feature, consists of the spatial–temporal body-part and hand relationship patterns; the second consists of the spatial–temporal finger joint angle patterns; the third consists of the spatial–temporal 3D hand motion trajectory patterns; and the fourth consists of the spatial–temporal double-hand relationship patterns. A two-layer bidirectional long short-term memory network is then used as the classifier over these time-sequence data. The method was evaluated and compared with existing works using 26 ASL letters, achieving an accuracy and F1-score of 97.34% and 97.36%, respectively. It was further evaluated on 40 double-hand ASL words, achieving an accuracy and F1-score of 98.52% and 98.54%, respectively. These results demonstrate that the proposed method outperforms the existing works under consideration. In an additional analysis of 72 new ASL words, including single- and double-hand words from 10 participants, the accuracy and F1-score were approximately 96.99% and 97.00%, respectively.
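For reference, the accuracy and F1-score figures reported in abstracts like this one are conventionally computed as below (macro-averaged F1 over the classes); this is a generic sketch with toy labels, not the paper's evaluation code.

```python
# Generic sketch of accuracy and macro-averaged F1 for a multi-class
# sign classifier. The labels are toy values, not ASL data.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    scores = []
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        # Per-class F1 is the harmonic mean of precision and recall.
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
acc = accuracy(y_true, y_pred)   # 3 of 4 predictions correct
f1 = macro_f1(y_true, y_pred)
```

Macro averaging weights every sign class equally, which matters when some signs have far fewer test clips than others.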
5
BenSignNet: Bengali Sign Language Alphabet Recognition Using Concatenated Segmentation and Convolutional Neural Network. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12083933] [Indexed: 12/14/2022]
Abstract
Sign language recognition is one of the most challenging applications in machine learning and human-computer interaction. Many researchers have developed classification models for different sign languages such as English, Arabic, Japanese, and Bengali; however, little work has examined how well such models generalize across datasets. Most studies achieve satisfactory performance on a small dataset, and these models may fail to replicate that performance when evaluated on different and larger datasets. In this context, this paper proposes a novel method for recognizing Bengali sign language (BSL) alphabets that addresses this generalization issue. The proposed method was evaluated on three benchmark datasets: '38 BdSL', 'KU-BdSL', and 'Ishara-Lipi'. Three steps are followed: segmentation, augmentation, and convolutional neural network (CNN)-based classification. First, a concatenated segmentation approach combining the YCbCr and HSV color spaces with the watershed algorithm was designed to accurately isolate gesture signs. Second, seven image augmentation techniques were selected to increase the training data size without changing the semantic meaning. Finally, the CNN-based model, BenSignNet, was applied for feature extraction and classification. The model achieved accuracies of 94.00%, 99.60%, and 99.60% on the 38 BdSL, KU-BdSL, and Ishara-Lipi datasets, respectively. The experimental findings confirm that the proposed method achieves a higher recognition rate than conventional ones and generalizes across all three datasets in the BSL domain.
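The YCbCr part of a concatenated segmentation step like the one described above can be sketched as a per-pixel skin test. The Cb/Cr bounds below are a commonly used skin range in the literature, offered as an illustrative assumption rather than the paper's exact thresholds, and the full method additionally combines HSV thresholding and the watershed algorithm.

```python
# Hypothetical sketch of YCbCr skin segmentation: convert each RGB pixel
# (BT.601 conversion) and keep it as a hand/skin candidate if its Cb/Cr
# chroma falls inside a commonly used skin range. Bounds are illustrative.
def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b):
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173

# A light skin-tone pixel and a dark green background pixel.
mask = [is_skin(*px) for px in [(220, 170, 140), (30, 90, 40)]]
```

In practice the binary mask from this test would be intersected with the HSV mask and refined by watershed before being fed to the CNN.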
6
Islam MM, Uddin MR, Akhtar MN, Alam KR. Recognizing multiclass Static Sign Language words for deaf and dumb people of Bangladesh based on transfer learning techniques. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.101077] [Indexed: 10/14/2022]