1
|
Fatima T, Islam RU, Anwar MW, Jamal MH, Chaudhry MT, Gillani Z. STEMUR: An Automated Word Conflation Algorithm for the Urdu Language. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3476226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Stemming is a common word conflation method that perceives stems embedded in the words and decreases them to their stem (root) by conflating all the morphologically related terms into a single term, without doing a complete morphological analysis. This article presents STEMUR, an enhanced stemming algorithm for automatic word conflation for Urdu language. In addition to handling words with prefixes and suffixes, STEMUR also handles words with infixes. Rather than using a totally unsupervised approach, we utilized the linguistic knowledge to develop a collection of patterns for Urdu infixes to enhance the accuracy of the stems and affixes acquired during the training process. Additionally, STEMUR also handles English loan words and can handle words with more than one affix. STEMUR is compared with four existing Urdu stemmers including Assas-Band and the template-based stemmer that are also implemented in this study. Results are processed on two corpora containing 89,437 and 30,907 words separately. Results show clear improvements regarding strength and accuracy of STEMUR. The use of maximum possible infix rules boosted our stemmer's accuracy up to 93.1% and helped us achieve a precision of 98.9%.
Collapse
|
2
|
Text categorization based on a new classification by thresholds. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/s13748-021-00247-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
3
|
Khalifi H, Dahir S, El Qadi A, Ghanou Y. Enhancing information retrieval performance by using social analysis. SOCIAL NETWORK ANALYSIS AND MINING 2020. [DOI: 10.1007/s13278-020-00635-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
4
|
Bahassine S, Madani A, Al-Sarem M, Kissi M. Feature selection using an improved Chi-square for Arabic text classification. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES 2020. [DOI: 10.1016/j.jksuci.2018.05.010] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/16/2022]
|
5
|
Query expansion based on clustering and personalized information retrieval. PROGRESS IN ARTIFICIAL INTELLIGENCE 2019. [DOI: 10.1007/s13748-019-00178-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|