Chen Y, Wang A, Liu Z, Yue J, Zhang E, Li F, Zhang N. MoSViT: a lightweight vision transformer framework for efficient disease detection via precision attention mechanism.
Front Artif Intell 2025;
8:1498025. [PMID:
40206703 PMCID:
PMC11979205 DOI:
10.3389/frai.2025.1498025]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2024] [Accepted: 02/27/2025] [Indexed: 04/11/2025] Open
Abstract
Maize, a globally essential staple crop, suffers significant yield losses due to diseases. Traditional diagnostic methods are often inefficient and subjective, posing challenges for timely and accurate pest management. This study introduces MoSViT, an innovative classification model leveraging advanced machine learning and computer vision technologies. Built on the MobileViT V2 framework, MoSViT integrates the CLA focus mechanism, DRB module, MoSViT Block, and the LeakyRelu6 activation function to enhance feature extraction accuracy while reducing computational complexity. Trained on a dataset of 3,850 images encompassing Blight, Common Rust, Gray Leaf Spot, and Healthy conditions, MoSViT achieves exceptional performance, with classification accuracy, Precision, Recall, and F1 Score of 98.75%, 98.73%, 98.72%, and 98.72%, respectively. These results surpass leading models such as Swin Transformer V2, DenseNet121, and EfficientNet V2 in both accuracy and parameter efficiency. Additionally, the model's interpretability is enhanced through heatmap analysis, providing insights into its decision-making process. Testing on small sample datasets further demonstrates MoSViT's generalization capability and potential for small-sample detection scenarios.
Collapse