1
Dong H, Wang W, Sun Z, Kang Z, Ge X, Gao F, Wang J. Knowledge graph construction for intelligent cockpits based on large language models. Sci Rep 2025; 15:7635. [PMID: 40038399 DOI: 10.1038/s41598-025-92002-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2024] [Accepted: 02/25/2025] [Indexed: 03/06/2025] Open
Abstract
As intelligent cockpits rapidly evolve towards "proactive natural interaction," traditional rule-based user behavior inference methods are facing scalability, generalization, and accuracy bottlenecks, leading to the development and deployment of functions oriented towards pseudo-demands. Effectively capturing and representing the hidden associative knowledge in intelligent cockpits can enhance the system's understanding of user behavior and environmental contexts, thereby precisely discerning real user needs. In this context, knowledge graphs (KGs) have emerged as an effective tool, enabling the retrieval and organization of vast amounts of information within interconnected and interpretable structures. However, rapidly and flexibly generating domain-specific KGs still poses significant challenges. To address this, this paper introduces a novel knowledge graph construction (KGC) model, GLM-TripleGen, dedicated to analyzing the states and behaviors within intelligent cockpits. This model aims to precisely mine the latent relationships between cockpit state factors and behavioral sequences, effectively addressing key challenges such as the ambiguity in entity recognition and the complexity of relationship extraction within cockpit data. To enhance the adaptability of GLM-TripleGen to the intelligent cockpit domain, this paper constructs an instruction-following dataset based on vehicle states and in-cockpit interaction behaviors, containing a large number of prompt texts paired with corresponding triple labels, to support model fine-tuning. During the fine-tuning process, the Low-Rank Adaptation (LoRA) method is employed to effectively optimize model parameters, significantly reducing training costs. Extensive experiments demonstrate that GLM-TripleGen outperforms existing state-of-the-art KGC methods, accurately generating normalized cockpit triple units. 
Furthermore, GLM-TripleGen exhibits exceptional robustness and generalization ability, handling various unknown entities and relationships with minimal generalization processing.
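The abstract names Low-Rank Adaptation (LoRA) without giving a configuration. As a rough illustration of the general idea only, the sketch below keeps a weight matrix W frozen and adds a trainable low-rank product B·A scaled by alpha/r. The matrix sizes, rank, scaling factor, and function names are assumptions chosen for readability, not values or code from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank (rows of A)."""
    r = len(a)
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Trainable parameter counts for full fine-tuning versus LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}


# Hypothetical sizes, for illustration only.
D_OUT, D_IN, RANK, ALPHA = 6, 8, 2, 4.0
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_OUT)]  # frozen
A = [[rng.uniform(-1, 1) for _ in range(D_IN)] for _ in range(RANK)]   # trainable
B = [[0.0] * RANK for _ in range(D_OUT)]                               # trainable, zero-initialised

W_eff = lora_effective_weight(W, A, B, ALPHA)
counts = trainable_params(D_OUT, D_IN, RANK)
```

With B initialised to zero, the adapted weight equals the frozen weight before any training step, and only r·(d_out + d_in) parameters are updated instead of d_out·d_in. This is the usual source of the training-cost reduction attributed to LoRA.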
Affiliation(s)
- Haomin Dong
  - School of Mechanical and Aerospace Engineering, Jilin University, Changchun, 130025, China
  - Research Institute, China FAW Group Co., Ltd., Changchun, 130000, China
- Wenbin Wang
  - Research Institute, China FAW Group Co., Ltd., Changchun, 130000, China
- Zhenjiang Sun
  - Research Institute, China FAW Group Co., Ltd., Changchun, 130000, China
- Ziyi Kang
  - Research Institute, China FAW Group Co., Ltd., Changchun, 130000, China
- Xiaojun Ge
  - College of Automotive Engineering, Jilin University, Changchun, 130025, China
- Fei Gao
  - College of Automotive Engineering, Jilin University, Changchun, 130025, China
  - National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun, 130025, China
- Jixin Wang
  - School of Mechanical and Aerospace Engineering, Jilin University, Changchun, 130025, China
  - Key Laboratory of CNC Equipment Reliability, Ministry of Education, Jilin University, Changchun, 130022, China
2
O'Ryan C, Hayes KD, VanGessel FG, Doherty RM, Wilson W, Fischer J, Boukouvalas Z, Chung PW. An Automated Approach for Domain-Specific Knowledge Graph Generation─Graph Measures and Characterization. J Chem Inf Model 2025; 65:1243-1257. [PMID: 39874149 DOI: 10.1021/acs.jcim.4c01904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2025]
Abstract
In 2020, nearly 3 million scientific and engineering papers were published worldwide (White, K. Publications Output: U.S. Trends And International Comparisons). The vastness of the literature that already exists, the increasing rate of appearance of new publications, and the timely translation of artificial intelligence methods into scientific and engineering communities have ushered in the development of automated methods for mining and extracting information from technical documents. However, domain-specific approaches for extracting knowledge graph representations from semantic information remain limited. In this paper, we develop a natural language processing (NLP) approach to extract knowledge graphs resulting in a semantically structured network (SSN) that can be queried. After a detailed exposition of the modeling method, the approach is demonstrated specifically for the synthetic chemistry of organic molecules from the text of approximately 100,000 full-length patents. In this paper, we focus specifically on characterizing the knowledge graph to develop insights into the linguistic patterns and trends within the data and to establish objective graph characteristics that may enable comparisons among other text-based knowledge graphs across domains. Graph characterization is performed for network motif structures, assortativity, and eigenvector centrality. The structural information provided by the measures reveals language tendencies commonly employed by authors in the text discourse for chemical reactions. These include observations of the prevalence of descriptions of specific compound names, that common solvents and drying agents cut across large numbers of chemical synthesis approaches, and that power-law trends clearly emerge in the limit of larger corpora. The findings provide important quantitative characterizations of knowledge graphs for use in validation in large data settings.
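Two of the graph measures named in this abstract, assortativity and eigenvector centrality, can be illustrated on a toy graph. The sketch below is not the authors' pipeline: the triples are invented, the graph is treated as undirected, assortativity is computed as the Pearson correlation of endpoint degrees, and centrality uses power iteration on A + I (an assumption made here so that the iteration also converges on bipartite graphs).

```python
from collections import defaultdict
from math import sqrt


def build_graph(triples):
    """Undirected adjacency sets from (subject, relation, object) triples."""
    adj = defaultdict(set)
    for subj, _rel, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return dict(adj)


def degree_assortativity(adj):
    """Pearson correlation between the degrees at the two ends of each edge.

    Undefined (division by zero) when every node has the same degree.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I, normalised to unit Euclidean length."""
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        nxt = {u: score[u] + sum(score[v] for v in adj[u]) for u in adj}
        norm = sqrt(sum(s * s for s in nxt.values()))
        score = {u: s / norm for u, s in nxt.items()}
    return score


# Hypothetical triples: three compounds that share one solvent.
triples = [
    ("compound_A", "dissolved_in", "THF"),
    ("compound_B", "dissolved_in", "THF"),
    ("compound_C", "dissolved_in", "THF"),
]
graph = build_graph(triples)
assortativity = degree_assortativity(graph)
centrality = eigenvector_centrality(graph)
```

In this toy case the shared solvent is the hub, so it receives the highest centrality, and the graph is strongly disassortative because the high-degree node connects only to low-degree nodes. That is consistent with the abstract's observation that common solvents cut across many synthesis descriptions.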
Affiliation(s)
- Connor O'Ryan
  - Center for Engineering Concepts Development, Department of Mechanical Engineering, University of Maryland, College Park, Maryland 20742, United States
- Kevin D Hayes
  - Center for Engineering Concepts Development, Department of Mechanical Engineering, University of Maryland, College Park, Maryland 20742, United States
- Francis G VanGessel
  - U.S. Naval Surface Warfare Center, Indian Head Division, Indian Head, Maryland 20640, United States
- Ruth M Doherty
  - Energetics Technology Center, Indian Head, Maryland 20640, United States
- William Wilson
  - Energetics Technology Center, Indian Head, Maryland 20640, United States
- John Fischer
  - Energetics Technology Center, Indian Head, Maryland 20640, United States
- Zois Boukouvalas
  - American University, Washington, District of Columbia 20016, United States
- Peter W Chung
  - Center for Engineering Concepts Development, Department of Mechanical Engineering, University of Maryland, College Park, Maryland 20742, United States
3
Chen Y, Chen G, Li P. Research on a Joint Extraction Method of Track Circuit Entities and Relations Integrating Global Pointer and Tensor Learning. Sensors (Basel, Switzerland) 2024; 24:7128. [PMID: 39598905 PMCID: PMC11598563 DOI: 10.3390/s24227128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 11/01/2024] [Accepted: 11/04/2024] [Indexed: 11/29/2024]
Abstract
To address the issue of efficiently reusing the massive amount of unstructured knowledge generated during the handling of track circuit equipment faults and to automate the construction of knowledge graphs in the railway maintenance domain, it is crucial to leverage knowledge extraction techniques to efficiently extract relational triplets from fault maintenance text data. Given the current lag in joint extraction technology within the railway domain and the inefficiency in resource utilization, this paper proposes a joint extraction model for track circuit entities and relations, integrating Global Pointer and tensor learning. Taking into account the associative characteristics of semantic relations, the nesting of domain-specific terms in the railway sector, and semantic diversity, this research views the relation extraction task as a tensor learning process and the entity recognition task as a span-based Global Pointer search process. First, a multi-layer dilated gated convolutional neural network with residual connections is used to extract key features and fuse the weighted information from the 12 different semantic layers of the RoBERTa-wwm-ext model, fully exploiting the performance of each encoding layer. Next, the Tucker decomposition method is utilized to capture the semantic correlations between relations, and an Efficient Global Pointer is employed to globally predict the start and end positions of subject and object entities, incorporating relative position information through rotary position embedding (RoPE). Finally, comparative experiments with existing mainstream joint extraction models were conducted, and the proposed model's excellent performance was validated on the English public datasets NYT and WebNLG, the Chinese public dataset DuIE, and a private track circuit dataset. The F1 scores on the NYT, WebNLG, and DuIE public datasets reached 92.1%, 92.7%, and 78.2%, respectively.
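The span-based view of entity recognition described above can be sketched as a decoding step over a start-by-end score matrix. The example below is an illustration under assumptions, not the paper's model: the scores are written by hand in place of model output, the tokens are invented, and the threshold of 0 follows the common Global Pointer convention rather than anything stated in the abstract.

```python
def decode_spans(scores, tokens, threshold=0.0):
    """Return (start, end, text) for every span whose score exceeds the threshold.

    scores[i][j] is the score for the span that starts at token i and ends at
    token j (inclusive). Only the upper triangle (i <= j) is considered.
    """
    spans = []
    for i, row in enumerate(scores):
        for j in range(i, len(row)):
            if row[j] > threshold:
                spans.append((i, j, " ".join(tokens[i:j + 1])))
    return spans


# Hypothetical tokens and hand-written scores standing in for model output.
tokens = ["track", "circuit", "receiver", "voltage", "is", "low"]
NEG = -10.0
scores = [[NEG] * len(tokens) for _ in tokens]
scores[0][1] = 3.2  # "track circuit"
scores[0][2] = 4.1  # "track circuit receiver", which contains the span above
scores[3][3] = 2.5  # "voltage"
scores[2][0] = 5.0  # end before start: skipped by the decoder

entities = decode_spans(scores, tokens)
```

Because every (start, end) pair is scored independently, nested terms such as "track circuit" inside "track circuit receiver" can both be returned, which is the property that makes span-based decoding attractive for nested domain terminology.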
Affiliation(s)
- Yanrui Chen
  - School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  - Key Laboratory of Plateau Traffic Information Engineering and Control of Gansu Province, Lanzhou Jiaotong University, Lanzhou 730070, China
- Guangwu Chen
  - School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  - Key Laboratory of Plateau Traffic Information Engineering and Control of Gansu Province, Lanzhou Jiaotong University, Lanzhou 730070, China
- Peng Li
  - School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
  - Key Laboratory of Plateau Traffic Information Engineering and Control of Gansu Province, Lanzhou Jiaotong University, Lanzhou 730070, China
4
Zeng W, Zhao X, Tang J, Fan C. Knowledge Graph Alignment Under Scarce Supervision: A General Framework With Active Cross-View Contrastive Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11692-11705. [PMID: 37847632 DOI: 10.1109/tnnls.2023.3321900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2023]
Abstract
Over recent years, a number of knowledge graphs (KGs) have emerged. Nevertheless, a KG can never reach full completeness. A viable approach to increase the coverage of a KG is KG alignment (KGA). Most previous efforts focus only on matching entities and largely neglect relations. In addition, they rely heavily on labeled data, which are difficult to obtain in practice. To address these issues, we put forward a general framework that simultaneously aligns entities and relations under scarce supervision. Our proposal consists of two main components: relation-enhanced active instance selection (RAS) and cross-view contrastive learning (CCL). RAS selects the most valuable instances to be labeled with the guidance of relations, while CCL contrasts cross-view representations to augment scarce supervision signals. Our proposal is agnostic to the underlying entity and relation alignment models and can be used to improve their performance under limited supervision. We conduct experiments on a wide range of popular KG pairs, and the results demonstrate that our proposed model and its components consistently boost alignment performance under scarce supervision.
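The abstract does not specify the loss used for cross-view contrastive learning. As one common way to contrast two views of the same items, the sketch below computes an InfoNCE-style loss in which row i of one view should match row i of the other. The embeddings, the temperature, and the choice of InfoNCE itself are assumptions made for illustration.

```python
from math import exp, log, sqrt


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def info_nce(view_one, view_two, temperature=0.1):
    """Mean InfoNCE loss: row i of view_one is the positive for row i of view_two."""
    total = 0.0
    for i, anchor in enumerate(view_one):
        logits = [cosine(anchor, other) / temperature for other in view_two]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + log(sum(exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_one)


# Hypothetical 2-D embeddings of three entities under two views.
view_one = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_two_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
view_two_shuffled = [view_two_aligned[1], view_two_aligned[2], view_two_aligned[0]]

loss_aligned = info_nce(view_one, view_two_aligned)
loss_shuffled = info_nce(view_one, view_two_shuffled)
```

The loss is small when the two views agree on which items correspond and large when the correspondence is shuffled, so minimising it provides a training signal that needs no alignment labels.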
5
Cai F, He J, Liu Y, Zhang H. BCSLinker: automatic method for constructing a knowledge graph of venous thromboembolism based on joint learning. Front Med (Lausanne) 2024; 11:1272224. [PMID: 38784240 PMCID: PMC11111956 DOI: 10.3389/fmed.2024.1272224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 04/23/2024] [Indexed: 05/25/2024] Open
Abstract
Background: Venous thromboembolism (VTE) is characterized by high morbidity, mortality, and complex treatment. A VTE knowledge graph (VTEKG) can effectively integrate VTE-related medical knowledge and offer an intuitive description and analysis of the relations between medical entities. However, current methods for constructing knowledge graphs typically suffer from error propagation and redundant information.
Methods: In this study, we propose a deep learning-based joint extraction model, Biaffine Common-Sequence Self-Attention Linker (BCSLinker), for Chinese electronic medical records to address the issues mentioned above, which often occur when constructing a VTEKG. First, the Biaffine Common-Sequence Self-Attention (BCsSa) module is employed to create global matrices and extract entities and relations simultaneously, mitigating error propagation. Second, the multi-label cross-entropy loss is utilized to diminish the impact of redundant information and enhance information extraction.
Results: We used the electronic medical record data of VTE patients from a tertiary hospital, achieving an F1 score of 86.9% on BCSLinker. It outperforms the other joint entity and relation extraction models discussed in this study. In addition, we developed a question-answering system based on the VTEKG as a structured data source.
Conclusion: This study has constructed a more accurate and comprehensive VTEKG that can provide reference for diagnosing, evaluating, and treating VTE as well as supporting patient self-care, which is of considerable clinical value.
Affiliation(s)
- Fenghua Cai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Jianfeng He
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Yunchuan Liu
  - Department of Medical Imaging, The First People Hospital of Anning City, Anning, China
- Hongjiang Zhang
  - Department of Medical Imaging, The First People Hospital of Anning City, Anning, China
6
Li R, La K, Lei J, Huang L, Ouyang J, Shu Y, Yang S. Joint extraction model of entity relations based on decomposition strategy. Sci Rep 2024; 14:1786. [PMID: 38245548 PMCID: PMC10799866 DOI: 10.1038/s41598-024-51559-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 01/06/2024] [Indexed: 01/22/2024] Open
Abstract
Named entity recognition and relation extraction are two fundamental tasks in natural language processing. A joint entity-relation extraction model based on parameter sharing can reduce the impact of cascading errors by learning entities and relations jointly in a single model, but it still cannot fully escape the influence of pipeline models, and it suffers from redundant entity information and an inability to recognize overlapping entities. To this end, we propose a joint extraction model based on a decomposition strategy with a pointer mechanism. The joint extraction task is divided into two parts. First, the head entity is identified, exploiting the positive gain effect of the head entity on tail entity identification. Then, a hierarchical model is used to improve the accuracy of tail entity and relation identification. Meanwhile, we introduce a pointer model to obtain the joint features of entity boundaries and relation types to achieve boundary-aware classification. The experimental results show that the model achieves better results on both the NYT and WebNLG datasets.
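The two-part decomposition described above (find head entities first, then find tails and relations conditioned on each head) can be sketched as a decoding routine over start and end pointer probabilities. This is a simplified illustration, not the authors' model: the probabilities are written by hand, the 0.5 threshold and nearest-end pairing rule are assumptions, and the sentence and relation names are invented.

```python
def pointer_spans(start_probs, end_probs, threshold=0.5):
    """Pair each predicted start with the nearest predicted end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, later[0]))
    return spans


def decode_triples(tokens, head_probs, tail_probs_by_head):
    """Stage 1 finds head entities; stage 2 finds tails per head and relation."""
    def text(span):
        return " ".join(tokens[span[0]:span[1] + 1])

    triples = []
    for head in pointer_spans(*head_probs):
        for relation, (t_start, t_end) in tail_probs_by_head.get(head, {}).items():
            for tail in pointer_spans(t_start, t_end):
                triples.append((text(head), relation, text(tail)))
    return triples


# Hypothetical sentence and hand-written probabilities standing in for model output.
tokens = ["Marie", "Curie", "was", "born", "in", "Warsaw"]
head_probs = ([0.9, 0.1, 0.0, 0.0, 0.0, 0.2],   # head start probabilities
              [0.1, 0.8, 0.0, 0.0, 0.0, 0.1])   # head end probabilities
zeros = [0.0] * len(tokens)
tail_probs_by_head = {
    (0, 1): {
        "born_in": ([0.0, 0.0, 0.0, 0.0, 0.0, 0.9], [0.0, 0.0, 0.0, 0.0, 0.0, 0.95]),
        "works_for": (zeros, zeros),  # no tail predicted for this relation
    }
}

triples = decode_triples(tokens, head_probs, tail_probs_by_head)
```

Because tails are decoded separately for each head and each relation, one head can take part in several triples, which is how decompositions of this kind accommodate overlapping entities.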
Affiliation(s)
- Ran Li
  - Guizhou Power Grid Company Limited, Guiyang, 550000, China
- Kaijun La
  - Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Jingsheng Lei
  - Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Liya Huang
  - Guizhou Power Grid Company Limited, Guiyang, 550000, China
- Jing Ouyang
  - Guizhou Power Grid Company Limited, Guiyang, 550000, China
- Yu Shu
  - Guizhou Power Grid Company Limited, Guiyang, 550000, China
- Shengying Yang
  - Zhejiang University of Science and Technology, Hangzhou, 310023, China