Authors:
(1) Hanqing ZHAO, College of Traditional Chinese Medicine, Hebei University; corresponding author (zhaohq@hbu.edu.cn); funded by the National Natural Science Foundation of China (No. 82004503) and the Science and Technology Project of Hebei Education Department (BJK2024108);
(2) Yuehan LI, College of Traditional Chinese Medicine, Hebei University.
Table of Links
2 Materials and Methods
2.1 Experimental Data and 2.2 Conditional random fields model
2.3 TF-IDF algorithm and 2.4 Dependency Parser Based on Neural Network
3 Experimental results
3.1 Results of word segmentation and entity recognition
3.2 Visualization results of related entity vocabulary map
3.3 Results of dependency parsing
3.1 Results of word segmentation and entity recognition
Word segmentation and entity recognition were completed on the full texts of the Origin of Medicine, Spleen and Stomach Theory, and Yin Syndrome Lue Case, yielding 472, 899, and 726 corpus items, respectively. Because entity attributes are difficult to define for the text of ancient Chinese medicine books, this study divides entities only into broad categories such as nouns, verbs, adjectives, and modal words, and focuses on the meaning of the entity words. The word frequencies and TF-IDF importance scores extracted for each work are shown in Tables 1-3.
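As a rough illustration of how a TF-IDF importance score can be computed from segmented text, the sketch below uses one common variant (term frequency normalized by document length, and idf = log(N / df)). This is an assumption: the exact weighting used in this study is defined in Section 2.3 and may differ. The function name `tf_idf` and the toy token lists are hypothetical and are not the paper's data.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs: list of token lists that have already been segmented.
    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical, pre-segmented toy corpus (three "works").
toy_docs = [
    ["spleen", "stomach", "qi", "spleen"],
    ["yin", "syndrome", "qi"],
    ["meridian", "herb", "qi", "herb"],
]
toy_scores = tf_idf(toy_docs)

# Highest-scoring term per document (ties resolve to the first term seen).
top_terms = [max(s, key=s.get) for s in toy_scores]
```

Under this variant, a term that occurs in every document (here "qi") receives a score of zero, which is why TF-IDF highlights vocabulary that is distinctive to one work and not merely frequent.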
3.2 Visualization results of related entity vocabulary map
The relevant data were collated and summarized, and a word cloud was drawn from the entity word importance data, as shown in Figure 2.
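A word cloud typically renders more important words in larger type. The sketch below shows one simple way to map importance scores to font sizes by linear scaling. It is an assumption for illustration only: the tool and scaling used to produce Figure 2 are not stated here, and the function name `font_sizes`, the size bounds, and the toy scores are hypothetical.

```python
def font_sizes(importance, min_size=10, max_size=48):
    """Linearly map importance scores to font sizes in [min_size, max_size]."""
    if not importance:
        return {}
    lo, hi = min(importance.values()), max(importance.values())
    span = hi - lo
    return {
        # If all scores are equal, draw every word at the maximum size.
        word: max_size if span == 0
        else round(min_size + (score - lo) / span * (max_size - min_size))
        for word, score in importance.items()
    }

# Hypothetical importance scores, not taken from Tables 1-3.
toy_importance = {"spleen": 0.55, "stomach": 0.27, "qi": 0.0}
sizes = font_sizes(toy_importance)
```

The most important word receives `max_size`, the least important receives `min_size`, and everything else falls proportionally in between.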
3.3 Results of dependency parsing
This study completed the partial syntactic analysis of all the clauses of the three works. Taking the description of meridian-guiding herbs in the Origin of Medicine (Medical Qi Yuan) as an example, sample data were extracted for relation extraction and image rendering.
The sample text is as follows:
Guiding herbs for each meridian: Taiyang (Sun) meridian, Qiang Huo; in the lower part, yellow cypress; small intestine and bladder. Shaoyang meridian, Bupleurum; in the lower part, Qingpi; gallbladder and Sanjiao. Yangming meridian, cohosh and Angelica dahurica; in the lower part, gypsum; stomach and large intestine. Taiyin meridian, Baishao; spleen and lung. Shaoyin meridian, Anemarrhena; heart and kidney. Jueyin meridian, Qingpi; in the lower part, Bupleurum; liver and pericardium. These are the herbs for the twelve meridians above.
The dependency tree constructed for this passage is shown in Figure 3. The model can process this classical Chinese text, analyze its grammatical structure on the basis of the entity recognition results, and extract the relationships between entities. Taking the Taiyang (Sun) meridian as an example, it clearly identifies the relationship between the meridian and Qiang Huo, and the relationship between yellow cypress and the small intestine and bladder.
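To make the step from a dependency tree to entity relations concrete, the sketch below represents a parse as a list of tokens, each pointing to its head, and reads off (head, relation, dependent) triples. It is a minimal illustration under stated assumptions: the parse is hand-written rather than produced by the neural parser of Section 2.4, and the relation labels (`guiding_herb`, `guiding_herb_lower`, `organ`), the `Token` type, and `extract_relations` are hypothetical names introduced here.

```python
from typing import NamedTuple

class Token(NamedTuple):
    index: int     # 1-based position in the clause
    text: str      # entity word
    head: int      # index of the governing token, 0 for the root
    relation: str  # dependency label (hypothetical label set)

def extract_relations(tokens):
    """Return (head_text, relation, dependent_text) triples, skipping the root arc."""
    by_index = {t.index: t for t in tokens}
    return [
        (by_index[t.head].text, t.relation, t.text)
        for t in tokens
        if t.head != 0
    ]

# Hand-written, hypothetical parse of the first clause of the sample text.
parse = [
    Token(1, "Taiyang meridian", 0, "root"),
    Token(2, "Qiang Huo", 1, "guiding_herb"),
    Token(3, "yellow cypress", 1, "guiding_herb_lower"),
    Token(4, "small intestine", 3, "organ"),
    Token(5, "bladder", 3, "organ"),
]
triples = extract_relations(parse)
```

Triples of this shape are a convenient intermediate form, since each one can later be stored as an edge between two entity nodes.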
4 Final Remarks
In this study, a named entity recognition method based on conditional random fields was used to analyze the entity vocabulary, semantic features, and syntactic structure of the text data of the Yishui School of traditional Chinese medicine (TCM). The extraction of key named entities from unstructured text data achieved good results. This work has theoretical and practical value for summarizing the academic views of different physicians of the Yishui School, identifying differences in their academic ideas, and studying the inheritance of the school. As a next step, building on named entity recognition, we will continue to study TCM entity relation extraction from classical Chinese data and then construct a knowledge graph of the Yishui School, providing a reference for the application of artificial intelligence methods in research on TCM schools.
This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.