NLP and CRF Models for Mining Traditional Chinese Medicine

by Text Mining, April 30th, 2025

Too Long; Didn't Read

AI and NLP reveal key terms and relationships in ancient TCM texts, paving the way for knowledge graphs and new insights into medical traditions.


Authors:

(1) Hanqing ZHAO, College of Traditional Chinese Medicine, Hebei University; corresponding author (zhaohq@hbu.edu.cn); funded by the National Natural Science Foundation of China (No. 82004503) and the Science and Technology Project of Hebei Education Department (BJK2024108);

(2) Yuehan LI, College of Traditional Chinese Medicine, Hebei University.

Abstract and 1. Introduction

2. Materials and Methods

2.1 Experimental Data and 2.2 Conditional random fields model

2.3 TF-IDF algorithm and 2.4 Dependency Parser Based on Neural Network

2.5 Experimental Environment

3 Experimental results

3.1 Results of word segmentation and entity recognition

3.2 Visualization results of related entity vocabulary map

3.3 Results of dependency parsing

4 Final Remarks

5 References

3.1 Results of word segmentation and entity recognition

The experiment performed word segmentation and entity recognition on the full texts of the Origin of Medicine, the Spleen and Stomach Theory, and the Yin Syndrome Lue Case, yielding 472, 899, and 726 corpus items respectively. Because entity attributes are difficult to define for the text of ancient Chinese medical books, this study divides entities only into broad categories such as nouns, verbs, adjectives, and modal words, and focuses on the meaning of the entity words. The word frequencies and TF-IDF importance scores extracted for each work are shown in Tables 1-3.


Table 1 Results of natural language processing for the Origin of Medicine


Table 2 Results of natural language processing of Spleen and Stomach Theory


Table 3 Results of natural language processing for the Yin Syndrome Lue Case
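As a rough illustration of how the word frequencies and TF-IDF importance values in such tables can be derived from segmented text, here is a minimal Python sketch. The weighting used here (term frequency normalized by document length, with idf = ln(N / df)) is the common textbook form and is an assumption, since this section does not state the exact variant the study used. The function names and the toy tokens are hypothetical and are not taken from the paper's corpus.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Count how often each token occurs in one segmented document."""
    return Counter(tokens)


def tf_idf(docs):
    """Score every term in every document.

    docs maps a document name to its list of tokens (already segmented).
    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = term_frequencies(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Invented toy tokens for illustration only, not the paper's data.
toy_docs = {
    "book_a": ["qi", "spleen", "stomach", "qi"],
    "book_b": ["qi", "yin", "cold"],
    "book_c": ["qi", "herb", "spleen"],
}

toy_scores = tf_idf(toy_docs)
for term, score in sorted(toy_scores["book_a"].items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{score:.4f}")
```

Under this weighting, a term that occurs in every document (here "qi") scores zero however frequent it is, while a term specific to one work ranks highest. This is why TF-IDF can separate the distinctive vocabulary of each book from vocabulary shared across all three.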


3.2 Visualization results of related entity vocabulary map

The data were collated and summarized, and a word cloud map was drawn from the entity word importance scores, as shown in Figure 2.



FIG. 2 Word cloud map of key entities in representative works of the Yishui School
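The core idea behind a word cloud built from importance scores is a mapping from each term's weight to a display size. The sketch below shows one simple way to do that with linear scaling. It is an assumption for illustration only: the section does not say which tool or scaling rule produced Figure 2, and the function name, size limits, and weights are hypothetical.

```python
def font_sizes(weights, min_size=12, max_size=72):
    """Linearly map importance weights to font sizes in [min_size, max_size]."""
    lowest = min(weights.values())
    highest = max(weights.values())
    span = highest - lowest

    sizes = {}
    for term, weight in weights.items():
        # If every weight is identical, draw all terms at the largest size.
        fraction = 1.0 if span == 0 else (weight - lowest) / span
        sizes[term] = round(min_size + fraction * (max_size - min_size))
    return sizes


# Invented weights for illustration only, not values from Tables 1-3.
toy_weights = {"term_a": 1.0, "term_b": 3.0, "term_c": 2.0}
print(font_sizes(toy_weights))
```

A dedicated word cloud library would also handle layout and collision avoidance. This sketch covers only the weight-to-size step.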


3.3 Results of dependency parsing

This study completed partial syntactic analysis of all the clauses in the three works. Taking the passage in the Origin of Medicine (Medical Qi Yuan) that describes the theory of quoting classics as an example, sample data were extracted for relation extraction and image rendering.


The sample texts are as follows:


Each meridian quotes: Sun Meridian, Qiang Huo; in the lower, yellow cypress, small intestine, bladder also. Shaoyang meridian, Bupleurum; in the lower, Qingpi, bile, sanjiao also. Yangming meridian, cohosh, angelica dahurica; in the lower, gypsum, stomach, large intestine also. Taiyin meridian, Baishao medicine, spleen, lung also. Shaoyin meridian, anemarrhena, heart and kidney. Jueyin meridian, Qingpi; in the lower, bupleurum, liver, envelop also. The medicine of the above 12 meridians is also.


The dependency grammar tree constructed from this sample is shown in Figure 3. The model can parse this classical Chinese text, analyze its grammatical structure on the basis of the entity recognition results, and extract the relationships between entities. Taking the Sun Meridian as an example, it clearly identifies the relationship between the Sun Meridian and Qiang Huo, and the relationship between yellow cypress and the small intestine and bladder.



FIG. 3 Dependency grammar tree of quotations from Chinese herbal medicine
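To make the step from a dependency tree to entity relations concrete, the following minimal Python sketch keeps only those dependency arcs whose head and dependent are both recognized entities, then groups them by head. The arcs, relation labels, and helper names are hand-written assumptions for illustration. They are not the parser's actual output in Figure 3, and the study's own extraction rules may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc: head word, relation label, dependent word."""
    head: str
    label: str
    dependent: str


def entity_relations(arcs, entities):
    """Keep (head, dependent) pairs where both words are recognized entities."""
    return [
        (arc.head, arc.dependent)
        for arc in arcs
        if arc.head in entities and arc.dependent in entities
    ]


def group_by_head(pairs):
    """Collect the dependents of each head into a list."""
    grouped = defaultdict(list)
    for head, dependent in pairs:
        grouped[head].append(dependent)
    return dict(grouped)


# Hypothetical arcs for the first clause of the sample text.
toy_arcs = [
    Arc("Sun Meridian", "dep", "Qiang Huo"),
    Arc("yellow cypress", "dep", "small intestine"),
    Arc("yellow cypress", "dep", "bladder"),
    Arc("bladder", "particle", "also"),  # function word, not an entity
]
toy_entities = {
    "Sun Meridian", "Qiang Huo", "yellow cypress", "small intestine", "bladder",
}

pairs = entity_relations(toy_arcs, toy_entities)
print(group_by_head(pairs))
```

Filtering arcs by the entity recognition results drops function words such as the sentence-final particle, leaving head-dependent pairs that can later serve as candidate edges in a knowledge graph.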


4 Final Remarks

In this study, a named entity recognition method based on conditional random fields was used to analyze the entity vocabulary, semantic features, and syntactic structure of texts from the Yishui School of Traditional Chinese Medicine (TCM). The extraction of key named entities from this unstructured text achieved good results. The approach has theoretical and practical value for summarizing the academic views of different doctors of the Yishui School, identifying differences in their academic ideas, and studying how the school's teachings were inherited. As a next step, building on named entity recognition, we will study TCM entity relation extraction from classical Chinese data and then construct a knowledge graph of the Yishui School, providing a reference for applying artificial intelligence methods to research on TCM schools.



This paper is available on arXiv under a CC BY-NC-ND 4.0 DEED license.

