Too Long; Didn't Read
Chinese belongs to the so-called CJK group of languages (Chinese, Japanese, and Korean). These are probably the most complicated languages to implement full-text search for, because word meanings depend heavily on the many character variations and their sequences, and the characters are not split up into words. To find an exact match in a full-text search, we have to face the challenge of tokenization, whose main task is to break the text down into low-level units that the user can search for. The easiest approach to Chinese text segmentation is to use N-grams.
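As a rough illustration of the N-gram idea, here is a minimal Python sketch that splits a string into overlapping character N-grams. The function name `char_ngrams`, the default of `n = 2` (bigrams), and the choice to drop whitespace are assumptions made for this example, not part of any particular search engine's tokenizer.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character n-grams (hypothetical helper)."""
    # Assumption: whitespace carries no meaning here, so it is dropped first.
    chars = [c for c in text if not c.isspace()]
    if len(chars) < n:
        # Text shorter than n: return it as a single token, or nothing if empty.
        return ["".join(chars)] if chars else []
    # Slide a window of width n over the characters, one step at a time.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


print(char_ngrams("全文搜索", 2))  # ['全文', '文搜', '搜索']
```

Indexing every overlapping pair means a query such as 搜索 can match without knowing where the word boundaries are. The cost is a larger index and some false matches on pairs such as 文搜 that straddle two words.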