Session 4-2

Toward the Definition of a Safe Character Set of Nushu in ISO/IEC 10646

  • Toshiya Suzuki (Hiroshima University)

Nushu (a Chinese word meaning “women's writing”) is a writing system that was used by women in a particular local area. Its origin is still unidentified because reliable historical evidence is limited, but the widely accepted assumption is that most of the surviving Nushu characters are derived from Hanzi (Chinese characters). However, the properties of the source Hanzi (shape, pronunciation, and meaning) are remarkably simplified: a single Nushu character can stand for several Hanzi with different meanings, as long as their phonetic values are similar. Conversely, the same character is sometimes used for different phonetic values. Surveys of the local dialect's sound system estimate the number of phonetic values that Nushu users distinguish at 800 to 1100, whereas the number of characters that an individual author could use distinctively is estimated at 500 to 1000. Nushu can thus be understood as a script in transition from an ideographic to a syllabic writing system. Nushu was typically used for one-to-one communication between sworn sisters; one-to-many communication in Nushu (such as inscriptions, sutras, open letters, or newspapers) is rare. As a result, the identification of Nushu characters has not yet stabilized. Furthermore, the shift toward syllabic use is not uniform across authors: for a given pair of characters, some authors distinguish them semantically and phonetically, while others see no difference and treat the two as interchangeable variants. Consequently, there is no standard character list for elementary study comparable to the “Thousand Character Classic” (千字文) or the “Cangjie Wordbook” (倉頡篇) for Hanzi. Several Nushu-Chinese dictionaries have been compiled for scholars, but they identify Nushu characters in widely differing ways.

Discussion of including Nushu in ISO/IEC 10646 (the ISO/IEC counterpart of Unicode) began in 2003; the proposal was submitted primarily by Professor Zhao Liming of China. However, the ballot that fixed the scripts to be included in ISO/IEC 10646:2012/Amd.1:2014 postponed Nushu to the next amendment, because several technical questions had been raised about the proposed character set. The main problem is that most existing Nushu dictionaries use their own mutually incompatible collation methods and differ widely in the number of indexed characters, which makes it difficult to compile a stable Nushu character set by combining them. According to Sun Qi (2005), the number of indexed characters in eight Nushu dictionaries published between 1986 and 2002 ranges from 470 to 1800. In addition, for similar-but-different characters, the pairing of representative glyphs with their descriptions is often inverted (e.g., one dictionary lists glyph A with meaning X and glyph B with meaning Y, while another lists glyph A with meaning Y). Because the cross-section of the existing Nushu dictionaries cannot be computed mechanically, the standardization of Nushu for ISO/IEC 10646 has to start by defining how Nushu characters in Unicode should be identified.
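The inversion problem can be illustrated with a toy example. This is a hypothetical sketch, not real dictionary data: the glyph labels “A”/“B” and meanings “X”/“Y” are placeholders taken from the example above, and each dictionary is modeled, as a simplifying assumption, as a set of (glyph, meaning) pairs. Both dictionaries index the same glyphs, yet intersecting them on exact pairs yields nothing, so the intersection alone cannot decide whether A and B are distinct characters or interchangeable variants.

```python
# Hypothetical toy data: placeholder glyphs "A"/"B" and meanings "X"/"Y"
# from the example in the text, not entries from any real Nushu dictionary.
dict_1 = {("A", "X"), ("B", "Y")}  # glyph A glossed as X, glyph B as Y
dict_2 = {("A", "Y"), ("B", "X")}  # same glyphs, glosses swapped


def glyphs(entries):
    """Return the set of glyph labels used as headwords."""
    return {glyph for glyph, _ in entries}


def common_entries(*dictionaries):
    """Intersect dictionaries on exact (glyph, meaning) pairs."""
    return set.intersection(*dictionaries)


# Both dictionaries index the same glyphs...
print(glyphs(dict_1) == glyphs(dict_2))   # True
# ...but they share no (glyph, meaning) pair, so a mechanical
# intersection does not tell us how the characters should be identified.
print(common_entries(dict_1, dict_2))     # set()
```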

Zhao (2006) tried to eliminate the confusion between glyphs and their usage by compiling per-author statistics. However, the rule for distinguishing glyphs was not clarified, and the proposed character set mixes the statistical results without normalizing them. As a result, the entity to be enumerated is still not clearly defined. This report summarizes the detailed discussion in ISO/IEC JTC1/SC2/WG2 (the working group responsible for developing ISO/IEC 10646) and the tasks that remain.

Nushu, glyph, character set, character encoding, Unicode