From bb5bbc20702981c287aa3e44640e7d2f2b9a28cf Mon Sep 17 00:00:00 2001
From: Mike FABIAN
Date: Thu, 14 Sep 2023 18:01:40 +0200
Subject: Update to Unicode 15.1.0 [BZ #30854]

Unicode 15.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 15.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

    Total removed characters in newly generated CHARMAP: 0
    Total changed characters in newly generated CHARMAP: 0
    Total added characters in newly generated CHARMAP: 627
    Total removed characters in newly generated WIDTH: 0
    Total changed characters in newly generated WIDTH: 0
    Total added characters in newly generated WIDTH: 627
    alpha: Added 622 characters in new ctype which were not in old ctype
    graph: Added 627 characters in new ctype which were not in old ctype
    print: Added 627 characters in new ctype which were not in old ctype
    punct: Added 5 characters in new ctype which were not in old ctype

The five characters added to punct are:

    2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;;
    2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;;
    2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;;
    2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;;
    31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;;

The Unicode announcement blog entry says "[...] adds 627 characters,
[...] additions include 622 CJK unified ideographs in a new block,
[...]", so that looks OK.

The Unicode blog mentions "six completely new emoji" but they don't
appear here as they are all sequences and not single code points.
Resolves: BZ #30854 Reviewed-by: Carlos O'Donell --- localedata/charmaps/UTF-8 | 23 +- localedata/locales/i18n_ctype | 200 +- localedata/locales/tr_TR | 200 +- localedata/locales/translit_circle | 2 +- localedata/locales/translit_cjk_compat | 2 +- localedata/locales/translit_combining | 2 +- localedata/locales/translit_compat | 2 +- localedata/locales/translit_font | 2 +- localedata/locales/translit_fraction | 2 +- localedata/unicode-gen/DerivedCoreProperties.txt | 277 +- localedata/unicode-gen/EastAsianWidth.txt | 5170 +++++++++++----------- localedata/unicode-gen/Makefile | 2 +- localedata/unicode-gen/PropList.txt | 78 +- localedata/unicode-gen/UnicodeData.txt | 7 + 14 files changed, 3155 insertions(+), 2814 deletions(-) diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8 index bd8075f20d..94f20d5e87 100644 --- a/localedata/charmaps/UTF-8 +++ b/localedata/charmaps/UTF-8 @@ -11240,6 +11240,10 @@ CHARMAP /xe2/xbf/xb9 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT /xe2/xbf/xba IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT /xe2/xbf/xbb IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID + /xe2/xbf/xbc IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT + /xe2/xbf/xbd IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT + /xe2/xbf/xbe IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION + /xe2/xbf/xbf IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION /xe3/x80/x80 IDEOGRAPHIC SPACE /xe3/x80/x81 IDEOGRAPHIC COMMA /xe3/x80/x82 IDEOGRAPHIC FULL STOP @@ -11714,6 +11718,7 @@ CHARMAP /xe3/x87/xa1 CJK STROKE HZZZG /xe3/x87/xa2 CJK STROKE PG /xe3/x87/xa3 CJK STROKE Q + /xe3/x87/xaf IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION /xe3/x87/xb0 KATAKANA LETTER SMALL KU /xe3/x87/xb1 KATAKANA LETTER SMALL SI /xe3/x87/xb2 KATAKANA LETTER SMALL SU @@ -46767,6 +46772,16 @@ CHARMAP .. /xf0/xae/xac/xb0 .. /xf0/xae/xad/xb0 .. /xf0/xae/xae/xb0 +.. /xf0/xae/xaf/xb0 +.. /xf0/xae/xb0/xb0 +.. /xf0/xae/xb1/xb0 +.. /xf0/xae/xb2/xb0 +.. /xf0/xae/xb3/xb0 +.. 
/xf0/xae/xb4/xb0 +.. /xf0/xae/xb5/xb0 +.. /xf0/xae/xb6/xb0 +.. /xf0/xae/xb7/xb0 +.. /xf0/xae/xb8/xb0 /xf0/xaf/xa0/x80 CJK COMPATIBILITY IDEOGRAPH-2F800 /xf0/xaf/xa0/x81 CJK COMPATIBILITY IDEOGRAPH-2F801 /xf0/xaf/xa0/x82 CJK COMPATIBILITY IDEOGRAPH-2F802 @@ -49840,7 +49855,7 @@ CHARMAP .. /xf4/x8f/xbf/x80 END CHARMAP -% Character width according to Unicode 15.0.0. +% Character width according to Unicode 15.1.0. % - Default width is 1. % - Double-width characters have width 2; generated from % "grep '^[^;]*;[WF]' EastAsianWidth.txt" @@ -50061,8 +50076,7 @@ WIDTH ... 2 ... 2 ... 2 -... 2 -... 2 +... 2 ... 0 ... 2 ... 2 @@ -50071,7 +50085,7 @@ WIDTH ... 2 ... 2 ... 2 -... 2 +... 2 ... 2 ... 2 ... 0 @@ -50325,6 +50339,7 @@ WIDTH ... 2 ... 2 ... 2 +... 2 ... 2 ... 2 ... 2 diff --git a/localedata/locales/i18n_ctype b/localedata/locales/i18n_ctype index 850c902cc1..f86855c6c6 100644 --- a/localedata/locales/i18n_ctype +++ b/localedata/locales/i18n_ctype @@ -26,13 +26,13 @@ fax "" language "" territory "Earth" revision "14.0.0" -date "2022-10-04" +date "2023-09-15" category "i18n:2012";LC_CTYPE END LC_IDENTIFICATION LC_CTYPE % The following is the 14652 i18n fdcc-set LC_CTYPE category. -% It covers Unicode version 15.0.0. +% It covers Unicode version 15.1.0. % The character classes and mapping tables were automatically % generated using the gen_unicode_ctype.py program. @@ -497,8 +497,9 @@ alpha / ..;..;/ ..;..;/ ..;..;/ - ..;..;/ - ..;.. + ..;..;/ + ..;..;/ + .. % The "digit" class must only contain the BASIC LATIN digits, says ISO C 99 % (sections 7.25.2.1.5 and 5.2.1). 
@@ -561,19 +562,19 @@ punct / ..;..;..;..;/ ..;..;..;..;/ ..;;;..;..;/ - ..;..;..;..;/ + ..;..;..;..;/ ..;..;..;..;/ ..;..;;;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;;..;..;/ - ..;;..;..;/ - ..;;..;;;;/ - ..;..;..;..;/ - ;;..;..;;;/ - ..;..;..;;/ - ..;..;;..;/ + ..;;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;;..;/ + ..;..;;..;/ + ..;..;;..;;/ + ;;..;..;..;/ + ..;;;..;..;/ + ;;..;..;..;/ + ;..;..;;..;/ ..;..;..;;/ ..;..;..;..;/ ..;..;..;/ @@ -725,9 +726,9 @@ graph / ..;..;..;..;/ ..;..;..;..;/ ..;..;..;..;/ - ..;..;..;..;/ + ..;..;..;..;/ ..;..;..;..;/ - ..;..;..;..;/ + ..;..;..;..;/ ..;..;..;..;/ ..;;..;..;/ ..;..;..;..;/ @@ -908,10 +909,10 @@ graph / ..;..;/ ..;..;/ ..;..;/ - ..;..;/ - ..;;..;/ - ..;..;/ - .. + ..;..;/ + ..;..;;/ + ..;..;/ + ..;.. print / ..;..;..;..;/ @@ -980,81 +981,80 @@ print / ..;..;..;..;/ ..;..;..;..;/ ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ;..;..;..;/ - ..;..;;..;/ - ..;..;..;..;/ - ..;;..;..;/ - ..;..;..;..;/ - ..;..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;;..;/ - ..;;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;;;/ - ..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;;/ + ..;..;..;..;/ + ..;;..;..;/ + ..;..;..;..;/ + ;..;..;..;/ + ..;..;..;..;/ + 
..;..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;;/ + ..;..;;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;;/ + ;..;..;/ ..;..;/ ..;..;/ ..;..;/ @@ -1163,10 +1163,10 @@ print / ..;..;/ ..;..;/ ..;..;/ - ..;..;/ - ..;;..;/ - ..;..;/ - .. + ..;..;/ + ..;..;;/ + ..;..;/ + ..;.. % The "xdigit" class must only contain the BASIC LATIN digits and A-F, a-f, % says ISO C 99 (sections 7.25.2.1.12 and 6.4.4.1). diff --git a/localedata/locales/tr_TR b/localedata/locales/tr_TR index 33fe2f7bc4..47f5d7015c 100644 --- a/localedata/locales/tr_TR +++ b/localedata/locales/tr_TR @@ -43,7 +43,7 @@ fax "" language "Turkish" territory "Turkey" revision "1.0" -date "2022-10-04" +date "2023-09-15" category "i18n:2012";LC_IDENTIFICATION category "i18n:2012";LC_CTYPE @@ -127,7 +127,7 @@ END LC_COLLATE LC_CTYPE % The following is the 14652 i18n fdcc-set LC_CTYPE category. -% It covers Unicode version 15.0.0. +% It covers Unicode version 15.1.0. % The character classes and mapping tables were automatically % generated using the gen_unicode_ctype.py program. @@ -592,8 +592,9 @@ alpha / ..;..;/ ..;..;/ ..;..;/ - ..;..;/ - ..;.. + ..;..;/ + ..;..;/ + .. % The "digit" class must only contain the BASIC LATIN digits, says ISO C 99 % (sections 7.25.2.1.5 and 5.2.1). 
@@ -656,19 +657,19 @@ punct / ..;..;..;..;/ ..;..;..;..;/ ..;;;..;..;/ - ..;..;..;..;/ + ..;..;..;..;/ ..;..;..;..;/ ..;..;;;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;;..;..;/ - ..;;..;..;/ - ..;;..;;;;/ - ..;..;..;..;/ - ;;..;..;;;/ - ..;..;..;;/ - ..;..;;..;/ + ..;;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;;..;/ + ..;..;;..;/ + ..;..;;..;;/ + ;;..;..;..;/ + ..;;;..;..;/ + ;;..;..;..;/ + ;..;..;;..;/ ..;..;..;;/ ..;..;..;..;/ ..;..;..;/ @@ -820,9 +821,9 @@ graph / ..;..;..;..;/ ..;..;..;..;/ ..;..;..;..;/ - ..;..;..;..;/ + ..;..;..;..;/ ..;..;..;..;/ - ..;..;..;..;/ + ..;..;..;..;/ ..;..;..;..;/ ..;;..;..;/ ..;..;..;..;/ @@ -1003,10 +1004,10 @@ graph / ..;..;/ ..;..;/ ..;..;/ - ..;..;/ - ..;;..;/ - ..;..;/ - .. + ..;..;/ + ..;..;;/ + ..;..;/ + ..;.. print / ..;..;..;..;/ @@ -1075,81 +1076,80 @@ print / ..;..;..;..;/ ..;..;..;..;/ ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ..;..;..;..;/ - ;..;..;..;/ - ..;..;;..;/ - ..;..;..;..;/ - ..;;..;..;/ - ..;..;..;..;/ - ..;..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;;..;/ - ..;;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;..;/ - ..;;;/ - ..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;..;/ + ..;..;..;;/ + ..;..;..;..;/ + ..;;..;..;/ + ..;..;..;..;/ + ;..;..;..;/ + ..;..;..;..;/ + 
..;..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;;/ + ..;..;;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;/ + ..;..;;/ + ;..;..;/ ..;..;/ ..;..;/ ..;..;/ @@ -1258,10 +1258,10 @@ print / ..;..;/ ..;..;/ ..;..;/ - ..;..;/ - ..;;..;/ - ..;..;/ - .. + ..;..;/ + ..;..;;/ + ..;..;/ + ..;.. % The "xdigit" class must only contain the BASIC LATIN digits and A-F, a-f, % says ISO C 99 (sections 7.25.2.1.12 and 6.4.4.1). diff --git a/localedata/locales/translit_circle b/localedata/locales/translit_circle index ff7abc29ff..20fd57168f 100644 --- a/localedata/locales/translit_circle +++ b/localedata/locales/translit_circle @@ -9,7 +9,7 @@ comment_char % % otherwise be governed by that license. % Transliterations of encircled characters. -% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2022-10-04 for Unicode 15.0.0. +% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2023-09-15 for Unicode 15.1.0. LC_CTYPE diff --git a/localedata/locales/translit_cjk_compat b/localedata/locales/translit_cjk_compat index c3fdccb8a9..7951e0cd64 100644 --- a/localedata/locales/translit_cjk_compat +++ b/localedata/locales/translit_cjk_compat @@ -9,7 +9,7 @@ comment_char % % otherwise be governed by that license. % Transliterations of CJK compatibility characters. -% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2022-10-04 for Unicode 15.0.0. +% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2023-09-15 for Unicode 15.1.0. 
LC_CTYPE diff --git a/localedata/locales/translit_combining b/localedata/locales/translit_combining index b4732d158b..ce2f19eee1 100644 --- a/localedata/locales/translit_combining +++ b/localedata/locales/translit_combining @@ -10,7 +10,7 @@ comment_char % % Transliterations that remove all combining characters (accents, % pronounciation marks, etc.). -% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2022-10-04 for Unicode 15.0.0. +% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2023-09-15 for Unicode 15.1.0. LC_CTYPE diff --git a/localedata/locales/translit_compat b/localedata/locales/translit_compat index 489a2c2678..7a214b2723 100644 --- a/localedata/locales/translit_compat +++ b/localedata/locales/translit_compat @@ -9,7 +9,7 @@ comment_char % % otherwise be governed by that license. % Transliterations of compatibility characters and ligatures. -% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2022-10-04 for Unicode 15.0.0. +% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2023-09-15 for Unicode 15.1.0. LC_CTYPE diff --git a/localedata/locales/translit_font b/localedata/locales/translit_font index e21de6c530..a977ae1f29 100644 --- a/localedata/locales/translit_font +++ b/localedata/locales/translit_font @@ -9,7 +9,7 @@ comment_char % % otherwise be governed by that license. % Transliterations of font equivalents. -% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2022-10-04 for Unicode 15.0.0. +% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2023-09-15 for Unicode 15.1.0. LC_CTYPE diff --git a/localedata/locales/translit_fraction b/localedata/locales/translit_fraction index c4b0367bd9..115273ce05 100644 --- a/localedata/locales/translit_fraction +++ b/localedata/locales/translit_fraction @@ -9,7 +9,7 @@ comment_char % % otherwise be governed by that license. % Transliterations of fractions. 
-% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2022-10-04 for Unicode 15.0.0. +% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2023-09-15 for Unicode 15.1.0. % The replacements have been surrounded with spaces, because fractions are % often preceded by a decimal number and followed by a unit or a math symbol. diff --git a/localedata/unicode-gen/DerivedCoreProperties.txt b/localedata/unicode-gen/DerivedCoreProperties.txt index 8b482b5c10..220c55685d 100644 --- a/localedata/unicode-gen/DerivedCoreProperties.txt +++ b/localedata/unicode-gen/DerivedCoreProperties.txt @@ -1,6 +1,6 @@ -# DerivedCoreProperties-15.0.0.txt -# Date: 2022-08-05, 22:17:05 GMT -# © 2022 Unicode®, Inc. +# DerivedCoreProperties-15.1.0.txt +# Date: 2023-08-07, 15:21:24 GMT +# © 2023 Unicode®, Inc. # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. # For terms of use, see https://www.unicode.org/terms_of_use.html # @@ -1397,11 +1397,12 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG 2B740..2B81D ; Alphabetic # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B820..2CEA1 ; Alphabetic # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2CEB0..2EBE0 ; Alphabetic # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2EBF0..2EE5D ; Alphabetic # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D 2F800..2FA1D ; Alphabetic # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 30000..3134A ; Alphabetic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A 31350..323AF ; Alphabetic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF -# Total code points: 137765 +# Total code points: 138387 # ================================================ @@ -6853,11 +6854,12 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL 
2B740..2B81D ; ID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B820..2CEA1 ; ID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2CEB0..2EBE0 ; ID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2EBF0..2EE5D ; ID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D 2F800..2FA1D ; ID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 30000..3134A ; ID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A 31350..323AF ; ID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF -# Total code points: 136345 +# Total code points: 136967 # ================================================ @@ -7438,6 +7440,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL 1FE0..1FEC ; ID_Continue # L& [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA 1FF2..1FF4 ; ID_Continue # L& [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI 1FF6..1FFC ; ID_Continue # L& [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI +200C..200D ; ID_Continue # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER 203F..2040 ; ID_Continue # Pc [2] UNDERTIE..CHARACTER TIE 2054 ; ID_Continue # Pc INVERTED UNDERTIE 2071 ; ID_Continue # Lm SUPERSCRIPT LATIN SMALL LETTER I @@ -7504,6 +7507,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL 309D..309E ; ID_Continue # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK 309F ; ID_Continue # Lo HIRAGANA DIGRAPH YORI 30A1..30FA ; ID_Continue # Lo [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO +30FB ; ID_Continue # Po KATAKANA MIDDLE DOT 30FC..30FE ; ID_Continue # Lm [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK 30FF ; ID_Continue # Lo KATAKANA DIGRAPH KOTO 3105..312F ; 
ID_Continue # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN @@ -7683,6 +7687,7 @@ FF10..FF19 ; ID_Continue # Nd [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NIN FF21..FF3A ; ID_Continue # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z FF3F ; ID_Continue # Pc FULLWIDTH LOW LINE FF41..FF5A ; ID_Continue # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z +FF65 ; ID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT FF66..FF6F ; ID_Continue # Lo [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU FF70 ; ID_Continue # Lm HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK FF71..FF9D ; ID_Continue # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N @@ -8207,12 +8212,13 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN 2B740..2B81D ; ID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B820..2CEA1 ; ID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2CEB0..2EBE0 ; ID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2EBF0..2EE5D ; ID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D 2F800..2FA1D ; ID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 30000..3134A ; ID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A 31350..323AF ; ID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF E0100..E01EF ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 -# Total code points: 139482 +# Total code points: 140108 # ================================================ @@ -8962,11 +8968,12 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU 2B740..2B81D ; XID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B820..2CEA1 ; XID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2CEB0..2EBE0 ; 
XID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2EBF0..2EE5D ; XID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D 2F800..2FA1D ; XID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 30000..3134A ; XID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A 31350..323AF ; XID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF -# Total code points: 136322 +# Total code points: 136944 # ================================================ @@ -9543,6 +9550,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU 1FE0..1FEC ; XID_Continue # L& [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA 1FF2..1FF4 ; XID_Continue # L& [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI 1FF6..1FFC ; XID_Continue # L& [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI +200C..200D ; XID_Continue # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER 203F..2040 ; XID_Continue # Pc [2] UNDERTIE..CHARACTER TIE 2054 ; XID_Continue # Pc INVERTED UNDERTIE 2071 ; XID_Continue # Lm SUPERSCRIPT LATIN SMALL LETTER I @@ -9608,6 +9616,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU 309D..309E ; XID_Continue # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK 309F ; XID_Continue # Lo HIRAGANA DIGRAPH YORI 30A1..30FA ; XID_Continue # Lo [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO +30FB ; XID_Continue # Po KATAKANA MIDDLE DOT 30FC..30FE ; XID_Continue # Lm [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK 30FF ; XID_Continue # Lo KATAKANA DIGRAPH KOTO 3105..312F ; XID_Continue # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN @@ -9793,6 +9802,7 @@ FF10..FF19 ; XID_Continue # Nd [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NI FF21..FF3A ; 
XID_Continue # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z FF3F ; XID_Continue # Pc FULLWIDTH LOW LINE FF41..FF5A ; XID_Continue # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z +FF65 ; XID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT FF66..FF6F ; XID_Continue # Lo [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU FF70 ; XID_Continue # Lm HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK FF71..FF9D ; XID_Continue # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N @@ -10317,12 +10327,13 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA 2B740..2B81D ; XID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B820..2CEA1 ; XID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2CEB0..2EBE0 ; XID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2EBF0..2EE5D ; XID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D 2F800..2FA1D ; XID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 30000..3134A ; XID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A 31350..323AF ; XID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 -# Total code points: 139463 +# Total code points: 140089 # ================================================ @@ -10335,6 +10346,15 @@ E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTO # - FFF9..FFFB (Interlinear annotation format characters) # - 13430..13440 (Egyptian hieroglyph format characters) # - Prepended_Concatenation_Mark (Exceptional format characters that should be visible) +# +# There are currently no stability guarantees for DICP. 
However, the +# values of DICP interact with the derivation of XID_Continue +# and NFKC_CF, for which there are stability guarantees. +# Maintainers of this property should note that in the +# unlikely case that the DICP value changes for an existing character +# which is also XID_Continue=Yes, then exceptions must be put +# in place to ensure that the NFKC_CF mapping value for that +# existing character does not change. 00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN 034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER @@ -11602,7 +11622,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELE 2E80..2E99 ; Grapheme_Base # So [26] CJK RADICAL REPEAT..CJK RADICAL RAP 2E9B..2EF3 ; Grapheme_Base # So [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE 2F00..2FD5 ; Grapheme_Base # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE -2FF0..2FFB ; Grapheme_Base # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID +2FF0..2FFF ; Grapheme_Base # So [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION 3000 ; Grapheme_Base # Zs IDEOGRAPHIC SPACE 3001..3003 ; Grapheme_Base # Po [3] IDEOGRAPHIC COMMA..DITTO MARK 3004 ; Grapheme_Base # So JAPANESE INDUSTRIAL STANDARD SYMBOL @@ -11657,6 +11677,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELE 3196..319F ; Grapheme_Base # So [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK 31A0..31BF ; Grapheme_Base # Lo [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH 31C0..31E3 ; Grapheme_Base # So [36] CJK STROKE T..CJK STROKE Q +31EF ; Grapheme_Base # So IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION 31F0..31FF ; Grapheme_Base # Lo [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO 3200..321E ; Grapheme_Base # So [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU 3220..3229 ; Grapheme_Base # No [10] PARENTHESIZED IDEOGRAPH 
ONE..PARENTHESIZED IDEOGRAPH TEN @@ -12497,11 +12518,12 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME 2B740..2B81D ; Grapheme_Base # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D 2B820..2CEA1 ; Grapheme_Base # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 2CEB0..2EBE0 ; Grapheme_Base # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2EBF0..2EE5D ; Grapheme_Base # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D 2F800..2FA1D ; Grapheme_Base # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D 30000..3134A ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A 31350..323AF ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF -# Total code points: 146986 +# Total code points: 147613 # ================================================ @@ -12572,4 +12594,239 @@ ABED ; Grapheme_Link # Mn MEETEI MAYEK APUN IYEK # Total code points: 65 +# ================================================ + +# Derived Property: Indic_Conjunct_Break +# Generated from the Grapheme_Cluster_Break, Indic_Syllabic_Category, +# Canonical_Combining_Class, and Script properties as described in UAX #44: +# https://www.unicode.org/reports/tr44/. + +# All code points not explicitly listed for Indic_Conjunct_Break +# have the value None. 
+
+# @missing: 0000..10FFFF; InCB; None
+
+# ================================================
+
+# Indic_Conjunct_Break=Linker
+
+094D ; InCB; Linker # Mn DEVANAGARI SIGN VIRAMA
+09CD ; InCB; Linker # Mn BENGALI SIGN VIRAMA
+0ACD ; InCB; Linker # Mn GUJARATI SIGN VIRAMA
+0B4D ; InCB; Linker # Mn ORIYA SIGN VIRAMA
+0C4D ; InCB; Linker # Mn TELUGU SIGN VIRAMA
+0D4D ; InCB; Linker # Mn MALAYALAM SIGN VIRAMA
+
+# Total code points: 6
+
+# ================================================
+
+# Indic_Conjunct_Break=Consonant
+
+0915..0939 ; InCB; Consonant # Lo [37] DEVANAGARI LETTER KA..DEVANAGARI LETTER HA
+0958..095F ; InCB; Consonant # Lo [8] DEVANAGARI LETTER QA..DEVANAGARI LETTER YYA
+0978..097F ; InCB; Consonant # Lo [8] DEVANAGARI LETTER MARWARI DDA..DEVANAGARI LETTER BBA
+0995..09A8 ; InCB; Consonant # Lo [20] BENGALI LETTER KA..BENGALI LETTER NA
+09AA..09B0 ; InCB; Consonant # Lo [7] BENGALI LETTER PA..BENGALI LETTER RA
+09B2 ; InCB; Consonant # Lo BENGALI LETTER LA
+09B6..09B9 ; InCB; Consonant # Lo [4] BENGALI LETTER SHA..BENGALI LETTER HA
+09DC..09DD ; InCB; Consonant # Lo [2] BENGALI LETTER RRA..BENGALI LETTER RHA
+09DF ; InCB; Consonant # Lo BENGALI LETTER YYA
+09F0..09F1 ; InCB; Consonant # Lo [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
+0A95..0AA8 ; InCB; Consonant # Lo [20] GUJARATI LETTER KA..GUJARATI LETTER NA
+0AAA..0AB0 ; InCB; Consonant # Lo [7] GUJARATI LETTER PA..GUJARATI LETTER RA
+0AB2..0AB3 ; InCB; Consonant # Lo [2] GUJARATI LETTER LA..GUJARATI LETTER LLA
+0AB5..0AB9 ; InCB; Consonant # Lo [5] GUJARATI LETTER VA..GUJARATI LETTER HA
+0AF9 ; InCB; Consonant # Lo GUJARATI LETTER ZHA
+0B15..0B28 ; InCB; Consonant # Lo [20] ORIYA LETTER KA..ORIYA LETTER NA
+0B2A..0B30 ; InCB; Consonant # Lo [7] ORIYA LETTER PA..ORIYA LETTER RA
+0B32..0B33 ; InCB; Consonant # Lo [2] ORIYA LETTER LA..ORIYA LETTER LLA
+0B35..0B39 ; InCB; Consonant # Lo [5] ORIYA LETTER VA..ORIYA LETTER HA
+0B5C..0B5D ; InCB; Consonant # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
+0B5F ; InCB; Consonant # Lo ORIYA LETTER YYA
+0B71 ; InCB; Consonant # Lo ORIYA LETTER WA
+0C15..0C28 ; InCB; Consonant # Lo [20] TELUGU LETTER KA..TELUGU LETTER NA
+0C2A..0C39 ; InCB; Consonant # Lo [16] TELUGU LETTER PA..TELUGU LETTER HA
+0C58..0C5A ; InCB; Consonant # Lo [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
+0D15..0D3A ; InCB; Consonant # Lo [38] MALAYALAM LETTER KA..MALAYALAM LETTER TTTA
+
+# Total code points: 240
+
+# ================================================
+
+# Indic_Conjunct_Break=Extend
+
+0300..034E ; InCB; Extend # Mn [79] COMBINING GRAVE ACCENT..COMBINING UPWARDS ARROW BELOW
+0350..036F ; InCB; Extend # Mn [32] COMBINING RIGHT ARROWHEAD ABOVE..COMBINING LATIN SMALL LETTER X
+0483..0487 ; InCB; Extend # Mn [5] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC POKRYTIE
+0591..05BD ; InCB; Extend # Mn [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
+05BF ; InCB; Extend # Mn HEBREW POINT RAFE
+05C1..05C2 ; InCB; Extend # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
+05C4..05C5 ; InCB; Extend # Mn [2] HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT
+05C7 ; InCB; Extend # Mn HEBREW POINT QAMATS QATAN
+0610..061A ; InCB; Extend # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
+064B..065F ; InCB; Extend # Mn [21] ARABIC FATHATAN..ARABIC WAVY HAMZA BELOW
+0670 ; InCB; Extend # Mn ARABIC LETTER SUPERSCRIPT ALEF
+06D6..06DC ; InCB; Extend # Mn [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN
+06DF..06E4 ; InCB; Extend # Mn [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA
+06E7..06E8 ; InCB; Extend # Mn [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON
+06EA..06ED ; InCB; Extend # Mn [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM
+0711 ; InCB; Extend # Mn SYRIAC LETTER SUPERSCRIPT ALAPH
+0730..074A ; InCB; Extend # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
+07EB..07F3 ; InCB; Extend # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE
+07FD ; InCB; Extend # Mn NKO DANTAYALAN
+0816..0819 ; InCB; Extend # Mn [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH
+081B..0823 ; InCB; Extend # Mn [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A
+0825..0827 ; InCB; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
+0829..082D ; InCB; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
+0859..085B ; InCB; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
+0898..089F ; InCB; Extend # Mn [8] ARABIC SMALL HIGH WORD AL-JUZ..ARABIC HALF MADDA OVER MADDA
+08CA..08E1 ; InCB; Extend # Mn [24] ARABIC SMALL HIGH FARSI YEH..ARABIC SMALL HIGH SIGN SAFHA
+08E3..08FF ; InCB; Extend # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
+093C ; InCB; Extend # Mn DEVANAGARI SIGN NUKTA
+0951..0954 ; InCB; Extend # Mn [4] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI ACUTE ACCENT
+09BC ; InCB; Extend # Mn BENGALI SIGN NUKTA
+09FE ; InCB; Extend # Mn BENGALI SANDHI MARK
+0A3C ; InCB; Extend # Mn GURMUKHI SIGN NUKTA
+0ABC ; InCB; Extend # Mn GUJARATI SIGN NUKTA
+0B3C ; InCB; Extend # Mn ORIYA SIGN NUKTA
+0C3C ; InCB; Extend # Mn TELUGU SIGN NUKTA
+0C55..0C56 ; InCB; Extend # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
+0CBC ; InCB; Extend # Mn KANNADA SIGN NUKTA
+0D3B..0D3C ; InCB; Extend # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
+0E38..0E3A ; InCB; Extend # Mn [3] THAI CHARACTER SARA U..THAI CHARACTER PHINTHU
+0E48..0E4B ; InCB; Extend # Mn [4] THAI CHARACTER MAI EK..THAI CHARACTER MAI CHATTAWA
+0EB8..0EBA ; InCB; Extend # Mn [3] LAO VOWEL SIGN U..LAO SIGN PALI VIRAMA
+0EC8..0ECB ; InCB; Extend # Mn [4] LAO TONE MAI EK..LAO TONE MAI CATAWA
+0F18..0F19 ; InCB; Extend # Mn [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS
+0F35 ; InCB; Extend # Mn TIBETAN MARK NGAS BZUNG NYI ZLA
+0F37 ; InCB; Extend # Mn TIBETAN MARK NGAS BZUNG SGOR RTAGS
+0F39 ; InCB; Extend # Mn TIBETAN MARK TSA -PHRU
+0F71..0F72 ; InCB; Extend # Mn [2] TIBETAN VOWEL SIGN AA..TIBETAN VOWEL SIGN I
+0F74 ; InCB; Extend # Mn TIBETAN VOWEL SIGN U
+0F7A..0F7D ; InCB; Extend # Mn [4] TIBETAN VOWEL SIGN E..TIBETAN VOWEL SIGN OO
+0F80 ; InCB; Extend # Mn TIBETAN VOWEL SIGN REVERSED I
+0F82..0F84 ; InCB; Extend # Mn [3] TIBETAN SIGN NYI ZLA NAA DA..TIBETAN MARK HALANTA
+0F86..0F87 ; InCB; Extend # Mn [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS
+0FC6 ; InCB; Extend # Mn TIBETAN SYMBOL PADMA GDAN
+1037 ; InCB; Extend # Mn MYANMAR SIGN DOT BELOW
+1039..103A ; InCB; Extend # Mn [2] MYANMAR SIGN VIRAMA..MYANMAR SIGN ASAT
+108D ; InCB; Extend # Mn MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE
+135D..135F ; InCB; Extend # Mn [3] ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK..ETHIOPIC COMBINING GEMINATION MARK
+1714 ; InCB; Extend # Mn TAGALOG SIGN VIRAMA
+17D2 ; InCB; Extend # Mn KHMER SIGN COENG
+17DD ; InCB; Extend # Mn KHMER SIGN ATTHACAN
+18A9 ; InCB; Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA
+1939..193B ; InCB; Extend # Mn [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I
+1A17..1A18 ; InCB; Extend # Mn [2] BUGINESE VOWEL SIGN I..BUGINESE VOWEL SIGN U
+1A60 ; InCB; Extend # Mn TAI THAM SIGN SAKOT
+1A75..1A7C ; InCB; Extend # Mn [8] TAI THAM SIGN TONE-1..TAI THAM SIGN KHUEN-LUE KARAN
+1A7F ; InCB; Extend # Mn TAI THAM COMBINING CRYPTOGRAMMIC DOT
+1AB0..1ABD ; InCB; Extend # Mn [14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
+1ABF..1ACE ; InCB; Extend # Mn [16] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER INSULAR T
+1B34 ; InCB; Extend # Mn BALINESE SIGN REREKAN
+1B6B..1B73 ; InCB; Extend # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
+1BAB ; InCB; Extend # Mn SUNDANESE SIGN VIRAMA
+1BE6 ; InCB; Extend # Mn BATAK SIGN TOMPI
+1C37 ; InCB; Extend # Mn LEPCHA SIGN NUKTA
+1CD0..1CD2 ; InCB; Extend # Mn [3] VEDIC TONE KARSHANA..VEDIC TONE PRENKHA
+1CD4..1CE0 ; InCB; Extend # Mn [13] VEDIC SIGN YAJURVEDIC MIDLINE SVARITA..VEDIC TONE RIGVEDIC KASHMIRI INDEPENDENT SVARITA
+1CE2..1CE8 ; InCB; Extend # Mn [7] VEDIC SIGN VISARGA SVARITA..VEDIC SIGN VISARGA ANUDATTA WITH TAIL
+1CED ; InCB; Extend # Mn VEDIC SIGN TIRYAK
+1CF4 ; InCB; Extend # Mn VEDIC TONE CANDRA ABOVE
+1CF8..1CF9 ; InCB; Extend # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
+1DC0..1DFF ; InCB; Extend # Mn [64] COMBINING DOTTED GRAVE ACCENT..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
+200D ; InCB; Extend # Cf ZERO WIDTH JOINER
+20D0..20DC ; InCB; Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
+20E1 ; InCB; Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE
+20E5..20F0 ; InCB; Extend # Mn [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE
+2CEF..2CF1 ; InCB; Extend # Mn [3] COPTIC COMBINING NI ABOVE..COPTIC COMBINING SPIRITUS LENIS
+2D7F ; InCB; Extend # Mn TIFINAGH CONSONANT JOINER
+2DE0..2DFF ; InCB; Extend # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
+302A..302D ; InCB; Extend # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
+302E..302F ; InCB; Extend # Mc [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK
+3099..309A ; InCB; Extend # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
+A66F ; InCB; Extend # Mn COMBINING CYRILLIC VZMET
+A674..A67D ; InCB; Extend # Mn [10] COMBINING CYRILLIC LETTER UKRAINIAN IE..COMBINING CYRILLIC PAYEROK
+A69E..A69F ; InCB; Extend # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
+A6F0..A6F1 ; InCB; Extend # Mn [2] BAMUM COMBINING MARK KOQNDON..BAMUM COMBINING MARK TUKWENTIS
+A82C ; InCB; Extend # Mn SYLOTI NAGRI SIGN ALTERNATE HASANTA
+A8E0..A8F1 ; InCB; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
+A92B..A92D ; InCB; Extend # Mn [3] KAYAH LI TONE PLOPHU..KAYAH LI TONE CALYA PLOPHU
+A9B3 ; InCB; Extend # Mn JAVANESE SIGN CECAK TELU
+AAB0 ; InCB; Extend # Mn TAI VIET MAI KANG
+AAB2..AAB4 ; InCB; Extend # Mn [3] TAI VIET VOWEL I..TAI VIET VOWEL U
+AAB7..AAB8 ; InCB; Extend # Mn [2] TAI VIET MAI KHIT..TAI VIET VOWEL IA
+AABE..AABF ; InCB; Extend # Mn [2] TAI V