aboutsummaryrefslogtreecommitdiff
path: root/localedata/unicode-gen
diff options
context:
space:
mode:
authorMike FABIAN <mfabian@redhat.com>2023-09-14 18:01:40 +0200
committerMike FABIAN <mfabian@redhat.com>2023-09-16 08:37:03 +0200
commitbb5bbc20702981c287aa3e44640e7d2f2b9a28cf (patch)
tree163813194ca56338327ee48e3a9197f5a6490c00 /localedata/unicode-gen
parent71de3aead9fffe89556e80ebc94aa918d8ee7bca (diff)
downloadglibc-bb5bbc20702981c287aa3e44640e7d2f2b9a28cf.tar.xz
glibc-bb5bbc20702981c287aa3e44640e7d2f2b9a28cf.zip
Update to Unicode 15.1.0 [BZ #30854]
Unicode 15.1.0 Support: Character encoding, character type info, and transliteration tables are all updated to Unicode 15.1.0, using the generator scripts contributed by Mike FABIAN (Red Hat). Total removed characters in newly generated CHARMAP: 0 Total changed characters in newly generated CHARMAP: 0 Total added characters in newly generated CHARMAP: 627 Total removed characters in newly generated WIDTH: 0 Total changed characters in newly generated WIDTH: 0 Total added characters in newly generated WIDTH: 627 alpha: Added 622 characters in new ctype which were not in old ctype graph: Added 627 characters in new ctype which were not in old ctype print: Added 627 characters in new ctype which were not in old ctype punct: Added 5 characters in new ctype which were not in old ctype The five characters added to punct are: 2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;; 2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;; 2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;; 2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;; 31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;; The Unicode announcement blog entry says "[...] adds 627 characters, [...] additions include 622 CJK unified ideographs in a new block, [...]", so that looks OK. The Unicode blog mentions "six completely new emoji" but they don't appear here as they are all sequences and not single code points. Resolves: BZ #30854 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Diffstat (limited to 'localedata/unicode-gen')
-rw-r--r--localedata/unicode-gen/DerivedCoreProperties.txt277
-rw-r--r--localedata/unicode-gen/EastAsianWidth.txt5170
-rw-r--r--localedata/unicode-gen/Makefile2
-rw-r--r--localedata/unicode-gen/PropList.txt78
-rw-r--r--localedata/unicode-gen/UnicodeData.txt7
5 files changed, 2930 insertions, 2604 deletions
diff --git a/localedata/unicode-gen/DerivedCoreProperties.txt b/localedata/unicode-gen/DerivedCoreProperties.txt
index 8b482b5c10..220c55685d 100644
--- a/localedata/unicode-gen/DerivedCoreProperties.txt
+++ b/localedata/unicode-gen/DerivedCoreProperties.txt
@@ -1,6 +1,6 @@
-# DerivedCoreProperties-15.0.0.txt
-# Date: 2022-08-05, 22:17:05 GMT
-# © 2022 Unicode®, Inc.
+# DerivedCoreProperties-15.1.0.txt
+# Date: 2023-08-07, 15:21:24 GMT
+# © 2023 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
@@ -1397,11 +1397,12 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG
2B740..2B81D ; Alphabetic # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Alphabetic # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Alphabetic # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; Alphabetic # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2F800..2FA1D ; Alphabetic # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; Alphabetic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; Alphabetic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-# Total code points: 137765
+# Total code points: 138387
# ================================================
@@ -6853,11 +6854,12 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
2B740..2B81D ; ID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; ID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; ID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; ID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2F800..2FA1D ; ID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; ID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; ID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-# Total code points: 136345
+# Total code points: 136967
# ================================================
@@ -7438,6 +7440,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
1FE0..1FEC ; ID_Continue # L& [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
1FF2..1FF4 ; ID_Continue # L& [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
1FF6..1FFC ; ID_Continue # L& [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+200C..200D ; ID_Continue # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
203F..2040 ; ID_Continue # Pc [2] UNDERTIE..CHARACTER TIE
2054 ; ID_Continue # Pc INVERTED UNDERTIE
2071 ; ID_Continue # Lm SUPERSCRIPT LATIN SMALL LETTER I
@@ -7504,6 +7507,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
309D..309E ; ID_Continue # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
309F ; ID_Continue # Lo HIRAGANA DIGRAPH YORI
30A1..30FA ; ID_Continue # Lo [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
+30FB ; ID_Continue # Po KATAKANA MIDDLE DOT
30FC..30FE ; ID_Continue # Lm [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK
30FF ; ID_Continue # Lo KATAKANA DIGRAPH KOTO
3105..312F ; ID_Continue # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
@@ -7683,6 +7687,7 @@ FF10..FF19 ; ID_Continue # Nd [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NIN
FF21..FF3A ; ID_Continue # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF3F ; ID_Continue # Pc FULLWIDTH LOW LINE
FF41..FF5A ; ID_Continue # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
+FF65 ; ID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT
FF66..FF6F ; ID_Continue # Lo [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
FF70 ; ID_Continue # Lm HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF71..FF9D ; ID_Continue # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
@@ -8207,12 +8212,13 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
2B740..2B81D ; ID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; ID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; ID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; ID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2F800..2FA1D ; ID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; ID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; ID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
E0100..E01EF ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
-# Total code points: 139482
+# Total code points: 140108
# ================================================
@@ -8962,11 +8968,12 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
2B740..2B81D ; XID_Start # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; XID_Start # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; XID_Start # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; XID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2F800..2FA1D ; XID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; XID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; XID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-# Total code points: 136322
+# Total code points: 136944
# ================================================
@@ -9543,6 +9550,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
1FE0..1FEC ; XID_Continue # L& [13] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
1FF2..1FF4 ; XID_Continue # L& [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI
1FF6..1FFC ; XID_Continue # L& [7] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
+200C..200D ; XID_Continue # Cf [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
203F..2040 ; XID_Continue # Pc [2] UNDERTIE..CHARACTER TIE
2054 ; XID_Continue # Pc INVERTED UNDERTIE
2071 ; XID_Continue # Lm SUPERSCRIPT LATIN SMALL LETTER I
@@ -9608,6 +9616,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
309D..309E ; XID_Continue # Lm [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK
309F ; XID_Continue # Lo HIRAGANA DIGRAPH YORI
30A1..30FA ; XID_Continue # Lo [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
+30FB ; XID_Continue # Po KATAKANA MIDDLE DOT
30FC..30FE ; XID_Continue # Lm [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK
30FF ; XID_Continue # Lo KATAKANA DIGRAPH KOTO
3105..312F ; XID_Continue # Lo [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN
@@ -9793,6 +9802,7 @@ FF10..FF19 ; XID_Continue # Nd [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NI
FF21..FF3A ; XID_Continue # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
FF3F ; XID_Continue # Pc FULLWIDTH LOW LINE
FF41..FF5A ; XID_Continue # L& [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z
+FF65 ; XID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT
FF66..FF6F ; XID_Continue # Lo [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
FF70 ; XID_Continue # Lm HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF71..FF9D ; XID_Continue # Lo [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
@@ -10317,12 +10327,13 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA
2B740..2B81D ; XID_Continue # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; XID_Continue # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; XID_Continue # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; XID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2F800..2FA1D ; XID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; XID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; XID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
-# Total code points: 139463
+# Total code points: 140089
# ================================================
@@ -10335,6 +10346,15 @@ E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTO
# - FFF9..FFFB (Interlinear annotation format characters)
# - 13430..13440 (Egyptian hieroglyph format characters)
# - Prepended_Concatenation_Mark (Exceptional format characters that should be visible)
+#
+# There are currently no stability guarantees for DICP. However, the
+# values of DICP interact with the derivation of XID_Continue
+# and NFKC_CF, for which there are stability guarantees.
+# Maintainers of this property should note that in the
+# unlikely case that the DICP value changes for an existing character
+# which is also XID_Continue=Yes, then exceptions must be put
+# in place to ensure that the NFKC_CF mapping value for that
+# existing character does not change.
00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN
034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER
@@ -11602,7 +11622,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELE
2E80..2E99 ; Grapheme_Base # So [26] CJK RADICAL REPEAT..CJK RADICAL RAP
2E9B..2EF3 ; Grapheme_Base # So [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
2F00..2FD5 ; Grapheme_Base # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
-2FF0..2FFB ; Grapheme_Base # So [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
+2FF0..2FFF ; Grapheme_Base # So [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
3000 ; Grapheme_Base # Zs IDEOGRAPHIC SPACE
3001..3003 ; Grapheme_Base # Po [3] IDEOGRAPHIC COMMA..DITTO MARK
3004 ; Grapheme_Base # So JAPANESE INDUSTRIAL STANDARD SYMBOL
@@ -11657,6 +11677,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELE
3196..319F ; Grapheme_Base # So [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK
31A0..31BF ; Grapheme_Base # Lo [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH
31C0..31E3 ; Grapheme_Base # So [36] CJK STROKE T..CJK STROKE Q
+31EF ; Grapheme_Base # So IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
31F0..31FF ; Grapheme_Base # Lo [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
3200..321E ; Grapheme_Base # So [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU
3220..3229 ; Grapheme_Base # No [10] PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH TEN
@@ -12497,11 +12518,12 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
2B740..2B81D ; Grapheme_Base # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B820..2CEA1 ; Grapheme_Base # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEB0..2EBE0 ; Grapheme_Base # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
+2EBF0..2EE5D ; Grapheme_Base # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
2F800..2FA1D ; Grapheme_Base # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
30000..3134A ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-# Total code points: 146986
+# Total code points: 147613
# ================================================
@@ -12572,4 +12594,239 @@ ABED ; Grapheme_Link # Mn MEETEI MAYEK APUN IYEK
# Total code points: 65
+# ================================================
+
+# Derived Property: Indic_Conjunct_Break
+# Generated from the Grapheme_Cluster_Break, Indic_Syllabic_Category,
+# Canonical_Combining_Class, and Script properties as described in UAX #44:
+# https://www.unicode.org/reports/tr44/.
+
+# All code points not explicitly listed for Indic_Conjunct_Break
+# have the value None.
+
+# @missing: 0000..10FFFF; InCB; None
+
+# ================================================
+
+# Indic_Conjunct_Break=Linker
+
+094D ; InCB; Linker # Mn DEVANAGARI SIGN VIRAMA
+09CD ; InCB; Linker # Mn BENGALI SIGN VIRAMA
+0ACD ; InCB; Linker # Mn GUJARATI SIGN VIRAMA
+0B4D ; InCB; Linker # Mn ORIYA SIGN VIRAMA
+0C4D ; InCB; Linker # Mn TELUGU SIGN VIRAMA
+0D4D ; InCB; Linker # Mn MALAYALAM SIGN VIRAMA
+
+# Total code points: 6
+
+# ================================================
+
+# Indic_Conjunct_Break=Consonant
+
+0915..0939 ; InCB; Consonant # Lo [37] DEVANAGARI LETTER KA..DEVANAGARI LETTER HA
+0958..095F ; InCB; Consonant # Lo [8] DEVANAGARI LETTER QA..DEVANAGARI LETTER YYA
+0978..097F ; InCB; Consonant # Lo [8] DEVANAGARI LETTER MARWARI DDA..DEVANAGARI LETTER BBA
+0995..09A8 ; InCB; Consonant # Lo [20] BENGALI LETTER KA..BENGALI LETTER NA
+09AA..09B0 ; InCB; Consonant # Lo [7] BENGALI LETTER PA..BENGALI LETTER RA
+09B2 ; InCB; Consonant # Lo BENGALI LETTER LA
+09B6..09B9 ; InCB; Consonant # Lo [4] BENGALI LETTER SHA..BENGALI LETTER HA
+09DC..09DD ; InCB; Consonant # Lo [2] BENGALI LETTER RRA..BENGALI LETTER RHA
+09DF ; InCB; Consonant # Lo BENGALI LETTER YYA
+09F0..09F1 ; InCB; Consonant # Lo [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
+0A95..0AA8 ; InCB; Consonant # Lo [20] GUJARATI LETTER KA..GUJARATI LETTER NA
+0AAA..0AB0 ; InCB; Consonant # Lo [7] GUJARATI LETTER PA..GUJARATI LETTER RA
+0AB2..0AB3 ; InCB; Consonant # Lo [2] GUJARATI LETTER LA..GUJARATI LETTER LLA
+0AB5..0AB9 ; InCB; Consonant # Lo [5] GUJARATI LETTER VA..GUJARATI LETTER HA
+0AF9 ; InCB; Consonant # Lo GUJARATI LETTER ZHA
+0B15..0B28 ; InCB; Consonant # Lo [20] ORIYA LETTER KA..ORIYA LETTER NA
+0B2A..0B30 ; InCB; Consonant # Lo [7] ORIYA LETTER PA..ORIYA LETTER RA
+0B32..0B33 ; InCB; Consonant # Lo [2] ORIYA LETTER LA..ORIYA LETTER LLA
+0B35..0B39 ; InCB; Consonant # Lo [5] ORIYA LETTER VA..ORIYA LETTER HA
+0B5C..0B5D ; InCB; Consonant # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
+0B5F ; InCB; Consonant # Lo ORIYA LETTER YYA
+0B71 ; InCB; Consonant # Lo ORIYA LETTER WA
+0C15..0C28 ; InCB; Consonant # Lo [20] TELUGU LETTER KA..TELUGU LETTER NA
+0C2A..0C39 ; InCB; Consonant # Lo [16] TELUGU LETTER PA..TELUGU LETTER HA
+0C58..0C5A ; InCB; Consonant # Lo [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
+0D15..0D3A ; InCB; Consonant # Lo [38] MALAYALAM LETTER KA..MALAYALAM LETTER TTTA
+
+# Total code points: 240
+
+# ================================================
+
+# Indic_Conjunct_Break=Extend
+
+0300..034E ; InCB; Extend # Mn [79] COMBINING GRAVE ACCENT..COMBINING UPWARDS ARROW BELOW
+0350..036F ; InCB; Extend # Mn [32] COMBINING RIGHT ARROWHEAD ABOVE..COMBINING LATIN SMALL LETTER X
+0483..0487 ; InCB; Extend # Mn [5] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC POKRYTIE
+0591..05BD ; InCB; Extend # Mn [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
+05BF ; InCB; Extend # Mn HEBREW POINT RAFE
+05C1..05C2 ; InCB; Extend # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
+05C4..05C5 ; InCB; Extend # Mn [2] HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT
+05C7 ; InCB; Extend # Mn HEBREW POINT QAMATS QATAN
+0610..061A ; InCB; Extend # Mn [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA
+064B..065F ; InCB; Extend # Mn [21] ARABIC FATHATAN..ARABIC WAVY HAMZA BELOW
+0670 ; InCB; Extend # Mn ARABIC LETTER SUPERSCRIPT ALEF
+06D6..06DC ; InCB; Extend # Mn [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN
+06DF..06E4 ; InCB; Extend # Mn [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA
+06E7..06E8 ; InCB; Extend # Mn [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON
+06EA..06ED ; InCB; Extend # Mn [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM
+0711 ; InCB; Extend # Mn SYRIAC LETTER SUPERSCRIPT ALAPH
+0730..074A ; InCB; Extend # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
+07EB..07F3 ; InCB; Extend # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE
+07FD ; InCB; Extend # Mn NKO DANTAYALAN
+0816..0819 ; InCB; Extend # Mn [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH
+081B..0823 ; InCB; Extend # Mn [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A
+0825..0827 ; InCB; Extend # Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U
+0829..082D ; InCB; Extend # Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA
+0859..085B ; InCB; Extend # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
+0898..089F ; InCB; Extend # Mn [8] ARABIC SMALL HIGH WORD AL-JUZ..ARABIC HALF MADDA OVER MADDA
+08CA..08E1 ; InCB; Extend # Mn [24] ARABIC SMALL HIGH FARSI YEH..ARABIC SMALL HIGH SIGN SAFHA
+08E3..08FF ; InCB; Extend # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
+093C ; InCB; Extend # Mn DEVANAGARI SIGN NUKTA
+0951..0954 ; InCB; Extend # Mn [4] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI ACUTE ACCENT
+09BC ; InCB; Extend # Mn BENGALI SIGN NUKTA
+09FE ; InCB; Extend # Mn BENGALI SANDHI MARK
+0A3C ; InCB; Extend # Mn GURMUKHI SIGN NUKTA
+0ABC ; InCB; Extend # Mn GUJARATI SIGN NUKTA
+0B3C ; InCB; Extend # Mn ORIYA SIGN NUKTA
+0C3C ; InCB; Extend # Mn TELUGU SIGN NUKTA
+0C55..0C56 ; InCB; Extend # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
+0CBC ; InCB; Extend # Mn KANNADA SIGN NUKTA
+0D3B..0D3C ; InCB; Extend # Mn [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA
+0E38..0E3A ; InCB; Extend # Mn [3] THAI CHARACTER SARA U..THAI CHARACTER PHINTHU
+0E48..0E4B ; InCB; Extend # Mn [4] THAI CHARACTER MAI EK..THAI CHARACTER MAI CHATTAWA
+0EB8..0EBA ; InCB; Extend # Mn [3] LAO VOWEL SIGN U..LAO SIGN PALI VIRAMA
+0EC8..0ECB ; InCB; Extend # Mn [4] LAO TONE MAI EK..LAO TONE MAI CATAWA
+0F18..0F19 ; InCB; Extend # Mn [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS
+0F35 ; InCB; Extend # Mn TIBETAN MARK NGAS BZUNG NYI ZLA
+0F37 ; InCB; Extend # Mn TIBETAN MARK NGAS BZUNG SGOR RTAGS
+0F39 ; InCB; Extend # Mn TIBETAN MARK TSA -PHRU
+0F71..0F72 ; InCB; Extend # Mn [2] TIBETAN VOWEL SIGN AA..TIBETAN VOWEL SIGN I
+0F74 ; InCB; Extend # Mn TIBETAN VOWEL SIGN U
+0F7A..0F7D ; InCB; Extend # Mn [4] TIBETAN VOWEL SIGN E..TIBETAN VOWEL SIGN OO
+0F80 ; InCB; Extend # Mn TIBETAN VOWEL SIGN REVERSED I
+0F82..0F84 ; InCB; Extend # Mn [3] TIBETAN SIGN NYI ZLA NAA DA..TIBETAN MARK HALANTA
+0F86..0F87 ; InCB; Extend # Mn [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS
+0FC6 ; InCB; Extend # Mn TIBETAN SYMBOL PADMA GDAN
+1037 ; InCB; Extend # Mn MYANMAR SIGN DOT BELOW
+1039..103A ; InCB; Extend # Mn [2] MYANMAR SIGN VIRAMA..MYANMAR SIGN ASAT
+108D ; InCB; Extend # Mn MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE
+135D..135F ; InCB; Extend # Mn [3] ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK..ETHIOPIC COMBINING GEMINATION MARK
+1714 ; InCB; Extend # Mn TAGALOG SIGN VIRAMA
+17D2 ; InCB; Extend # Mn KHMER SIGN COENG
+17DD ; InCB; Extend # Mn KHMER SIGN ATTHACAN
+18A9 ; InCB; Extend # Mn MONGOLIAN LETTER ALI GALI DAGALGA
+1939..193B ; InCB; Extend # Mn [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I
+1A17..1A18 ; InCB; Extend # Mn [2] BUGINESE VOWEL SIGN I..BUGINESE VOWEL SIGN U
+1A60 ; InCB; Extend # Mn TAI THAM SIGN SAKOT
+1A75..1A7C ; InCB; Extend # Mn [8] TAI THAM SIGN TONE-1..TAI THAM SIGN KHUEN-LUE KARAN
+1A7F ; InCB; Extend # Mn TAI THAM COMBINING CRYPTOGRAMMIC DOT
+1AB0..1ABD ; InCB; Extend # Mn [14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
+1ABF..1ACE ; InCB; Extend # Mn [16] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER INSULAR T
+1B34 ; InCB; Extend # Mn BALINESE SIGN REREKAN
+1B6B..1B73 ; InCB; Extend # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
+1BAB ; InCB; Extend # Mn SUNDANESE SIGN VIRAMA
+1BE6 ; InCB; Extend # Mn BATAK SIGN TOMPI
+1C37 ; InCB; Extend # Mn LEPCHA SIGN NUKTA
+1CD0..1CD2 ; InCB; Extend # Mn [3] VEDIC TONE KARSHANA..VEDIC TONE PRENKHA
+1CD4..1CE0 ; InCB; Extend # Mn [13] VEDIC SIGN YAJURVEDIC MIDLINE SVARITA..VEDIC TONE RIGVEDIC KASHMIRI INDEPENDENT SVARITA
+1CE2..1CE8 ; InCB; Extend # Mn [7] VEDIC SIGN VISARGA SVARITA..VEDIC SIGN VISARGA ANUDATTA WITH TAIL
+1CED ; InCB; Extend # Mn VEDIC SIGN TIRYAK
+1CF4 ; InCB; Extend # Mn VEDIC TONE CANDRA ABOVE
+1CF8..1CF9 ; InCB; Extend # Mn [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE
+1DC0..1DFF ; InCB; Extend # Mn [64] COMBINING DOTTED GRAVE ACCENT..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
+200D ; InCB; Extend # Cf ZERO WIDTH JOINER
+20D0..20DC ; InCB; Extend # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
+20E1 ; InCB; Extend # Mn COMBINING LEFT RIGHT ARROW ABOVE
+20E5..20F0 ; InCB; Extend # Mn [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE
+2CEF..2CF1 ; InCB; Extend # Mn [3] COPTIC COMBINING NI ABOVE..COPTIC COMBINING SPIRITUS LENIS
+2D7F ; InCB; Extend # Mn TIFINAGH CONSONANT JOINER
+2DE0..2DFF ; InCB; Extend # Mn [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS
+302A..302D ; InCB; Extend # Mn [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
+302E..302F ; InCB; Extend # Mc [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK
+3099..309A ; InCB; Extend # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
+A66F ; InCB; Extend # Mn COMBINING CYRILLIC VZMET
+A674..A67D ; InCB; Extend # Mn [10] COMBINING CYRILLIC LETTER UKRAINIAN IE..COMBINING CYRILLIC PAYEROK
+A69E..A69F ; InCB; Extend # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
+A6F0..A6F1 ; InCB; Extend # Mn [2] BAMUM COMBINING MARK KOQNDON..BAMUM COMBINING MARK TUKWENTIS
+A82C ; InCB; Extend # Mn SYLOTI NAGRI SIGN ALTERNATE HASANTA
+A8E0..A8F1 ; InCB; Extend # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA
+A92B..A92D ; InCB; Extend # Mn [3] KAYAH LI TONE PLOPHU..KAYAH LI TONE CALYA PLOPHU
+A9B3 ; InCB; Extend # Mn JAVANESE SIGN CECAK TELU
+AAB0 ; InCB; Extend # Mn TAI VIET MAI KANG
+AAB2..AAB4 ; InCB; Extend # Mn [3] TAI VIET VOWEL I..TAI VIET VOWEL U
+AAB7..AAB8 ; InCB; Extend # Mn [2] TAI VIET MAI KHIT..TAI VIET VOWEL IA
+AABE..AABF ; InCB; Extend # Mn [2] TAI VIET VOWEL AM..TAI VIET TONE MAI EK
+AAC1 ; InCB; Extend # Mn TAI VIET TONE MAI THO
+AAF6 ; InCB; Extend # Mn MEETEI MAYEK VIRAMA
+ABED ; InCB; Extend # Mn MEETEI MAYEK APUN IYEK
+FB1E ; InCB; Extend # Mn HEBREW POINT JUDEO-SPANISH VARIKA
+FE20..FE2F ; InCB; Extend # Mn [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF
+101FD ; InCB; Extend # Mn PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE
+102E0 ; InCB; Extend # Mn COPTIC EPACT THOUSANDS MARK
+10376..1037A ; InCB; Extend # Mn [5] COMBINING OLD PERMIC LETTER AN..COMBINING OLD PERMIC LETTER SII
+10A0D ; InCB; Extend # Mn KHAROSHTHI SIGN DOUBLE RING BELOW
+10A0F ; InCB; Extend # Mn KHAROSHTHI SIGN VISARGA
+10A38..10A3A ; InCB; Extend # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
+10A3F ; InCB; Extend # Mn KHAROSHTHI VIRAMA
+10AE5..10AE6 ; InCB; Extend # Mn [2] MANICHAEAN ABBREVIATION MARK ABOVE..MANICHAEAN ABBREVIATION MARK BELOW
+10D24..10D27 ; InCB; Extend # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
+10EAB..10EAC ; InCB; Extend # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
+10EFD..10EFF ; InCB; Extend # Mn [3] ARABIC SMALL LOW WORD SAKTA..ARABIC SMALL LOW WORD MADDA
+10F46..10F50 ; InCB; Extend # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
+10F82..10F85 ; InCB; Extend # Mn [4] OLD UYGHUR COMBINING DOT ABOVE..OLD UYGHUR COMBINING TWO DOTS BELOW
+11070 ; InCB; Extend # Mn BRAHMI SIGN OLD TAMIL VIRAMA
+1107F ; InCB; Extend # Mn BRAHMI NUMBER JOINER
+110BA ; InCB; Extend # Mn KAITHI SIGN NUKTA
+11100..11102 ; InCB; Extend # Mn [3] CHAKMA SIGN CANDRABINDU..CHAKMA SIGN VISARGA
+11133..11134 ; InCB; Extend # Mn [2] CHAKMA