From: Eli Zaretskii <eliz@HIDDEN>
To: Glenn Morris <rgm@HIDDEN>
Cc: handa@HIDDEN, 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Sat, 27 Jun 2015 10:42:51 +0300
Message-id: <83a8vld2bo.fsf@HIDDEN>
In-reply-to: <awa8vldi2r.fsf@HIDDEN>

> From: Glenn Morris <rgm@HIDDEN>
> Cc: Kenichi Handa <handa@HIDDEN>, 20789 <at> debbugs.gnu.org
> Date: Fri, 26 Jun 2015 22:02:36 -0400
>
> Eli Zaretskii wrote:
>
> >> The width 2 characters look like they might be the "W" and "F" characters,
> >
> > Yes.
> >
> >> but just doing that gives a list that has many differences to the list
> >> Emacs uses.
>
> This is list of "F" and "W" characters, compared to the 11 ranges that
> Emacs uses:

Looks good to me.  The 11 ranges we have now are either identical or
more coarse than the list derived from the UCD that you show.

> > I don't see any significant differences, except perhaps in unassigned
> > codepoints (see paragraph 6.1 of UAX#11 for the treatment of
> > unassigned CJK codepoints).
>
> I don't know if this means that the above needs modifying?

I was saying that we need to augment the list with the 5 ranges of
unassigned codepoints that belong to the CJK planes, as described in
that section of UAX#11.  An unassigned codepoint has its
'general-category' property set to 'Cn', and the list of the 5 planes
could be in some defconst, because it will probably never change.

Thanks.
bug-gnu-emacs@HIDDEN: bug#20789; Package emacs.

From: Glenn Morris <rgm@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: Kenichi Handa <handa@HIDDEN>, 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Fri, 26 Jun 2015 22:02:36 -0400
Message-ID: <awa8vldi2r.fsf@HIDDEN>
In-Reply-To: <83mvzthzsr.fsf@HIDDEN> (Eli Zaretskii's message of "Sun, 21 Jun 2015 18:00:20 +0300")

Eli Zaretskii wrote:

>> The width 2 characters look like they might be the "W" and "F" characters,
>
> Yes.
>
>> but just doing that gives a list that has many differences to the list
>> Emacs uses.

This is list of "F" and "W" characters, compared to the 11 ranges that
Emacs uses:

(#x1100 . #x115F)
(#x2329 . #x232A)
(#x2E80 . #x2E99)
(#x2E9B . #x2EF3)
(#x2F00 . #x2FD5)
(#x2FF0 . #x2FFB)
(#x3000 . #x303E)
(#x3041 . #x3096)
(#x3099 . #x30FF)
(#x3105 . #x312D)
(#x3131 . #x318E)
(#x3190 . #x31BA)
(#x31C0 . #x31E3)
(#x31F0 . #x321E)
(#x3220 . #x3247)
(#x3250 . #x32FE)
(#x3300 . #x4DBF)
(#x4E00 . #xA48C)
(#xA490 . #xA4C6)
(#xA960 . #xA97C)
(#xAC00 . #xD7A3)
(#xF900 . #xFAFF)
(#xFE10 . #xFE19)
(#xFE30 . #xFE52)
(#xFE54 . #xFE66)
(#xFE68 . #xFE6B)
(#xFF01 . #xFF60)
(#xFFE0 . #xFFE6)
(#x1B000 . #x1B001)
(#x1F200 . #x1F202)
(#x1F210 . #x1F23A)
(#x1F240 . #x1F248)
(#x1F250 . #x1F251)
(#x20000 . #x2FFFD)
(#x30000 . #x3FFFD)

> I don't see any significant differences, except perhaps in unassigned
> codepoints (see paragraph 6.1 of UAX#11 for the treatment of
> unassigned CJK codepoints).

I don't know if this means that the above needs modifying?
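One way a list of "F" and "W" ranges like the one above could be produced is sketched below in Python. This is not the script used in this thread; `wide_ranges` is a hypothetical helper, and the sample input is hand-written in the EastAsianWidth.txt layout rather than copied from the file. The sketch assumes the input lines are sorted by codepoint, as they are in the UCD file.

```python
def wide_ranges(lines):
    """Collect (start, end) ranges whose width class is W or F.

    LINES use the EastAsianWidth.txt layout: "XXXX;W" or
    "XXXX..YYYY;W", optionally followed by a "#" comment.
    Ranges that touch each other are merged.
    """
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        field, width = (part.strip() for part in data.split(";"))
        if width not in ("W", "F"):
            continue
        first, _, last = field.partition("..")
        start = int(first, 16)
        end = int(last or first, 16)
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges


# Hand-written sample in the file's layout, not a verbatim excerpt.
SAMPLE = """\
# illustrative input only
0041;Na
1100..115F;W
2329;W
232A;W
FF01..FF03;F
FF04;F
""".splitlines()

for start, end in wide_ranges(SAMPLE):
    print("(#x%X . #x%X)" % (start, end))
```

Merging adjacent entries is what turns the per-character lines of the source file into coarse ranges that can be compared with the hand-maintained ones.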

From: Eli Zaretskii <eliz@HIDDEN>
To: Glenn Morris <rgm@HIDDEN>, Kenichi Handa <handa@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Sun, 21 Jun 2015 18:00:20 +0300
Message-id: <83mvzthzsr.fsf@HIDDEN>
In-reply-to: <6pp4qlzti.fsf@HIDDEN>

> From: Glenn Morris <rgm@HIDDEN>
> Cc: 20789 <at> debbugs.gnu.org
> Date: Sat, 20 Jun 2015 19:34:01 -0400
>
> I spent some time looking at some of these.
> In no case could I see a clear path from the inputs to the outputs.

Thanks for looking into this.

Let me first make a general comment: we can always convert only
certain parts of the setup to an automated procedure, and leave the
rest in its present form, more or less.  That's especially true where
Emacs has specialized needs or defines properties not in Unicode.

> > . characters.el:
> >
> >   . The modify-category-entry calls -- they basically can be derived
> >     from Blocks.txt
>
> I looked at it briefly. I can see that they are somewhat related, but
> not precisely how. Eg:
>
> Emacs: 2E80:312F and 3190:33FF are "line breakable".
> Which means that "Hangul Compatibility Jamo" isn't. I have no idea why.
>
> Emacs: 3400:4DBF and 4E00:9FAF are "2-byte han".
> Which means that "Yijing Hexagram Symbols" aren't. Again, I have no idea why.
>
> I didn't look any further.

When I said "derived from Blocks.txt", I meant the categories that are
related to script names, like ASCII, Latin, Arabic, Chinese, etc.
Sorry for not saying that explicitly.

Other categories need other sources.  Here's my attempt to decipher
some of them:

 . ?| -- "line breakable"
   The data seems to be in LineBreak.txt, described in detail in
   UAX#14 (http://unicode.org/reports/tr14/).  It looks like
   characters with the ?| category are those whose line-break
   properties are ID or CJ or NS.  Therefore, the exclusion of Hangul
   Compatibility Jamo is a mistake (or maybe an omission, since the
   comment says "Chinese"); in particular, UAX#14 explicitly says, in
   section 5.1 under "ID", that the characters in the range 3130..318F
   are treated as class ID.

   This category is currently used only by kinsoku.el, which has its
   own data (and sets the ?< and ?> categories).  So this will only
   become important if we ever implement in Emacs something more
   general, like the algorithm described in UAX#14.

 . "2-byte han" -- I think this is related to their legacy encoding; I
   don't see this used anywhere.  Likewise with other 2-byte
   categories.  Perhaps Handa-san (CC'ed) could comment on their
   necessity.  If this is still needed, we should probably leave these
   alone.

 . ?0 - ?9 -- I don't see how to get this data from the UCD or any
   other source.  Some of it seems to be in IndicSyllabicCategory.txt,
   FWIW.

 . ?R and ?L -- already set up using the Unicode data, so no change is
   needed.

 . ?^ -- should be set for any character whose general-category is Mn.
   Since we already do this, the manual setting around line 820 is
   redundant and should be deleted.

 . ?. -- already set using Unicode data, no change needed.

> >   . The setup of char-width-table -- I think the information is in
> >     EastAsianWidth.txt, with background information described in
> >     UAX#11 (http://www.unicode.org/reports/tr11/)
>
> Looks somewhat promising, but could you be more specific?
> There's nothing in that file that defines "zero width" characters, so I
> don't see where Emacs's width 0 characters come from.

The following rules regarding zero-width characters are due to Markus
Kuhn, and are excerpted from the description in comments to his
implementation of 'wcwidth'
(http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c):

 . The null character (U+0000) has a column width of 0.

 . Non-spacing and enclosing combining characters (general category
   code Mn or Me in the Unicode database) have a column width of 0.

 . ZERO WIDTH SPACE (U+200B) and format characters (general category
   code Cf in the Unicode database), except SOFT HYPHEN (U+00AD), have
   a column width of 0.

 . Hangul Jamo medial vowels and final consonants (U+1160-U+11FF) have
   a column width of 0.

> The width 2 characters look like they might be the "W" and "F" characters,

Yes.

> but just doing that gives a list that has many differences to the list
> Emacs uses.

I don't see any significant differences, except perhaps in unassigned
codepoints (see paragraph 6.1 of UAX#11 for the treatment of
unassigned CJK codepoints).  I think any differences beyond that
should be treated as errors in Emacs in this case.

> >   . The setup of char-acronym-table: at least some of the data is in
> >     NameAliases.txt and NameList.txt
>
> Looks somewhat promising.
> I can see how most of this comes from NameAliases.txt.
> But there are many oddities:
>
> Why does Emacs not have anything for 0009 (HT or TAB) or 000A (LF, NL,
> or EOF)?

This table is set for the 'acronym' method of glyphless-char-display,
so I guess these omissions are for characters for which no one
envisioned them to be ever displayed as glyphless.  I'd include them
in the table anyway, just in case, and also to keep our exceptions vs
the UCD to the bare minimum.

> 0019 is EOM in the source but EM in Emacs.

Typo, I think.

> 0080 is PAD in the source but XXX in Emacs.
> 0081 is HOP in the source but XXX in Emacs.
> 008F is SS3 in the source but SS1 in Emacs.
> 0099 is SGC in the source but XXX in Emacs.

I think these are typos and perhaps acronyms that whoever wrote this
didn't know.

> How does Emacs choose which entries to list? There are many more in the
> source. Could it do any harm to add more?

As long as you take only "abbreviations", i.e. they are short, I think
we should use all of them, given their use in Emacs.

> Where does "KIVAQ" come from? That appears nowhere in the source AFAICS.

AFAIK, that's the official name of that character.  At least that's
what I glean from Google; I know nothing about the Khmer script.

> Why does Emacs list two Khmer entries, and nothing else? There are loads
> more of them.

These are the only 2 that have such abbreviations; see
https://en.wikipedia.org/wiki/Khmer_alphabet (assuming by "loads more"
you meant the Khmer letters).

> > . fontset.el:
> >
> >   . The setup of script-representative-chars
>
> I don't see how. It seems to be "for some of, but not all, the entries
> in char-script-table, choose a single character somewhere in the range."

We should have a representative character for each entry in
char-script-table.  They are used with some font back-ends (xfont and
x?ftfont, AFAIR) to probe candidate fonts for coverage of the required
script, so we should have the full information about that.  I think
the only reason for the partial information we have now is that it is
maintained manually, so it includes whatever the people who worked on
that bothered to add.

> There seems to be no pattern to how the character is chosen within the
> range. Often the first one, but by no means always.

I think the rule is to choose the first one that is a letter, i.e. its
general-category is either one of Lu, Ll, Lt, Lo, or Lm.

> > . mule-cmds.el:
> >
> >   . The setting of locale-language-names -- the data is available in
> >     IANA's Language Subtag Registry
> >
> >     (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry)
> >     and in ISO 639-2 (http://www.loc.gov/standards/iso639-2/,
> >     http://www.loc.gov/standards/iso639-2/php/English_list.php)
>
> Again, I don't see how. Eg nowhere in those source files do I see Welsh
> associated with iso-8859-14, and the comment in mule-cmds says that the
> last part is "implementation dependent".

The bulk of the data is the correspondence between the ISO 639
2-letter names and the country/culture name.  The few cases where we
also have the encoding could be set up with a very small database once
the main data is set, by adding the encoding to those few that need
it.

If by "last part" you mean IPA and "Nonstandard or obsolete language
codes", then these are very few and can be added manually.

> > P.S. It would be good to add to somewhere (admin/make-tarball.txt?) a
> > reminder to fetch all those reference files and regenerate their
> > dependencies, before we prepare a release.
>
> admin/FOR-RELEASE contains that kind of thing.

Right, I will add the information there.

Thanks.
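The four zero-width rules quoted in this message can be sketched as a single predicate. This is a Python illustration only, not Kuhn's C code and not what Emacs does; `is_zero_width` is a hypothetical name, and the general categories come from whatever Unicode version the running Python ships.

```python
import unicodedata


def is_zero_width(codepoint):
    """Apply the four zero-width rules quoted above to CODEPOINT."""
    if codepoint == 0x0000:            # the null character
        return True
    if codepoint == 0x00AD:            # SOFT HYPHEN is the stated exception
        return False
    if codepoint == 0x200B:            # ZERO WIDTH SPACE
        return True
    if 0x1160 <= codepoint <= 0x11FF:  # Hangul Jamo medial vowels, final consonants
        return True
    # Non-spacing and enclosing combining marks, and format characters.
    return unicodedata.category(chr(codepoint)) in ("Mn", "Me", "Cf")
```

The SOFT HYPHEN test has to come before the general-category test, because U+00AD is itself a Cf character.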

From: Glenn Morris <rgm@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Sat, 20 Jun 2015 19:34:01 -0400
Message-ID: <6pp4qlzti.fsf@HIDDEN>

I spent some time looking at some of these.
In no case could I see a clear path from the inputs to the outputs.

Eli Zaretskii wrote:

> . characters.el:
>
>   . The modify-category-entry calls -- they basically can be derived
>     from Blocks.txt

I looked at it briefly. I can see that they are somewhat related, but
not precisely how. Eg:

Emacs: 2E80:312F and 3190:33FF are "line breakable".
Which means that "Hangul Compatibility Jamo" isn't. I have no idea why.

Emacs: 3400:4DBF and 4E00:9FAF are "2-byte han".
Which means that "Yijing Hexagram Symbols" aren't. Again, I have no idea why.

I didn't look any further.

>   . The modify-syntax-entry and set-case-syntax calls can be derived
>     from the values of the 'general-category' property returned by
>     'get-char-code-property', perhaps augmented by 'paired-bracket'
>     and 'paired-type' properties

I didn't look at this yet.

>   . The set-case-syntax-pair calls (perhaps use the data in
>     CaseFolding.txt, or even the case mapping information in
>     UnicodeData.txt)

I didn't look at this yet.

>   . The setup of char-width-table -- I think the information is in
>     EastAsianWidth.txt, with background information described in
>     UAX#11 (http://www.unicode.org/reports/tr11/)

Looks somewhat promising, but could you be more specific?
There's nothing in that file that defines "zero width" characters, so I
don't see where Emacs's width 0 characters come from.

The width 2 characters look like they might be the "W" and "F" characters,
but just doing that gives a list that has many differences to the list
Emacs uses.

>   . The setup of char-acronym-table: at least some of the data is in
>     NameAliases.txt and NameList.txt

Looks somewhat promising.
I can see how most of this comes from NameAliases.txt.
But there are many oddities:

Why does Emacs not have anything for 0009 (HT or TAB) or 000A (LF, NL,
or EOF)?

0019 is EOM in the source but EM in Emacs.
0080 is PAD in the source but XXX in Emacs.
0081 is HOP in the source but XXX in Emacs.
008F is SS3 in the source but SS1 in Emacs.
0099 is SGC in the source but XXX in Emacs.

How does Emacs choose which entries to list? There are many more in the
source. Could it do any harm to add more?

Where does "KIVAQ" come from? That appears nowhere in the source AFAICS.

Why does Emacs list two Khmer entries, and nothing else? There are loads
more of them.

> . fontset.el:
>
>   . The setup of script-representative-chars

I don't see how. It seems to be "for some of, but not all, the entries
in char-script-table, choose a single character somewhere in the range."
There seems to be no pattern to how the character is chosen within the
range. Often the first one, but by no means always.

> . mule-cmds.el:
>
>   . The setting of locale-language-names -- the data is available in
>     IANA's Language Subtag Registry
>     (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry)
>     and in ISO 639-2 (http://www.loc.gov/standards/iso639-2/,
>     http://www.loc.gov/standards/iso639-2/php/English_list.php)

Again, I don't see how. Eg nowhere in those source files do I see Welsh
associated with iso-8859-14, and the comment in mule-cmds says that the
last part is "implementation dependent".

> P.S. It would be good to add to somewhere (admin/make-tarball.txt?) a
> reminder to fetch all those reference files and regenerate their
> dependencies, before we prepare a release.

admin/FOR-RELEASE contains that kind of thing.
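The kind of comparison described above for char-acronym-table could be mechanized roughly as follows. This Python sketch is an illustration under assumptions: `abbreviations` is a hypothetical helper, the input lines follow the NameAliases.txt layout ("code;alias;type"), and both sample tables contain only the values reported in this message, not the full data.

```python
def abbreviations(lines):
    """Map codepoint -> list of aliases whose type is "abbreviation".

    LINES use the NameAliases.txt layout "XXXX;ALIAS;TYPE".
    """
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        code, alias, kind = data.split(";")
        if kind == "abbreviation":
            table.setdefault(int(code, 16), []).append(alias)
    return table


# Sample built only from the values quoted in this message.
SOURCE = [
    "0009;HT;abbreviation",
    "0009;TAB;abbreviation",
    "0019;EOM;abbreviation",
    "0080;PAD;abbreviation",
    "0081;HOP;abbreviation",
    "008F;SS3;abbreviation",
    "0099;SGC;abbreviation",
]
# Emacs side as reported above; None stands for "no entry".
EMACS = {0x09: None, 0x19: "EM", 0x80: "XXX", 0x81: "XXX", 0x8F: "SS1", 0x99: "XXX"}

ucd = abbreviations(SOURCE)
# Codepoints where the Emacs acronym is not among the source abbreviations.
mismatches = {cp: (EMACS[cp], ucd[cp]) for cp in EMACS if EMACS[cp] not in ucd[cp]}
for cp, (emacs, source) in sorted(mismatches.items()):
    print("%04X: Emacs %s, source %s" % (cp, emacs, "/".join(source)))
```

Filtering on the "abbreviation" type keeps only the short aliases, which is what an acronym display needs.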

From: Eli Zaretskii <eliz@HIDDEN>
To: Glenn Morris <rgm@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Wed, 17 Jun 2015 19:27:49 +0300
Message-id: <83eglamha2.fsf@HIDDEN>
In-reply-to: <4cegla7rnj.fsf@HIDDEN>

> From: Glenn Morris <rgm@HIDDEN>
> Cc: 20789 <at> debbugs.gnu.org
> Date: Wed, 17 Jun 2015 02:52:48 -0400
>
> Is there anything else in international/ that could benefit from being
> auto-generated?

Some.  Things I've spotted:

 . characters.el:

   . The modify-category-entry calls -- they basically can be derived
     from Blocks.txt
   . The modify-syntax-entry and set-case-syntax calls can be derived
     from the values of the 'general-category' property returned by
     'get-char-code-property', perhaps augmented by 'paired-bracket'
     and 'paired-type' properties
   . The set-case-syntax-pair calls (perhaps use the data in
     CaseFolding.txt, or even the case mapping information in
     UnicodeData.txt)
   . The setup of char-width-table -- I think the information is in
     EastAsianWidth.txt, with background information described in
     UAX#11 (http://www.unicode.org/reports/tr11/)
   . The setup of char-acronym-table: at least some of the data is in
     NameAliases.txt and NameList.txt

 . fontset.el:

   . The setup of script-representative-chars

 . mule-cmds.el:

   . The setting of locale-language-names -- the data is available in
     IANA's Language Subtag Registry
     (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry)
     and in ISO 639-2 (http://www.loc.gov/standards/iso639-2/,
     http://www.loc.gov/standards/iso639-2/php/English_list.php)

TIA

P.S. It would be good to add to somewhere (admin/make-tarball.txt?) a
reminder to fetch all those reference files and regenerate their
dependencies, before we prepare a release.
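For the script-representative-chars item, a rule proposed elsewhere in this bug log is to take the first codepoint in a script's range whose general category is a letter (Lu, Ll, Lt, Lo or Lm). A minimal Python sketch of that rule follows; `representative_char` is a hypothetical name, the range bounds are supplied by the caller, and whether the current hand-maintained entries actually follow this rule is not established here.

```python
import unicodedata

LETTER_CATEGORIES = ("Lu", "Ll", "Lt", "Lo", "Lm")


def representative_char(start, end):
    """Return the first letter codepoint in START..END, or None."""
    for codepoint in range(start, end + 1):
        if unicodedata.category(chr(codepoint)) in LETTER_CATEGORIES:
            return codepoint
    return None
```

A range that begins with marks or punctuation shows why the result is not always the first codepoint: for 0590..05FF the sketch skips the leading points and accents and returns U+05D0 HEBREW LETTER ALEF.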

From: Glenn Morris <rgm@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Wed, 17 Jun 2015 02:52:48 -0400
Message-ID: <4cegla7rnj.fsf@HIDDEN>

Eli Zaretskii wrote:

> Well, "signwriting" is not a word, AFAIK, it's 2 words [...]

It's a word (in the OED), but in the sense of painting commercial signs.

I don't really care, it's just that ~ 50% of the script is transforming
the Unicode names to the (seemingly randomly chosen) Emacs names.
If the latter were more straightforwardly derived from the former,
things would be simpler. But one more special rule makes no difference.

> P.S. Does the script work with mawk?

Yes, and with Sun OS 5.10's /usr/xpg4/bin/awk (but not /usr/bin/awk).
I don't believe it uses any more features than admin/charsets/*.awk.

Is there anything else in international/ that could benefit from being
auto-generated?
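To illustrate what a "straightforward" derivation of names might look like, here is a Python sketch that lower-cases a Unicode block name and joins its words with hyphens. It is not the awk script discussed above, and it deliberately has no special cases; `block_to_symbol` and `parse_blocks` are hypothetical names, and the input is assumed to follow the Blocks.txt layout ("start..end; Block Name").

```python
def block_to_symbol(name):
    """Lower-case a Unicode block name and join its words with hyphens."""
    return "-".join(name.lower().split())


def parse_blocks(lines):
    """Yield (start, end, symbol) for each line in the Blocks.txt layout."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        span, name = data.split(";")
        first, last = span.split("..")
        yield int(first, 16), int(last, 16), block_to_symbol(name)


for start, end, symbol in parse_blocks(
    ["12400..1247F; Cuneiform Numbers and Punctuation"]
):
    print("%X..%X %s" % (start, end, symbol))
```

Every name that this rule cannot produce needs its own special case, which is the cost the message above describes.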
bug-gnu-emacs@HIDDEN
:bug#20789
; Package emacs
.
Received: (at 20789) by debbugs.gnu.org; 16 Jun 2015 14:42:05 +0000
Date: Tue, 16 Jun 2015 17:41:36 +0300
From: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
In-reply-to: <ozy4jkh58w.fsf@HIDDEN>
To: Glenn Morris <rgm@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Message-id: <834mm7ogv3.fsf@HIDDEN>

> From: Glenn Morris <rgm@HIDDEN>
> Cc: 20789 <at> debbugs.gnu.org
> Date: Mon, 15 Jun 2015 20:22:07 -0400
>
> Eli Zaretskii wrote:
>
> >> I don't suppose that big list can be auto-generated from the inputs?
> >
> > It's not trivial. I describe below some of the issues, in the hope
> > that Someone™ will volunteer:
>
> Thanks. Script that processes Blocks.txt attached. Some questions:
>
> 1. In Blocks.txt:
>
> FF00..FFEF; Halfwidth and Fullwidth Forms
>
> In Emacs:
>
> (#xFF00 #xFF5F cjk-misc)
> (#xFF61 #xFF9F kana)
> (#xFFE0 #xFFEF cjk-misc)
>
> Is ff60 (FULLWIDTH RIGHT WHITE PARENTHESIS) intentionally omitted?

AFAICT, there's a small mess around there. Based on the names of the
pertinent characters, I think we should have this instead of the above
3 ranges:

  (#xFF00 #xFF60 cjk-misc)
  (#xFF61 #xFF9F kana)
  (#xFFA0 #xFFDF hangul)
  (#xFFE0 #xFFEF cjk-misc)

> 2. In Emacs "olt-italic" looks like a typo ("old-italic"). Can it be renamed?

Yes, please.

> 3. In Blocks.txt, Anatolian Hieroglyphs ends at 1467F.
> In Emacs, it ends at 1457F. Typo?

Yes.

> 4. In Blocks.txt:
>
> 20000..2A6DF; CJK Unified Ideographs Extension B
> 2A700..2B73F; CJK Unified Ideographs Extension C
> 2B740..2B81F; CJK Unified Ideographs Extension D
> 2B820..2CEAF; CJK Unified Ideographs Extension E
> 2F800..2FA1F; CJK Compatibility Ideographs Supplement
>
> In Emacs:
>
> (#x20000 #x2CEAF han)
> (#x2F800 #x2FFFF han)
>
> Emacs adds the ranges 2a6e0:2a6ff and 2fa20:2ffff, which Blocks.txt does
> not cover. Intentional?

I don't know, but probably not intentional. I think we had better make
it consistent with the UCD.

> 5. Newly added "sutton-sign-writing" - should be "sutton-signwriting"?
> (The case-insensitive source says "Sutton SignWriting".)

Well, "signwriting" is not a word, AFAIK, it's 2 words (and the funny
camel-case seems to agree with me). AFAIU, they used "SignWriting"
because it's the commercial name. But if you insist, I won't...

Thank you for doing this.

P.S. Does the script work with mawk? (Some systems have it as their
default Awk, I think.)
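The discrepancy raised in question 4 can be checked mechanically. The following is a minimal sketch in Python (not the awk used by the attached script, and not part of Emacs); the function name `uncovered` and the hand-copied range tables are assumptions made for illustration. It reports the parts of the Emacs 'han' ranges that no Blocks.txt block covers.

```python
# Blocks.txt entries quoted in the message above, copied by hand.
BLOCKS = [
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0x2A700, 0x2B73F),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81F),  # CJK Unified Ideographs Extension D
    (0x2B820, 0x2CEAF),  # CJK Unified Ideographs Extension E
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]

# The two 'han' ranges quoted from Emacs in the same message.
EMACS_HAN = [(0x20000, 0x2CEAF), (0x2F800, 0x2FFFF)]


def uncovered(emacs_ranges, blocks):
    """Return the sub-ranges of emacs_ranges that no block covers.

    Hypothetical helper: walks each Emacs range left to right and
    records every stretch that falls between (or after) the blocks.
    """
    gaps = []
    for lo, hi in emacs_ranges:
        cursor = lo
        for b_lo, b_hi in sorted(blocks):
            if b_hi < cursor or b_lo > hi:
                continue  # block lies entirely outside what is left
            if b_lo > cursor:
                gaps.append((cursor, b_lo - 1))
            cursor = max(cursor, b_hi + 1)
        if cursor <= hi:
            gaps.append((cursor, hi))
    return gaps


for lo, hi in uncovered(EMACS_HAN, BLOCKS):
    print(f"{lo:X}:{hi:X}")
```

With the ranges above, the sketch prints the same two gaps named in the question, 2A6E0:2A6FF and 2FA20:2FFFF.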
Received: (at 20789) by debbugs.gnu.org; 16 Jun 2015 00:22:18 +0000
From: Glenn Morris <rgm@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Mon, 15 Jun 2015 20:22:07 -0400
Message-ID: <ozy4jkh58w.fsf@HIDDEN>
Content-Type: multipart/mixed; boundary="=-=-="

Eli Zaretskii wrote:

>> I don't suppose that big list can be auto-generated from the inputs?
>
> It's not trivial. I describe below some of the issues, in the hope
> that Someone™ will volunteer:

Thanks. Script that processes Blocks.txt attached. Some questions:

1. In Blocks.txt:

FF00..FFEF; Halfwidth and Fullwidth Forms

In Emacs:

(#xFF00 #xFF5F cjk-misc)
(#xFF61 #xFF9F kana)
(#xFFE0 #xFFEF cjk-misc)

Is ff60 (FULLWIDTH RIGHT WHITE PARENTHESIS) intentionally omitted?

2. In Emacs "olt-italic" looks like a typo ("old-italic"). Can it be renamed?

3. In Blocks.txt, Anatolian Hieroglyphs ends at 1467F.
In Emacs, it ends at 1457F. Typo?

4.
In Blocks.txt: 20000..2A6DF; CJK Unified Ideographs Extension B 2A700..2B73F; CJK Unified Ideographs Extension C 2B740..2B81F; CJK Unified Ideographs Extension D 2B820..2CEAF; CJK Unified Ideographs Extension E 2F800..2FA1F; CJK Compatibility Ideographs Supplement In Emacs: (#x20000 #x2CEAF han) (#x2F800 #x2FFFF han) Emacs adds the ranges 2a6e0:2a6ff and 2fa20:2ffff, which Blocks.txt does not cover. Intentional? 5. Newly added "sutton-sign-writing" - should be "sutton-signwriting"? (The case-insensitive source says "Sutton SignWriting".) --=-=-= Content-Type: application/octet-stream Content-Disposition: attachment; filename=blocks.awk Content-Transfer-Encoding: base64 IyEvdXNyL2Jpbi9hd2sgLWYKCiMjIENvcHlyaWdodCAoQykgMjAxNSBGcmVlIFNvZnR3YXJlIEZv dW5kYXRpb24sIEluYy4KCiMjIEF1dGhvcjogR2xlbm4gTW9ycmlzIDxyZ21AZ251Lm9yZz4KCiMj IFRoaXMgZmlsZSBpcyBwYXJ0IG9mIEdOVSBFbWFjcy4KCiMjIEdOVSBFbWFjcyBpcyBmcmVlIHNv ZnR3YXJlOiB5b3UgY2FuIHJlZGlzdHJpYnV0ZSBpdCBhbmQvb3IgbW9kaWZ5CiMjIGl0IHVuZGVy IHRoZSB0ZXJtcyBvZiB0aGUgR05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2UgYXMgcHVibGlzaGVk IGJ5CiMjIHRoZSBGcmVlIFNvZnR3YXJlIEZvdW5kYXRpb24sIGVpdGhlciB2ZXJzaW9uIDMgb2Yg dGhlIExpY2Vuc2UsIG9yCiMjIChhdCB5b3VyIG9wdGlvbikgYW55IGxhdGVyIHZlcnNpb24uCgoj IyBHTlUgRW1hY3MgaXMgZGlzdHJpYnV0ZWQgaW4gdGhlIGhvcGUgdGhhdCBpdCB3aWxsIGJlIHVz ZWZ1bCwKIyMgYnV0IFdJVEhPVVQgQU5ZIFdBUlJBTlRZOyB3aXRob3V0IGV2ZW4gdGhlIGltcGxp ZWQgd2FycmFudHkgb2YKIyMgTUVSQ0hBTlRBQklMSVRZIG9yIEZJVE5FU1MgRk9SIEEgUEFSVElD VUxBUiBQVVJQT1NFLiAgU2VlIHRoZQojIyBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZSBmb3Ig bW9yZSBkZXRhaWxzLgoKIyMgWW91IHNob3VsZCBoYXZlIHJlY2VpdmVkIGEgY29weSBvZiB0aGUg R05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2UKIyMgYWxvbmcgd2l0aCBHTlUgRW1hY3MuICBJZiBu b3QsIHNlZSA8aHR0cDovL3d3dy5nbnUub3JnL2xpY2Vuc2VzLz4uCgojIyMgQ29tbWVudGFyeToK CiMjIFRoaXMgc2NyaXB0IHRha2VzIGFzIGlucHV0IFVuaWNvZGUncyBCbG9ja3MudHh0CiMjICho dHRwOi8vd3d3LnVuaWNvZGUub3JnL1B1YmxpYy9VTklEQVRBL0Jsb2Nrcy50eHQpCiMjIGFuZCBw cm9kdWNlcyBvdXRwdXQgZm9yIEVtYWNzJ3MgbGlzcC9pbnRlcm5hdGlvbmFsL2NoYXJzY3JpcHQu 
ZWwuCgojIyBJdCBsdW1wcyB0b2dldGhlciBhbGwgdGhlIGJsb2NrcyBiZWxvbmdpbmcgdG8gdGhl IHNhbWUgbGFuZ3VhZ2UuCiMjIEUuZy4sICJCYXNpYyBMYXRpbiIsICJMYXRpbi0xIFN1cHBsZW1l bnQiLCAiTGF0aW4gRXh0ZW5kZWQtQSIsCiMjIGV0Yy4gYXJlIGFsbCBsdW1wZWQgdG9nZXRoZXIg dW5kZXIgImxhdGluIi4KCiMjIFRoZSBVbmljb2RlIGJsb2NrcyBhY3R1YWxseSBleHRlbmQgcGFz dCBzb21lIG9mIHRoZXNlIHJhbmdlcyB3aXRoCiMjIHVuZGVmaW5lZCBjb2RlcG9pbnRzLgoKIyMg Rm9yIGFkZGl0aW9uYWwgZGV0YWlscywgc2VlIDxodHRwOi8vZGViYnVncy5nbnUub3JnLzIwNzg5 IzExPi4KCiMjIyBDb2RlOgoKQkVHSU4gewogICAgIyMgSGFyZC1jb2RlZCBuYW1lcy4gIFNlZSBu YW1lMmFsaWFzIGZvciB0aGUgcmVzdC4KICAgIGFsaWFzWyJpcGEgZXh0ZW5zaW9ucyJdID0gInBo b25ldGljIgogICAgYWxpYXNbImxldHRlcmxpa2Ugc3ltYm9scyJdID0gInN5bWJvbCIKICAgIGFs aWFzWyJudW1iZXIgZm9ybXMiXSA9ICJzeW1ib2wiCiAgICBhbGlhc1sibWlzY2VsbGFuZW91cyB0 ZWNobmljYWwiXSA9ICJzeW1ib2wiCiAgICBhbGlhc1siY29udHJvbCBwaWN0dXJlcyJdID0gInN5 bWJvbCIKICAgIGFsaWFzWyJvcHRpY2FsIGNoYXJhY3RlciByZWNvZ25pdGlvbiJdID0gInN5bWJv bCIKICAgIGFsaWFzWyJlbmNsb3NlZCBhbHBoYW51bWVyaWNzIl0gPSAic3ltYm9sIgogICAgYWxp YXNbImJveCBkcmF3aW5nIl0gPSAic3ltYm9sIgogICAgYWxpYXNbImJsb2NrIGVsZW1lbnRzIl0g PSAic3ltYm9sIgogICAgYWxpYXNbIm1pc2NlbGxhbmVvdXMgc3ltYm9scyJdID0gInN5bWJvbCIK ICAgIGFsaWFzWyJjamsgc3Ryb2tlcyJdID0gImNqay1taXNjIgogICAgYWxpYXNbImNqayBzeW1i b2xzIGFuZCBwdW5jdHVhdGlvbiJdID0gImNqay1taXNjIgogICAgYWxpYXNbImhhbGZ3aWR0aCBh bmQgZnVsbHdpZHRoIGZvcm1zIl0gPSAiY2prLW1pc2MiCiAgICBhbGlhc1siY29tbW9uIGluZGlj IG51bWJlciBmb3JtcyJdID0gIm5vcnRoLWluZGljLW51bWJlciIKCiAgICB0b2hleFsiYSJdID0g MTAKICAgIHRvaGV4WyJiIl0gPSAxMQogICAgdG9oZXhbImMiXSA9IDEyCiAgICB0b2hleFsiZCJd ID0gMTMKICAgIHRvaGV4WyJlIl0gPSAxNAogICAgdG9oZXhbImYiXSA9IDE1CgogICAgZml4X3N0 YXJ0WyIwMDgwIl0gPSAiMDBBMCIKICAgIGZpeF9lbmRbIjJBNkRGIl0gPSAiMkE2RkYiCiAgICBm aXhfZW5kWyIyRkExRiJdID0gIjJGRkZGIgp9CgojIyBGcm9tIGFkbWluL2NoYXJzZXRzLy4KIyMg V2l0aCBnYXdrJ3MgLS1ub24tZGVjaW1hbC1kYXRhIHN3aXRjaCB3ZSB3b3VsZG4ndCBuZWVkIHRo aXMuCmZ1bmN0aW9uIGRlY29kZV9oZXgoc3RyICAgLCBuLCBsZW4sIGksIGMpIHsKICBuID0gMAog 
IGxlbiA9IGxlbmd0aChzdHIpCiAgZm9yIChpID0gMTsgaSA8PSBsZW47IGkrKykKICAgIHsKICAg ICAgYyA9IHN1YnN0ciAoc3RyLCBpLCAxKQogICAgICBpZiAoYyA+PSAiMCIgJiYgYyA8PSAiOSIp CgluID0gbiAqIDE2ICsgKGMgLSAiMCIpCiAgICAgIGVsc2UKCW4gPSBuICogMTYgKyB0b2hleFt0 b2xvd2VyKGMpXQogICAgfQogIHJldHVybiBuCn0KCmZ1bmN0aW9uIG5hbWUyYWxpYXMobmFtZSAg ICwgdywgdzIpIHsKICAgIG5hbWUgPSB0b2xvd2VyKG5hbWUpCiAgICBpZiAoYWxpYXNbbmFtZV0p IHJldHVybiBhbGlhc1tuYW1lXQogICAgZWxzZSBpZiAobmFtZSB+IC9mb3Igc3ltYm9scy8pIHJl dHVybiAic3ltYm9sIgogICAgZWxzZSBpZiAobmFtZSB+IC9sYXRpbnxjb21iaW5pbmcgLiogbWFy a3N8c3BhY2luZyBtb2RpZmllcnx0b25lIGxldHRlcnN8YWxwaGFiZXRpYyBwcmVzZW50YXRpb24v KSByZXR1cm4gImxhdGluIgogICAgZWxzZSBpZiAobmFtZSB+IC9jamt8eWlqaW5nfGVuY2xvc2Vk IGlkZW9ncmFwaHxrYW5neGkvKSByZXR1cm4gImhhbiIKICAgIGVsc2UgaWYgKG5hbWUgfiAvYXJh YmljLykgcmV0dXJuICJhcmFiaWMiCiAgICBlbHNlIGlmIChuYW1lIH4gL15ncmVlay8pIHJldHVy biAiZ3JlZWsiCiAgICBlbHNlIGlmIChuYW1lIH4gL15jb3B0aWMvKSByZXR1cm4gImNvcHRpYyIK ICAgIGVsc2UgaWYgKG5hbWUgfiAvY3VuZWlmb3JtIG51bWJlci8pIHJldHVybiAiY3VuZWlmb3Jt LW51bWJlcnMtYW5kLXB1bmN0dWF0aW9uIgogICAgZWxzZSBpZiAobmFtZSB+IC9jdW5laWZvcm0v KSByZXR1cm4gImN1bmVpZm9ybSIKICAgIGVsc2UgaWYgKG5hbWUgfiAvbWF0aGVtYXRpY2FsIGFs cGhhbnVtZXJpYyBzeW1ib2wvKSByZXR1cm4gIm1hdGhlbWF0aWNhbCIKICAgIGVsc2UgaWYgKG5h bWUgfiAvcHVuY3R1YXRpb258bWF0aGVtYXRpY2FsfGFycm93c3xjdXJyZW5jeXxzdXBlcnNjcmlw dHxzbWFsbCBmb3JtIHZhcmlhbnRzfGdlb21ldHJpY3xkaW5nYmF0c3xlbmNsb3NlZHxhbGNoZW1p Y2FsfHBpY3RvZ3JhcGh8ZW1vdGljb258dHJhbnNwb3J0LykgcmV0dXJuICJzeW1ib2wiCiAgICBl bHNlIGlmIChuYW1lIH4gL2NhbmFkaWFuIGFib3JpZ2luYWwvKSByZXR1cm4gImNhbmFkaWFuLWFi b3JpZ2luYWwiCiAgICBlbHNlIGlmIChuYW1lIH4gL2thdGFrYW5hfGhpcmFnYW5hLykgcmV0dXJu ICJrYW5hIgogICAgZWxzZSBpZiAobmFtZSB+IC9teWFubWFyLykgcmV0dXJuICJidXJtZXNlIgog ICAgZWxzZSBpZiAobmFtZSB+IC9oYW5ndWwvKSByZXR1cm4gImhhbmd1bCIKICAgIGVsc2UgaWYg KG5hbWUgfiAva2htZXIvKSByZXR1cm4gImtobWVyIgogICAgZWxzZSBpZiAobmFtZSB+IC9icmFp bGxlLykgcmV0dXJuICJicmFpbGxlIgogICAgZWxzZSBpZiAobmFtZSB+IC9eeWkgLykgcmV0dXJu 
ICJ5aSIKICAgIGVsc2UgaWYgKG5hbWUgfiAvc3Vycm9nYXRlc3xwcml2YXRlIHVzZXx2YXJpYXRp b24gc2VsZWN0b3JzLykgcmV0dXJuIDAKICAgIGVsc2UgaWYgKG5hbWUgfi9eKHNwZWNpYWxzfHRh Z3MpJC8pIHJldHVybiAwCiAgICBlbHNlIGlmIChuYW1lIH4gL2xpbmVhciBiLykgcmV0dXJuICJs aW5lYXItYiIKICAgIGVsc2UgaWYgKG5hbWUgfiAvYXJhbWFpYy8pIHJldHVybiAiYXJhbWFpYyIK ICAgIGVsc2UgaWYgKG5hbWUgfiAvcnVtaSBudW0vKSByZXR1cm4gInJ1bWktbnVtYmVyIgogICAg ZWxzZSBpZiAobmFtZSB+IC9kdXBsb3lhbnxzaG9ydGhhbmQvKSByZXR1cm4gImR1cGxveWFuLXNo b3J0aGFuZCIKICAgIGVsc2UgaWYgKG5hbWUgfiAvc3V0dG9uIHNpZ253cml0aW5nLykgcmV0dXJu ICJzdXR0b24tc2lnbi13cml0aW5nIgoKICAgIHN1YigvIChleHRlbmRlZHxleHRlbnNpb25zfHN1 cHBsZW1lbnQpLiovLCAiIiwgbmFtZSkKICAgIHN1YigvbnVtYmVycy8sICJudW1iZXIiLCBuYW1l KQogICAgc3ViKC9udW1lcmFscy8sICJudW1lcmFsIiwgbmFtZSkKICAgIHN1Yigvc3ltYm9scy8s ICJzeW1ib2wiLCBuYW1lKQogICAgc3ViKC9mb3JtcyQvLCAiZm9ybSIsIG5hbWUpCiAgICBzdWIo L3RpbGVzJC8sICJ0aWxlIiwgbmFtZSkKICAgIHN1YigvXm5ldyAvLCAiIiwgbmFtZSkKICAgIHN1 YigvIChjaGFyYWN0ZXJzfGhpZXJvZ2x5cGhzfGN1cnNpdmUpJC8sICIiLCBuYW1lKQogICAgZ3N1 YigvIC8sICItIiwgbmFtZSkKCiAgICByZXR1cm4gbmFtZQp9CgovXlswLTlBLUZdLyB7CiAgICBz ZXAgPSBpbmRleCgkMSwgIi4uIikKICAgIGxlbiA9IGxlbmd0aCgkMSkKICAgIHMgPSBzdWJzdHIo JDEsMSxzZXAtMSkKICAgIGUgPSBzdWJzdHIoJDEsc2VwKzIsbGVuLXNlcC0yKQogICAgJDEgPSAi IgogICAgc3ViKC9eICovLCAiIiwgJDApCiAgICBpKysKICAgIHN0YXJ0W2ldID0gZml4X3N0YXJ0 W3NdID8gZml4X3N0YXJ0W3NdIDogcwogICAgZW5kW2ldID0gZml4X2VuZFtlXSA/IGZpeF9lbmRb ZV06IGUKICAgIG5hbWVbaV0gPSAkMAoKICAgIGFsdFtpXSA9IG5hbWUyYWxpYXMobmFtZVtpXSkK CiAgICBpZiAoIWFsdFtpXSkKICAgIHsKICAgICAgICBpLS0KICAgICAgICBuZXh0CiAgICB9Cgog ICAgIyMgQ29tYmluZSBhZGphY2VudCByYW5nZXMgd2l0aCB0aGUgc2FtZSBuYW1lLgogICAgaWYg KGFsdFtpXSA9PSBhbHRbaS0xXSAmJiBkZWNvZGVfaGV4KHN0YXJ0W2ldKSA9PSAxICsgZGVjb2Rl X2hleChlbmRbaS0xXSkpCiAgICB7CiAgICAgICAgZW5kW2ktMV0gPSBlbmRbaV0KICAgICAgICBu YW1lW2ktMV0gPSAobmFtZVtpLTFdICIsICIgbmFtZVtpXSkKICAgICAgICBpLS0KICAgIH0KCiAg ICAjIyBTb21lIGhhcmQtY29kZWQgc3BsaXRzLgogICAgaWYgKHN0YXJ0W2ldID09ICIwMzcwIikK 
ICAgIHsKICAgICAgICBlbmRbaV0gPSAiMDNFMSIKICAgICAgICBpKysKICAgICAgICBzdGFydFtp XSA9ICIwM0UyIgogICAgICAgIGVuZFtpXSA9ICIwM0VGIgogICAgICAgIGFsdFtpXSA9ICJjb3B0 aWMiCiAgICAgICAgaSsrCiAgICAgICAgc3RhcnRbaV0gPSAiMDNGMCIKICAgICAgICBlbmRbaV0g PSAiMDNGRiIKICAgICAgICBhbHRbaV0gPSAiZ3JlZWsiCiAgICB9CiAgICBlbHNlIGlmIChzdGFy dFtpXSA9PSAiRkIwMCIpCiAgICB7CiAgICAgICAgZW5kW2ldID0gIkZCMDYiCiAgICAgICAgaSsr CiAgICAgICAgc3RhcnRbaV0gPSAiRkIxMyIKICAgICAgICBlbmRbaV0gPSAiRkIxNyIKICAgICAg ICBhbHRbaV0gPSAiYXJtZW5pYW4iCiAgICAgICAgaSsrCiAgICAgICAgc3RhcnRbaV0gPSAiRkIx RCIKICAgICAgICBlbmRbaV0gPSAiRkI0RiIKICAgICAgICBhbHRbaV0gPSAiaGVicmV3IgogICAg fQogICAgZWxzZSBpZiAoc3RhcnRbaV0gPT0gIkZGMDAiKQogICAgewogICAgICAgIGVuZFtpXSA9 ICJGRjVGIgogICAgICAgIGkrKwogICAgICAgIHN0YXJ0W2ldID0gIkZGNjEiCiAgICAgICAgZW5k W2ldID0gIkZGOUYiCiAgICAgICAgYWx0W2ldID0gImthbmEiCiAgICAgICAgaSsrCiAgICAgICAg c3RhcnRbaV0gPSAiRkZFMCIKICAgICAgICBlbmRbaV0gPSAiRkZFRiIKICAgICAgICBhbHRbaV0g PSAiY2prLW1pc2MiCiAgICB9Cn0KCkVORCB7CiAgICBwcmludCAiOzs7IGNoYXJzY3JpcHQuZWwg LS0tIGNoYXJhY3RlciBzY3JpcHQgdGFibGUgLSotIG5vLWJ5dGUtY29tcGlsZTogdCAtKi0iCiAg ICBwcmludCAiOzs7IEF1dG9tYXRpY2FsbHkgZ2VuZXJhdGVkIGZyb20gYWRtaW4vdW5pZGF0YS9C bG9ja3MudHh0IgogICAgcHJpbnQgIihsZXQgKHNjcmlwdC1saXN0KSIKICAgIHByaW50ICIgIChk b2xpc3QgKGVsdCAnKCIKCiAgICBmb3IgKGo9MTtqPD1pO2orKykKICAgIHsKICAgICAgICBwcmlu dGYoIiAgICAoI3glcyAjeCVzICVzKSIsIHN0YXJ0W2pdLCBlbmRbal0sIGFsdFtqXSkKICAgICAg ICAjIyBGdXp6IHRvIGRlY2lkZSB3aGV0aGVyIHdvcnRoIHByaW50aW5nIG9yaWdpbmFsIG5hbWUg YXMgYSBjb21tZW50LgogICAgICAgIGlmIChuYW1lW2pdICYmIGFsdFtqXSAhPSB0b2xvd2VyKG5h bWVbal0pICYmIGFsdFtqXSAhfiAvLS8pCiAgICAgICAgICAgIHByaW50ZigiIDsgJXMiLCBuYW1l W2pdKQogICAgICAgIHByaW50ZigiXG4iKQogICAgfQoKICAgIHByaW50ICIgICAgKSkiCiAgICBw cmludCAiICAgIChzZXQtY2hhci10YWJsZS1yYW5nZSBjaGFyLXNjcmlwdC10YWJsZSIKICAgIHBy aW50ICIJCQkgIChjb25zIChjYXIgZWx0KSAobnRoIDEgZWx0KSkgKG50aCAyIGVsdCkpIgogICAg cHJpbnQgIiAgICAob3IgKG1lbXEgKG50aCAyIGVsdCkgc2NyaXB0LWxpc3QpIgogICAgcHJpbnQg 
Igkoc2V0cSBzY3JpcHQtbGlzdCAoY29ucyAobnRoIDIgZWx0KSBzY3JpcHQtbGlzdCkpKSkiCiAg ICBwcmludCAiICAoc2V0LWNoYXItdGFibGUtZXh0cmEtc2xvdCBjaGFyLXNjcmlwdC10YWJsZSAw IChucmV2ZXJzZSBzY3JpcHQtbGlzdCkpKSIKICAgIHByaW50ICIiCiAgICBwcmludCAiKHByb3Zp ZGUgJ2NoYXJzY3JpcHQpIgp9Cg== --=-=-=--
Received: (at 20789) by debbugs.gnu.org; 12 Jun 2015 08:28:24 +0000
Date: Fri, 12 Jun 2015 11:28:09 +0300
From: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
In-reply-to: <rek2v93mux.fsf@HIDDEN>
To: Glenn Morris <rgm@HIDDEN>
Cc: 20789 <at> debbugs.gnu.org
Message-id: <83y4jpqqjq.fsf@HIDDEN>

> From: Glenn Morris <rgm@HIDDEN>
> Date: Thu, 11 Jun 2015 18:24:06 -0400
>
> Glenn Morris wrote:
>
> > Error (initialization): Creation of the default fontsets failed: (error
> > Invalid script or charset name: cuneiform-numbers-and-punctuation)
>
> I fixed a typo that seems to have caused that.

Sorry about that.

> I don't suppose that big list can be auto-generated from the inputs?

It's not trivial. I describe below some of the issues, in the hope
that Someone™ will volunteer:

 . Most of the script names come from the corresponding Unicode
   blocks, with trivial transformations (downcase words and replace
   blanks with a hyphen). So basically, we will need to use the
   information in Blocks.txt, a file that is part of the Unicode
   Character Database (UCD), but with quirks described below.

 . The first quirk is that we lump together all the blocks that
   belong to the same script, like "Basic Latin", "Latin Extended-A",
   "Latin-1 Supplement", etc. -- these all go to the single script
   called 'latin'. Likewise with other similar blocks that are either
   "SOMETHING Extended" or "Supplement" or whatever.

 . The second quirk is with the CJK characters: those are divided
   into several broad scripts like 'han', 'kana', and 'cjk-misc'
   whose exact rules I don't know.

 . The third quirk is with the 'symbol' pseudo-script: we lump there
   all punctuation characters and all symbol characters (those for
   which the General Category is one of Pc, Pd, Ps, Pe, Pi, Pf, Po,
   Sm, Sc, Sk, So), but with the following notable exception:
   punctuation characters that belong to blocks that include
   non-punctuation characters are left in those blocks -- those are
   punctuation characters used only with the scripts named by those
   blocks, like U+05BE HEBREW PUNCTUATION MAQAF, which is only used
   by the Hebrew script.

 . Another quirk is that mathematical alphanumerics (which are just
   letters from the Unicode POV) are lumped into a separate script
   'mathematical'.

Alternatively, one could use Scripts.txt from the UCD, and then the
only problem is to subdivide what they call "Common" into the scripts
we use.

For the general category of a character, one can do in Emacs:

  (get-char-code-property CHAR 'general-category)

Alternatively, one can search UnicodeData.txt directly: the General
Category is the 3rd field there.

Patches are welcome to do all of the above automatically, perhaps with
some small database that expresses the more tricky of the above rules.
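The first two points above (trivial name transformation, then lumping blocks of one script together) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script later attached to this bug: the helper names `block_to_script`, `parse_blocks` and `merge_adjacent` are hypothetical, the sample lines are copied by hand from Blocks.txt, and none of the CJK, 'symbol' or 'mathematical' quirks is handled.

```python
import re

# A few Blocks.txt-style lines, copied by hand for illustration.
SAMPLE = """\
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0530..058F; Armenian
0590..05FF; Hebrew
"""


def block_to_script(block_name):
    """Hypothetical normaliser: downcase, drop qualifiers, hyphenate."""
    name = block_name.lower()
    name = re.sub(r"^basic ", "", name)
    # Strip "Extended-A", "Extensions", "-1 Supplement" and the like.
    name = re.sub(r"(-\d+)? (extended|extensions|supplement).*$", "", name)
    return name.replace(" ", "-")


def parse_blocks(text):
    """Yield (start, end, script) for each 'XXXX..YYYY; Name' line."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        rng, name = (part.strip() for part in line.split(";", 1))
        start, end = (int(h, 16) for h in rng.split(".."))
        yield start, end, block_to_script(name)


def merge_adjacent(ranges):
    """Lump contiguous ranges that map to the same script name."""
    merged = []
    for start, end, script in ranges:
        if merged and merged[-1][2] == script and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end, script)
        else:
            merged.append((start, end, script))
    return merged


table = merge_adjacent(parse_blocks(SAMPLE))
for start, end, script in table:
    print(f"(#x{start:04X} #x{end:04X} {script})")
```

On the sample input the four Latin blocks collapse into a single `(#x0000 #x024F latin)` entry, while Armenian and Hebrew stay separate. The exceptions listed in the message would have to be layered on top, for example as a small table of overrides.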
Glenn Morris <rgm@HIDDEN>
to control <at> debbugs.gnu.org
.
Received: (at 20789) by debbugs.gnu.org; 11 Jun 2015 22:24:15 +0000
From: Glenn Morris <rgm@HIDDEN>
To: 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Thu, 11 Jun 2015 18:24:06 -0400
In-Reply-To: <21zj45kiix.fsf@HIDDEN> (Glenn Morris's message of "Thu, 11 Jun 2015 18:05:42 -0400")
Message-ID: <rek2v93mux.fsf@HIDDEN>

Glenn Morris wrote:

> Error (initialization): Creation of the default fontsets failed: (error
> Invalid script or charset name: cuneiform-numbers-and-punctuation)

I fixed a typo that seems to have caused that.
I don't suppose that big list can be auto-generated from the inputs?

> A second bug: the *Warnings* buffer is not shown at startup, *scratch* is.
Received: (at submit) by debbugs.gnu.org; 11 Jun 2015 22:05:54 +0000
From: Glenn Morris <rgm@HIDDEN>
To: submit <at> debbugs.gnu.org
Subject: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Thu, 11 Jun 2015 18:05:42 -0400
Message-ID: <21zj45kiix.fsf@HIDDEN>

Package: emacs
Version: 25.0.50

Current master on x86_64 RHEL 7.1.

emacs -Q:
All looks fine, but there is a *Warnings* buffer with contents:

Error (initialization): Creation of the default fontsets failed: (error
Invalid script or charset name: cuneiform-numbers-and-punctuation)

A second bug: the *Warnings* buffer is not shown at startup, *scratch* is.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.