GNU bug report logs - #58168
string-lessp glitches and inconsistencies


Package: emacs; Reported by: Mattias Engdegård <mattias.engdegard@HIDDEN>; dated Thu, 29 Sep 2022 16:25:01 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 6 Oct 2022 12:43:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Oct 06 08:43:22 2022
Received: from localhost ([127.0.0.1]:59322 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ogQDi-0001WN-19
	for submit <at> debbugs.gnu.org; Thu, 06 Oct 2022 08:43:22 -0400
Received: from mail-lj1-f173.google.com ([209.85.208.173]:44921)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1ogQDe-0001W7-9U
 for 58168 <at> debbugs.gnu.org; Thu, 06 Oct 2022 08:43:20 -0400
Received: by mail-lj1-f173.google.com with SMTP id q17so2060764lji.11
 for <58168 <at> debbugs.gnu.org>; Thu, 06 Oct 2022 05:43:18 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 u19-20020ac25193000000b00499b232875dsm2686275lfi.171.2022.10.06.05.43.10
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 06 Oct 2022 05:43:10 -0700 (PDT)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <83v8oxp5lk.fsf@HIDDEN>
Date: Thu, 6 Oct 2022 14:43:09 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <4CFC3078-64FB-4EAC-A536-F6CBCEE2087D@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN> <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
 <831qrnx1jc.fsf@HIDDEN> <B3639DF2-95E0-459B-B718-A2779EA53B95@HIDDEN>
 <83k05fv9nv.fsf@HIDDEN> <52286A5C-D947-4279-812E-173BB44046E1@HIDDEN>
 <83v8oxp5lk.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

On 6 Oct 2022, at 13:13, Eli Zaretskii <eliz@HIDDEN> wrote:

>>   (format-message "%\345" 0)
>> => (error "Invalid format operation %å")
>
> And you want to show %\345 instead?

Maybe, or (as the patch suggested) using a different wording for raw bytes. In any case %å is clearly a lie since that character wasn't in the format string. What would you rather see in such a case?

>  Are you sure this is not the
> consequence of inserting the error message into a multibyte buffer?

Quite sure. The error message is always produced as multibyte and the %c processing done at doprnt.c:471:

	    case 'c':
	      {
		int chr = va_arg (ap, int);
		tem = CHAR_STRING (chr, (unsigned char *) charbuf);

where CHAR_STRING renders chr (the %c argument passed to `error`) as a multibyte char to charbuf here.
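
To see why a byte value handed to %c comes out as å, here is a minimal sketch in C. It assumes that, for code points below 0x800, the internal multibyte form coincides with UTF-8. encode_small_char is a hypothetical helper written for this illustration; it is not the CHAR_STRING macro.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not Emacs code):
   store the UTF-8 encoding of code point C, which must be below 0x800,
   in BUF and return the number of bytes written.  */
static size_t
encode_small_char (unsigned int c, unsigned char *buf)
{
  if (c < 0x80)
    {
      /* ASCII is stored as a single byte.  */
      buf[0] = (unsigned char) c;
      return 1;
    }
  /* Two-byte form: 110xxxxx 10xxxxxx.  */
  buf[0] = (unsigned char) (0xC0 | (c >> 6));
  buf[1] = (unsigned char) (0x80 | (c & 0x3F));
  return 2;
}
```

Called with 0xE5 (octal \345), the helper writes the two bytes 0xC3 0xA5, which is the encoding of å, so the byte from the unibyte format string is no longer recognisable as a raw byte in the message.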

>>> Who said anything about #x3fffc?  The original code had #xfc, the
>>> unibyte code for #x3ffffc.
>>
>> There seems to be a misunderstanding. The original (and current) code attempts to display char #x3fffc, which is not a raw byte. It's just a typo for #x3ffffc -- not a big deal.
>
> But your change replaced it with \xfc, which is what I questioned.

Oh, I see -- you are looking at the hunk that changed the labels, not the character tested. When 3fffc was changed into 3ffffc, the "expected" string needed to change accordingly; for the latter, it's \xfc or \374 depending on mode.

> Why not test both #x3ffffc and #xfc?  And the same question about
> \777777 vs \374.

Testing #x3ffffc inserts the raw byte #xfc so that takes care of that -- the test already exercised inserting the unibyte raw byte #x80 and the patch didn't change that.

I don't think these two cases actually exercise different paths in redisplay since the buffer is multibyte:

  (insert "\xfc")

and

  (insert (char-to-string #x3ffffc))

should have identical effects on the buffer and hence the display, but it doesn't hurt to have one of each.

\777774 is just octal for #x3fffc which was changed into the (intended) #x3ffffc, and \374 is octal for #xfc which is covered as above.
Thus, the only case actually removed was #x3fffc since it was a typo, and #x10abcd was put in its place.
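
The octal and hexadecimal values mentioned here can be checked with a small C sketch. The constants are assumptions modelled on Emacs's character.h (raw bytes 0x80..0xFF occupying character codes 0x3FFF80..0x3FFFFF), and the function names are hypothetical.

```c
/* Assumed limits, modelled on character.h; treat them as illustrative.  */
enum
{
  SKETCH_MAX_5_BYTE_CHAR = 0x3FFF7F, /* last ordinary character code */
  SKETCH_MAX_CHAR = 0x3FFFFF         /* last raw-byte character code */
};

/* Return nonzero if C lies in the range used to represent raw bytes.  */
static int
is_raw_byte_char (int c)
{
  return c > SKETCH_MAX_5_BYTE_CHAR && c <= SKETCH_MAX_CHAR;
}

/* Map a raw byte in 0x80..0xFF to its character code.  */
static int
raw_byte_to_char (int byte)
{
  return byte + 0x3FFF00;
}
```

Under these assumptions raw_byte_to_char (0xfc) gives 0x3ffffc, whereas 0x3fffc (octal 777774) and 0x10abcd both fall outside the raw-byte range.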





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 6 Oct 2022 11:13:58 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Oct 06 07:13:58 2022
Received: from localhost ([127.0.0.1]:59102 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ogOpC-0000qF-BV
	for submit <at> debbugs.gnu.org; Thu, 06 Oct 2022 07:13:58 -0400
Received: from eggs.gnu.org ([209.51.188.92]:39360)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1ogOp9-0000q2-Gx
 for 58168 <at> debbugs.gnu.org; Thu, 06 Oct 2022 07:13:57 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:33062)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ogOp2-0005ZA-CG; Thu, 06 Oct 2022 07:13:50 -0400
Received: from [87.69.77.57] (port=3936 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ogOp1-0004O2-BB; Thu, 06 Oct 2022 07:13:47 -0400
Date: Thu, 06 Oct 2022 14:13:43 +0300
Message-Id: <83v8oxp5lk.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <52286A5C-D947-4279-812E-173BB44046E1@HIDDEN> (message from
 Mattias Engdegård on Thu, 6 Oct 2022 11:05:51 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN> <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
 <831qrnx1jc.fsf@HIDDEN> <B3639DF2-95E0-459B-B718-A2779EA53B95@HIDDEN>
 <83k05fv9nv.fsf@HIDDEN> <52286A5C-D947-4279-812E-173BB44046E1@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Thu, 6 Oct 2022 11:05:51 +0200
> Cc: larsi@HIDDEN,
>  58168 <at> debbugs.gnu.org
> 
> On 4 Oct 2022, at 18:24, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> >> This treats unibyte format strings as if they were Latin-1 for the purpose of the error message.
> > 
> > No, it doesn't.  It shows the problematic characters as raw bytes, as
> > in "%\200" (where \200 is a single character).  If you see something
> > different, please show the recipe.
> 
>    (format-message "%\345" 0)
> => (error "Invalid format operation %å")

And you want to show %\345 instead?  Are you sure this is not the
consequence of inserting the error message into a multibyte buffer?

> > Who said anything about #x3fffc?  The original code had #xfc, the
> > unibyte code for #x3ffffc.
> 
> There seems to be a misunderstanding. The original (and current) code attempts to display char #x3fffc, which is not a raw byte. It's just a typo for #x3ffffc -- not a big deal.

But your change replaced it with \xfc, which is what I questioned.
Why not test both #x3ffffc and #xfc?  And the same question about
\777777 vs \374.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 6 Oct 2022 11:06:46 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Oct 06 07:06:46 2022
Received: from localhost ([127.0.0.1]:59096 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ogOiE-0000g8-A4
	for submit <at> debbugs.gnu.org; Thu, 06 Oct 2022 07:06:46 -0400
Received: from eggs.gnu.org ([209.51.188.92]:48068)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1ogOiB-0000fs-0p
 for 58168 <at> debbugs.gnu.org; Thu, 06 Oct 2022 07:06:44 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:47040)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ogOi5-0003rA-MP; Thu, 06 Oct 2022 07:06:37 -0400
Received: from [87.69.77.57] (port=3495 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ogOhx-0003TY-11; Thu, 06 Oct 2022 07:06:37 -0400
Date: Thu, 06 Oct 2022 14:06:26 +0300
Message-Id: <83wn9dp5xp.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <E10B6A7A-5517-46AD-B6D5-88A8845736BA@HIDDEN> (message from
 Mattias Engdegård on Thu, 6 Oct 2022 11:05:04 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
 <83tu4mais1.fsf@HIDDEN> <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN>
 <83wn9gw2sp.fsf@HIDDEN> <E10B6A7A-5517-46AD-B6D5-88A8845736BA@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Thu, 6 Oct 2022 11:05:04 +0200
> Cc: 58168 <at> debbugs.gnu.org
> 
> On 4 Oct 2022, at 07:55, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> > If the fact that string= says strings are not equal, but string-lessp
> > says they are equal, is what bothers you, we could document that
> > results of comparing unibyte and multibyte strings are unspecified, or
> > document explicitly that string= and string-lessp behave differently
> > in this case.
> 
> (It's not just string= but `equal` since they use the same comparison.)
> But it's just a part of a set of related problems:
> 
> * string< / string= inconsistency
> * undesirable string< ordering (unibyte strings are treated as Latin-1)
> * bad string< performance 

That doesn't seem different (and the ordering part is not necessary,
IMO).

> Ideally we should be able to do something about all three at the same time since they are interrelated. At the very least it's worth a try.

It depends on the costs and the risks.  All the rest being equal, yes,
solving those would be desirable.  But it isn't equal, and the costs
and the risks of your proposals outweigh the advantages in my book,
sorry.

> Just documenting the annoying parts won't make them go away -- they still have to be coded around by the user, and it doesn't solve any performance problems either.

That's not a catastrophe, because we are already there (sans the
documentation), and because these cases are rare in real life.

> > I see no reason to worry about 100% consistency here: the order
> > is _really_ undefined in these cases, and trying to make it defined
> > will not produce any tangible gains,
> 
> Yes it would: better performance and wider applicability.

These are not tangible enough IMO.

> Even when the order isn't defined the user expects there to be some order between distinct strings.

No, if the order is undefined, the caller cannot expect any order.
Cf. NaN comparisons with numerical values.

> > Once again, slowing down string-lessp when raw-bytes are involved
> > shouldn't be a problem.  So, if memchr finds a C0 or C1 in a string,
> > fall back to a slower comparison.  memchr is fast enough to not slow
> > down the "usual" case.  Would that be a good solution?
> 
> There is no reason a comparison should need to look beyond the first mismatch; anything else is just algorithmically slow. Long strings are likely to differ early on. Any hack that has to special-case raw bytes will add costs.

You missed me here.  Why are you suddenly talking about mismatches?
And if only mismatches matter here, why is it a problem to use memchr
in the first place?

> > Alternatively, we could introduce a new primitive which could assume
> > multibyte or plain-ASCII unibyte strings without checking, and then
> > code which is sure raw-bytes cannot happen, and needs to compare long
> > strings, could use that for speed.
> 
> That or variants thereof are indeed alternatives but users would be forgiven for wondering why we don't make what we have fast instead.

Because the fast versions can break when the assumptions are false.
We already have similar stuff in encoding/decoding area: there are
fast optimized functions that require the caller to make sure some
assumptions hold.

> > E.g., are you saying that unibyte strings that are
> > pure-ASCII also cause performance problems?
> 
> They do because we have no efficient way of ascertaining that they are pure-ASCII.

If we declare that comparing with unibyte non-ASCII produces
unspecified results, we don't have to worry about that: it becomes the
worry of the caller.

> The long-term solution is to make multibyte strings the default in more cases but I'm not proposing such a change right now.

I don't think we will ever get there, FWIW.  Raw bytes in strings are
a fact of life, whether we like it or not.

> I'll see where further performance tweaking of the existing code can take us with reasonable effort, but there are hard limits to what can be done.
> And thank you for your comments!

Thanks.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 6 Oct 2022 09:35:21 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Oct 06 05:35:21 2022
Received: from localhost ([127.0.0.1]:59005 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ogNHl-0004Mk-0G
	for submit <at> debbugs.gnu.org; Thu, 06 Oct 2022 05:35:21 -0400
Received: from mail-lj1-f169.google.com ([209.85.208.169]:35567)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1ogNHe-0004MB-UM
 for 58168 <at> debbugs.gnu.org; Thu, 06 Oct 2022 05:35:18 -0400
Received: by mail-lj1-f169.google.com with SMTP id m14so1572715ljg.2
 for <58168 <at> debbugs.gnu.org>; Thu, 06 Oct 2022 02:35:14 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 g28-20020a2e391c000000b0026ddea22596sm1204567lja.37.2022.10.06.02.35.07
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 06 Oct 2022 02:35:08 -0700 (PDT)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <83wn9gw2sp.fsf@HIDDEN>
Date: Thu, 6 Oct 2022 11:05:04 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <E10B6A7A-5517-46AD-B6D5-88A8845736BA@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
 <83tu4mais1.fsf@HIDDEN> <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN>
 <83wn9gw2sp.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-Spam-Score: -1.0 (-)

On 4 Oct 2022, at 07:55, Eli Zaretskii <eliz@HIDDEN> wrote:

> If the fact that string= says strings are not equal, but string-lessp
> says they are equal, is what bothers you, we could document that
> results of comparing unibyte and multibyte strings are unspecified, or
> document explicitly that string= and string-lessp behave differently
> in this case.

(It's not just string= but `equal` since they use the same comparison.)
But it's just a part of a set of related problems:

* string< / string= inconsistency
* undesirable string< ordering (unibyte strings are treated as Latin-1)
* bad string< performance
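
The first two items can be reproduced in a toy model. The following C sketch is an assumption-laden illustration and not the Emacs implementation: struct toy_string and all function names are hypothetical, the equality test compares the stored bytes, and the ordering test reads unibyte bytes as Latin-1 code points.

```c
#include <stddef.h>
#include <string.h>

/* Toy string type; hypothetical, for illustration only.  */
struct toy_string
{
  const unsigned char *data; /* stored bytes */
  size_t nbytes;             /* number of bytes in DATA */
  size_t nchars;             /* number of characters */
  int multibyte;             /* nonzero if DATA is UTF-8-like */
};

/* Equality on the stored representation.  */
static int
toy_equal (const struct toy_string *a, const struct toy_string *b)
{
  return a->nchars == b->nchars && a->nbytes == b->nbytes
         && memcmp (a->data, b->data, a->nbytes) == 0;
}

/* Fetch the character at *POS and advance *POS.  Only one- and
   two-byte sequences are handled, and input is assumed well formed.  */
static unsigned int
toy_next_char (const struct toy_string *s, size_t *pos)
{
  unsigned int b = s->data[(*pos)++];
  if (!s->multibyte || b < 0x80)
    return b; /* a unibyte byte is read as a Latin-1 code point */
  return ((b & 0x1F) << 6) | (s->data[(*pos)++] & 0x3F);
}

/* Ordering on decoded characters; stops at the first difference.  */
static int
toy_lessp (const struct toy_string *a, const struct toy_string *b)
{
  size_t ia = 0, ib = 0;
  while (ia < a->nbytes && ib < b->nbytes)
    {
      unsigned int ca = toy_next_char (a, &ia);
      unsigned int cb = toy_next_char (b, &ib);
      if (ca != cb)
        return ca < cb;
    }
  /* A is less only if it is a proper prefix of B.  */
  return ia >= a->nbytes && ib < b->nbytes;
}
```

In this model the unibyte string holding the single byte 0xE5 and the multibyte string holding å (bytes 0xC3 0xA5) are not toy_equal, yet neither is toy_lessp than the other, which is the shape of the inconsistency listed above.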

Ideally we should be able to do something about all three at the same time since they are interrelated. At the very least it's worth a try.

Just documenting the annoying parts won't make them go away -- they still have to be coded around by the user, and it doesn't solve any performance problems either.

> I see no reason to worry about 100% consistency here: the order
> is _really_ undefined in these cases, and trying to make it defined
> will not produce any tangible gains,

Yes it would: better performance and wider applicability. Even when the order isn't defined the user expects there to be some order between distinct strings.

> Once again, slowing down string-lessp when raw-bytes are involved
> shouldn't be a problem.  So, if memchr finds a C0 or C1 in a string,
> fall back to a slower comparison.  memchr is fast enough to not slow
> down the "usual" case.  Would that be a good solution?

There is no reason a comparison should need to look beyond the first mismatch; anything else is just algorithmically slow. Long strings are likely to differ early on. Any hack that has to special-case raw bytes will add costs.

The best we can hope for is hand-written vectorised code that does everything in one pass but it's still slower than just a memcmp.
Even then our chosen semantics make that more difficult (and slower) than it needs to be: for example, we cannot assume that any byte with the high bit set indicates a mismatch when comparing unibyte strings with multibyte, since we equate unibyte chars with Latin-1. It's a decision that we will keep paying for.
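
As a rough sketch of what "not looking beyond the first mismatch" means, the following C fragment decides a three-way comparison at the first differing byte. The function names are hypothetical, and the plain byte-wise order it produces is only an assumption for illustration; whether that order is the one string< should have is exactly what is under discussion.

```c
#include <stddef.h>

/* Return the index of the first byte at which A and B differ, or N if
   their first N bytes agree.  Nothing past that index is examined.  */
static size_t
first_mismatch (const unsigned char *a, const unsigned char *b, size_t n)
{
  size_t i = 0;
  while (i < n && a[i] == b[i])
    i++;
  return i;
}

/* Three-way byte-wise comparison: negative, zero or positive as A is
   less than, equal to or greater than B.  A proper prefix sorts first.  */
static int
compare_bytes (const unsigned char *a, size_t na,
               const unsigned char *b, size_t nb)
{
  size_t n = na < nb ? na : nb;
  size_t i = first_mismatch (a, b, n);
  if (i < n)
    return a[i] < b[i] ? -1 : 1;
  return (na > nb) - (na < nb);
}
```

A pre-scan of each whole string (for example with memchr) would, by contrast, touch every byte even when the strings already differ at index 0.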

> Alternatively, we could introduce a new primitive which could assume
> multibyte or plain-ASCII unibyte strings without checking, and then
> code which is sure raw-bytes cannot happen, and needs to compare long
> strings, could use that for speed.

That or variants thereof are indeed alternatives but users would be forgiven for wondering why we don't make what we have fast instead.

> E.g., are you saying that unibyte strings that are
> pure-ASCII also cause performance problems?

They do because we have no efficient way of ascertaining that they are pure-ASCII. The long-term solution is to make multibyte strings the default in more cases but I'm not proposing such a change right now.

I'll see where further performance tweaking of the existing code can take us with reasonable effort, but there are hard limits to what can be done.
And thank you for your comments!





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 6 Oct 2022 09:35:18 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Oct 06 05:35:17 2022
Received: from localhost ([127.0.0.1]:59003 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ogNHh-0004MV-LG
	for submit <at> debbugs.gnu.org; Thu, 06 Oct 2022 05:35:17 -0400
Received: from mail-lf1-f49.google.com ([209.85.167.49]:39561)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1ogNHg-0004MF-91
 for 58168 <at> debbugs.gnu.org; Thu, 06 Oct 2022 05:35:16 -0400
Received: by mail-lf1-f49.google.com with SMTP id b2so1847660lfp.6
 for <58168 <at> debbugs.gnu.org>; Thu, 06 Oct 2022 02:35:16 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 g28-20020a2e391c000000b0026ddea22596sm1204567lja.37.2022.10.06.02.35.09
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 06 Oct 2022 02:35:09 -0700 (PDT)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <83k05fv9nv.fsf@HIDDEN>
Date: Thu, 6 Oct 2022 11:05:51 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <52286A5C-D947-4279-812E-173BB44046E1@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN> <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
 <831qrnx1jc.fsf@HIDDEN> <B3639DF2-95E0-459B-B718-A2779EA53B95@HIDDEN>
 <83k05fv9nv.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-Spam-Score: -1.0 (-)

On 4 Oct 2022, at 18:24, Eli Zaretskii <eliz@HIDDEN> wrote:

>> This treats unibyte format strings as if they were Latin-1 for the purpose of the error message.
>
> No, it doesn't.  It shows the problematic characters as raw bytes, as
> in "%\200" (where \200 is a single character).  If you see something
> different, please show the recipe.

   (format-message "%\345" 0)
=> (error "Invalid format operation %å")

where the format string is a unibyte string of two bytes, % and 0xE5,
yet the error treats it as the Latin-1 character å.

In fact,

   (format-message "%å" 0)

yields the same error string.

>> Not very important, of course, but maybe there should be a
>> UNIBYTE_TO_CHAR in the alternative branch?
>
> No, that would show the multibyte codepoint, and will confuse users,
> because the result would look very different from the problematic
> format spec in this case.

Yes, that's probably right. I suppose the right solution is something
like:

	      unsigned char *p = (unsigned char *) format - 1;
	      if (multibyte_format)
		error ("Invalid format operation %%%c", STRING_CHAR (p));
	      else
		error (*p <= 127 ? "Invalid format operation %%%c"
			         : "Invalid format operation char 0x%02x",
		       *p);

but perhaps it's a rare error not worth the trouble. (If we don't bother
changing it, a little comment saying that we are aware of the glitch may
be a good idea.)
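
For illustration only, the unibyte branch of that suggestion can be tried
outside Emacs. The sketch below is a standalone approximation:
describe_invalid_spec is a hypothetical helper (not an Emacs function), and
snprintf stands in for error. It shows an ASCII conversion character being
echoed as-is while a non-ASCII byte is reported in hex rather than being
reinterpreted as Latin-1.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Emacs: write into BUF the message
   for an invalid conversion character C taken from a unibyte format
   string.  ASCII characters are echoed; other bytes are shown in hex.  */
static int
describe_invalid_spec (char *buf, size_t size, unsigned char c)
{
  if (c <= 127)
    return snprintf (buf, size, "Invalid format operation %%%c", c);
  return snprintf (buf, size, "Invalid format operation char 0x%02x", c);
}
```

With this helper, the byte 0xE5 from the recipe above would be reported as
"char 0xe5" instead of as the character å.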

> Who said anything about #x3fffc?  The original code had #xfc, the
> unibyte code for #x3ffffc.

There seems to be a misunderstanding. The original (and current) code
attempts to display char #x3fffc, which is not a raw byte. It's just a
typo for #x3ffffc -- not a big deal.

Of course I could have retained the 3fffc under a different label, but
everyone else reading the test would just assume it was a typo of 3ffffc
since 3fffc itself is not very interesting. I replaced it with 10abcd, a
wide Unicode value deliberately chosen to be arbitrary-looking. We could
use another value if you prefer.

>  I don't see why we shouldn't test both.
> In the other problematic hunk you replaced \777774 with \374 -- why?

3fffc in octal is 777774; when changed to 3ffffc it becomes a raw byte,
fc, displayed as \374.
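
The arithmetic can be checked with a few lines of C. This is only a
sketch: format_escape is a hypothetical helper that mimics the shape of
the test labels (a backslash followed by octal digits, or \x followed by
hex digits), and is not the code Emacs uses for display.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: render VALUE as a backslash escape, in hex
   (\xNN...) when AS_HEX is true, otherwise in octal (\NNN...).  */
static int
format_escape (char *buf, size_t size, unsigned int value, bool as_hex)
{
  return snprintf (buf, size, as_hex ? "\\x%x" : "\\%o", value);
}
```

Given 0x3fffc this produces \777774 (or \x3fffc in hex mode), and given
0xfc it produces \374 (or \xfc), matching the old and new labels in the
test.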





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 18:07:33 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 14:07:33 2022
Received: from localhost ([127.0.0.1]:55155 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofmKL-0005fY-7k
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 14:07:33 -0400
Received: from eggs.gnu.org ([209.51.188.92]:49674)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1ofmKG-0005fG-9P
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 14:07:32 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:44766)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofmKB-00038D-47; Tue, 04 Oct 2022 14:07:23 -0400
Received: from [87.69.77.57] (port=2612 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofmKA-0000I7-7s; Tue, 04 Oct 2022 14:07:22 -0400
Date: Tue, 04 Oct 2022 21:07:19 +0300
Message-Id: <837d1fv4x4.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: rms@HIDDEN
In-Reply-To: <E1oflty-00006o-DN@HIDDEN> (message from Richard
 Stallman on Tue, 04 Oct 2022 13:40:18 -0400)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
 <83tu4mais1.fsf@HIDDEN> <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN>
 <83wn9gw2sp.fsf@HIDDEN> <E1oflty-00006o-DN@HIDDEN>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, mattias.engdegard@HIDDEN
X-Spam-Score: -3.3 (---)

> From: Richard Stallman <rms@HIDDEN>
> Cc: mattias.engdegard@HIDDEN, 58168 <at> debbugs.gnu.org
> Date: Tue, 04 Oct 2022 13:40:18 -0400
> 
>   > If the fact that string= says strings are not equal, but string-lessp
>   > says they are equal, is what bothers you
> 
> That result seems paradoxical to me.

Not if you accept that comparing unibyte non-ASCII text with multibyte
text yields inherently unspecified results.

> Perhaps documenting the difference between these two relationships
> could make the current behavior comprehensible rather than anomalous.

Yes, that's one of the alternatives that I think should be on the
table.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 17:40:28 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 13:40:27 2022
Received: from localhost ([127.0.0.1]:55122 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oflu7-0004y6-Hl
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 13:40:27 -0400
Received: from eggs.gnu.org ([209.51.188.92]:49276)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <rms@HIDDEN>) id 1oflu4-0004xe-4u
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 13:40:26 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:43992)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <rms@HIDDEN>)
 id 1oflty-0001sW-Nm; Tue, 04 Oct 2022 13:40:18 -0400
Received: from rms by fencepost.gnu.org with local (Exim 4.90_1)
 (envelope-from <rms@HIDDEN>)
 id 1oflty-00006o-DN; Tue, 04 Oct 2022 13:40:18 -0400
Content-Type: text/plain; charset=Utf-8
From: Richard Stallman <rms@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <83wn9gw2sp.fsf@HIDDEN> (message from Eli Zaretskii on Tue, 04
 Oct 2022 08:55:34 +0300)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
 <83tu4mais1.fsf@HIDDEN> <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN>
 <83wn9gw2sp.fsf@HIDDEN>
Message-Id: <E1oflty-00006o-DN@HIDDEN>
Date: Tue, 04 Oct 2022 13:40:18 -0400
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, mattias.engdegard@HIDDEN
Reply-To: rms@HIDDEN
X-Spam-Score: -3.3 (---)

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > If the fact that string= says strings are not equal, but string-lessp
  > says they are equal, is what bothers you

That result seems paradoxical to me.

I think the way to make sense of it is this: what string-lessp is
really saying is not that the strings are "equal", but rather that
they are lexicographically equivalent.

Perhaps documenting the difference between these two relationships
could make the current behavior comprehensible rather than anomalous.
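
The distinction can be illustrated in general terms, independently of
Emacs. In the sketch below, two strings are "equivalent" under an ordering
when neither is less than the other, which is weaker than being equal.
The case-insensitive ordering is purely an assumed stand-in chosen for
brevity; it is not the rule string-lessp applies, and all function names
are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in ordering (an assumption for illustration, not Emacs's
   rule): lexicographic comparison that ignores ASCII case.  */
static bool
less_ignoring_case (const char *a, const char *b)
{
  for (; *a && *b; a++, b++)
    {
      int ca = tolower ((unsigned char) *a);
      int cb = tolower ((unsigned char) *b);
      if (ca != cb)
        return ca < cb;
    }
  /* A is less only if it is a proper prefix of B.  */
  return *a == '\0' && *b != '\0';
}

/* Equivalent under the ordering: neither string sorts before the other.  */
static bool
equivalent (const char *a, const char *b)
{
  return !less_ignoring_case (a, b) && !less_ignoring_case (b, a);
}

/* Equal: the same sequence of bytes.  */
static bool
equal (const char *a, const char *b)
{
  return strcmp (a, b) == 0;
}
```

Here "abc" and "ABC" are equivalent but not equal, which is the same shape
of result as an ordering predicate and an equality predicate disagreeing.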

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 16:25:06 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 12:25:06 2022
Received: from localhost ([127.0.0.1]:54996 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofkjB-0002wz-Ou
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 12:25:06 -0400
Received: from eggs.gnu.org ([209.51.188.92]:42922)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1ofkj8-0002wM-2f
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 12:25:04 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:39992)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofkj2-0006wM-Oz; Tue, 04 Oct 2022 12:24:56 -0400
Received: from [87.69.77.57] (port=3602 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofkj2-0008P9-1I; Tue, 04 Oct 2022 12:24:56 -0400
Date: Tue, 04 Oct 2022 19:24:52 +0300
Message-Id: <83k05fv9nv.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <B3639DF2-95E0-459B-B718-A2779EA53B95@HIDDEN> (message from
 Mattias Engdegård on Tue, 4 Oct 2022 16:44:17 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN> <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
 <831qrnx1jc.fsf@HIDDEN> <B3639DF2-95E0-459B-B718-A2779EA53B95@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Tue, 4 Oct 2022 16:44:17 +0200
> Cc: larsi@HIDDEN,
>  58168 <at> debbugs.gnu.org
> 
> On 4 Oct 2022, at 13:37, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> > First I needed to fix fallout from making STRING_CHAR intolerant of
> > unibyte text, because redisplay-testsuite caused assertion violations
> > in string_char_and_length.
> 
> Good catch! Just to satisfy my curiosity:
> 
> >             error ("Invalid format operation %%%c",
> > -                  STRING_CHAR ((unsigned char *) format - 1));
> > +                  multibyte_format
> > +                  ? STRING_CHAR ((unsigned char *) format - 1)
> > +                  : *((unsigned char *) format - 1));
> 
> This treats unibyte format strings as if they were Latin-1 for the purpose of the error message.

No, it doesn't.  It shows the problematic characters as raw bytes, as
in "%\200" (where \200 is a single character).  If you see something
different, please show the recipe.

> Not very important, of course, but maybe there should be a UNIBYTE_TO_CHAR in the alternative branch?

No, that would show the multibyte codepoint, and will confuse users,
because the result would look very different from the problematic
format spec in this case.

> >  (Doesn't it abort for you? or do you not
> > build Emacs with --enable-checking?)
> 
> Oh I certainly do that occasionally, but it's mostly when I've changed something at the C level or have reason to believe that something is broken there.

Please _always_ test changes related to encoding/decoding and
character representation conversions in a --enable-checking build.  We
should have discovered these bugs in time for Emacs 28.2 to be devoid
of them.

> > I could understand why you'd want to _add_ the larger values, but why
> > replace?
> 
> Because it seemed pretty clear that the old code intended to use #x3ffffc for testing display of raw bytes, but a typo turned it into #x3fffc instead, which isn't a raw byte but a multibyte character. It's an easy mistake to make (I've done so several times myself).

Who said anything about #x3fffc?  The original code had #xfc, the
unibyte code for #x3ffffc.  I don't see why we shouldn't test both.
In the other problematic hunk you replaced \777774 with \374 -- why?

> I've now pushed the patch; the code can be improved further if necessary.

I've reverted it.  Please stop this madness of rushing into installing
changes that are still under controversy.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 14:44:31 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 10:44:31 2022
Received: from localhost ([127.0.0.1]:54917 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofj9q-0000TX-IQ
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 10:44:30 -0400
Received: from mail-lj1-f177.google.com ([209.85.208.177]:41638)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1ofj9n-0000TH-8X
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 10:44:29 -0400
Received: by mail-lj1-f177.google.com with SMTP id y22so1594550ljc.8
 for <58168 <at> debbugs.gnu.org>; Tue, 04 Oct 2022 07:44:27 -0700 (PDT)
X-Received: by 2002:a2e:7210:0:b0:26d:94f3:23c4 with SMTP id
 n16-20020a2e7210000000b0026d94f323c4mr8369356ljc.192.1664894659525; 
 Tue, 04 Oct 2022 07:44:19 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 p20-20020a2eb7d4000000b0026c5dce1f9dsm1248090ljo.106.2022.10.04.07.44.18
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 04 Oct 2022 07:44:18 -0700 (PDT)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <831qrnx1jc.fsf@HIDDEN>
Date: Tue, 4 Oct 2022 16:44:17 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <B3639DF2-95E0-459B-B718-A2779EA53B95@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN> <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
 <831qrnx1jc.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-Spam-Score: -1.0 (-)

On 4 Oct 2022, at 13:37, Eli Zaretskii <eliz@HIDDEN> wrote:

> First I needed to fix fallout from making STRING_CHAR intolerant of
> unibyte text, because redisplay-testsuite caused assertion violations
> in string_char_and_length.

Good catch! Just to satisfy my curiosity:

>             error ("Invalid format operation %%%c",
> -                  STRING_CHAR ((unsigned char *) format - 1));
> +                  multibyte_format
> +                  ? STRING_CHAR ((unsigned char *) format - 1)
> +                  : *((unsigned char *) format - 1));

This treats unibyte format strings as if they were Latin-1 for the
purpose of the error message. Not very important, of course, but maybe
there should be a UNIBYTE_TO_CHAR in the alternative branch?

>  (Doesn't it abort for you? or do you not
> build Emacs with --enable-checking?)

Oh I certainly do that occasionally, but it's mostly when I've changed
something at the C level or have reason to believe that something is
broken there.

> I could understand why you'd want to _add_ the larger values, but why
> replace?

Because it seemed pretty clear that the old code intended to use
#x3ffffc for testing display of raw bytes, but a typo turned it into
#x3fffc instead, which isn't a raw byte but a multibyte character. It's
an easy mistake to make (I've done so several times myself).

Thus the change fixes that: it now correctly tests #x3ffffc (multibyte
raw byte FC) as well as a couple of undisplayable multibyte chars (one
C1 control and one astral plane unicode value made undisplayable). Now
everything should be described correctly, which wasn't the case before.
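
To make the difference between the two values concrete, here is a small
standalone sketch. It assumes the usual Emacs internal layout in which the
raw bytes 0x80..0xFF occupy character codes #x3fff80..#x3fffff; the names
below are hypothetical stand-ins, not the macros from character.h.

```c
#include <stdbool.h>

/* Assumed character-code range reserved for raw bytes 0x80..0xFF.  */
enum { RAW_BYTE_FIRST = 0x3FFF80, RAW_BYTE_LAST = 0x3FFFFF };

/* True if character code C represents a raw byte.  */
static bool
is_raw_byte_char (int c)
{
  return RAW_BYTE_FIRST <= c && c <= RAW_BYTE_LAST;
}

/* Map a raw-byte character code back to its byte value (0x80..0xFF).
   Only meaningful when is_raw_byte_char (C) is true.  */
static int
raw_byte_char_to_byte (int c)
{
  return c - 0x3FFF00;
}
```

Under that assumption, #x3ffffc is a raw-byte character standing for byte
FC, whereas #x3fffc falls well below the range and is an ordinary
multibyte character.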

> As for the bug report which led to display-raw-bytes-as-hex (if that's
> what you meant) and its discussion, it's bug#27122.

Thank you, but I actually meant the one where it was agreed that it was
a good idea to display raw bytes and Latin-1 U+0080..009F in the same
way. There isn't much of a code trail because it probably never was a
conscious decision -- it just ended up being that way -- but apparently
the status quo was defended/rationalised at some point.

I've now pushed the patch; the code can be improved further if
necessary.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 11:38:06 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 07:38:06 2022
Received: from localhost ([127.0.0.1]:52734 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofgFO-0001AZ-C3
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 07:38:06 -0400
Received: from eggs.gnu.org ([209.51.188.92]:34510)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1ofgFJ-00019v-AF
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 07:38:01 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:57368)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofgFC-0007qE-Nv; Tue, 04 Oct 2022 07:37:50 -0400
Received: from [87.69.77.57] (port=4890 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofgEu-0005Jr-Am; Tue, 04 Oct 2022 07:37:49 -0400
Date: Tue, 04 Oct 2022 14:37:27 +0300
Message-Id: <831qrnx1jc.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN> (message from
 Mattias Engdegård on Mon, 3 Oct 2022 21:48:10 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN> <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Mon, 3 Oct 2022 21:48:10 +0200
> Cc: 58168 <at> debbugs.gnu.org,
>  Eli Zaretskii <eliz@HIDDEN>
> 
> On 1 Oct 2022, at 15:51, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> > I think the variable is a misnomer of sorts: the request, back when it
> > was introduced, was to display hex where we usually display octal \nnn
> > escapes.  And the latter happens not only for raw bytes.
> 
> Fair enough. Maybe the documentation should reflect that, but I'm still holding out for a change to the C1 presentation in the long term, so...
> 
> I'm not going to pursue this little digression any further except that while looking at it I found a few inaccuracies and a likely bug in redisplay-testsuite.el. I'm attaching a patch which un-muddles the test and adds a display of unprintable Unicode chars such as C1 controls, in addition to raw bytes. I'd like to adorn the commit with the correct bug number so if you remember that of the original discussion that would be useful (I never found it very easy to search debbugs).

First I needed to fix fallout from making STRING_CHAR intolerant of
unibyte text, because redisplay-testsuite caused assertion violations
in string_char_and_length.  (Doesn't it abort for you? or do you not
build Emacs with --enable-checking?)  This was a regression in Emacs
28, sigh.

Looking at your patch, I don't think I understand this part:

> --- a/test/manual/redisplay-testsuite.el
> +++ b/test/manual/redisplay-testsuite.el
> @@ -305,7 +305,7 @@ test-redisplay-5-toggle
>    (let ((label (if display-raw-bytes-as-hex "\\x80" "\\200")))
>      (overlay-put test-redisplay-5a-expected-overlay 'display
>                   (propertize label 'face 'escape-glyph)))
> -  (let ((label (if display-raw-bytes-as-hex "\\x3fffc" "\\777774")))
> +  (let ((label (if display-raw-bytes-as-hex "\\xfc" "\\374")))
>      (overlay-put test-redisplay-5b-expected-overlay 'display
>                   (propertize label 'face 'escape-glyph))))
>  
> @@ -320,18 +320,36 @@ test-redisplay-5
>          (test-insert-overlay " " 'display "\200"))
>    (insert "\n\n")
>    (insert "  Expected: ")
> -  ;; This tests a large codepoint, to make sure the internal buffer we
> -  ;; use to produce the representation is large enough.
> -  (aset printable-chars #x3fffc nil)
>    (setq test-redisplay-5b-expected-overlay
>          (test-insert-overlay " " 'display
> -                             (propertize "\\777774" 'face 'escape-glyph)))
> +                             (propertize "\\374" 'face 'escape-glyph)))

I could understand why you'd want to _add_ the larger values, but why
replace?

As for the bug report which led to display-raw-bytes-as-hex (if that's
what you meant) and its discussion, it's bug#27122.  (If you know what
code is in question, it is much easier to find the bug via "git
annotate", assuming the bug number was cited in the commit logs.)




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 10:44:58 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 06:44:58 2022
Received: from localhost ([127.0.0.1]:52577 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1offQ1-0001Uf-Tt
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 06:44:58 -0400
Received: from quimby.gnus.org ([95.216.78.240]:44352)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <larsi@HIDDEN>) id 1offQ0-0001UT-Sb
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 06:44:57 -0400
Received: from [84.212.220.105] (helo=downe)
 by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from <larsi@HIDDEN>)
 id 1offPs-0002yU-Av; Tue, 04 Oct 2022 12:44:50 +0200
From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Mattias Engdegård <mattias.engdegard@HIDDEN>
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
In-Reply-To: <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN> ("Mattias
 Engdegård"'s message of "Mon, 3 Oct 2022 21:48:10 +0200")
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN>
 <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN>
 <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN>
 <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
X-Now-Playing: Propaganda's _A Secret Wish_: "p:Machinery"
Date: Tue, 04 Oct 2022 12:44:47 +0200
Message-ID: <874jwj50m8.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -3.3 (---)

Mattias Engdegård <mattias.engdegard@HIDDEN> writes:

> I'm attaching a patch which un-muddles the
> test and adds a display of unprintable Unicode chars such as C1
> controls, in addition to raw bytes. I'd like to adorn the commit with
> the correct bug number so if you remember that of the original
> discussion that would be useful (I never found it very easy to search
> debbugs).

Me neither.  I think the context was something about...  uhm...  cutting
and pasting text displayed as \ef from a terminal, and that being
ambiguous?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 4 Oct 2022 05:55:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Oct 04 01:55:51 2022
Received: from localhost ([127.0.0.1]:52157 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofauE-0002Kq-Rt
	for submit <at> debbugs.gnu.org; Tue, 04 Oct 2022 01:55:51 -0400
Received: from eggs.gnu.org ([209.51.188.92]:57470)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1ofauD-0002Kd-J0
 for 58168 <at> debbugs.gnu.org; Tue, 04 Oct 2022 01:55:49 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:40912)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofau8-0008Q3-9o; Tue, 04 Oct 2022 01:55:44 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=wt2XaKd2cndHBkBYgt+H0IUgcvHFXZPduq9+TTqedg4=; b=dNbSzZAT63n35jHp6+As
 KLXj2Oa52RdjkyCqQk+Sf37teMaCNl/rGBpYcYss/b726B7XF7V28u2JpmG3/UDYXVa5pTsSsDDbV
 D1hBG8Csd7zBj5hn6QYkFYGTrPvaXkTiRaNCaHa3uja+jFc/ePwtZTsSsYU3kBA+R4IJgixt+bBOR
 tKeh+fO0ryOQXiubsgUT0gdxD/CLq4RwPvq3ZDf0dL6WPlfbS0aFd7qc6jLAGAxe99YO2SBLjdSVX
 2cLN1/j2L0kWU5ucdYlKwJr2FzToulph6g4StHhlWTfDhAg/jfhKHu9O8cuu+Wj/m8noFCND6q43r
 vCPd7ujoIL3w3Q==;
Received: from [87.69.77.57] (port=3321 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1ofau6-0008Ou-S0; Tue, 04 Oct 2022 01:55:43 -0400
Date: Tue, 04 Oct 2022 08:55:34 +0300
Message-Id: <83wn9gw2sp.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN> (message from
 Mattias =?utf-8?Q?Engdeg=C3=A5rd?= on Mon, 3 Oct 2022 21:48:14 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
 <83tu4mais1.fsf@HIDDEN> <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Mon, 3 Oct 2022 21:48:14 +0200
> Cc: 58168 <at> debbugs.gnu.org
> 
> On 2 Oct 2022, at 07:36, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> >> Comparison between objects is not only useful when someone cares about their order, as in presenting a sorted list to the user. Often what is important is an ability to impose an order, preferably total, for use in building and searching data structures. I came across this bug when implementing a string set.
> > 
> > Always converting to multibyte handles this case, doesn't it?
> 
> I don't think it does -- string= treats raw bytes in unibyte and multibyte strings as distinct; converting to multibyte does not preserve (in)equality.

If the fact that string= says strings are not equal, but string-lessp
says they are equal, is what bothers you, we could document that
results of comparing unibyte and multibyte strings are unspecified, or
document explicitly that string= and string-lessp behave differently
in this case.

IOW, I see no reason to worry about 100% consistency here: the order
is _really_ undefined in these cases, and trying to make it defined
will not produce any tangible gains, IMNSHO.  It could very well
produce bugs and regressions, OTOH.  So it sounds like a net loss to
me, in practical terms.

> >> Actually I was talking about multibyte-multibyte comparisons.
> > 
> > Then why did you mention raw bytes? their multibyte representation
> > presents no performance problems
> 
> In a way they do -- the way raw bytes are represented (they start with C0 or C1) causes memcmp to sort them between U+007F and U+0080. If we accept that then comparisons are fast since memcmp will compare many characters per data-dependent branch. The current code requires several data-dependent branches for each character.

Once again, slowing down string-lessp when raw-bytes are involved
shouldn't be a problem.  So, if memchr finds a C0 or C1 in a string,
fall back to a slower comparison.  memchr is fast enough to not slow
down the "usual" case.  Would that be a good solution?
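
To make the suggestion concrete, here is a minimal C sketch of a memchr-guarded fast path. It is not Emacs's actual code: the names has_raw_byte, next_char and compare_multibyte are hypothetical, the decoder only handles 1-4 byte sequences plus C0/C1 raw-byte pairs, and the slow path assumes that raw bytes should sort after every Unicode character (character codes 0x3FFF80..0x3FFFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if S contains a 0xC0 or 0xC1 byte.  These
   never occur in valid UTF-8, so here they can only be the lead byte
   of a raw-byte pair.  */
static bool
has_raw_byte (const unsigned char *s, size_t n)
{
  return memchr (s, 0xC0, n) != NULL || memchr (s, 0xC1, n) != NULL;
}

/* Decode the character at S[*I] and advance *I past it.  Assumes
   well-formed input: 1-4 byte UTF-8 or a C0/C1 raw-byte pair.  */
static unsigned
next_char (const unsigned char *s, size_t *i)
{
  unsigned c = s[(*i)++];
  if (c < 0x80)
    return c;
  if (c == 0xC0 || c == 0xC1)
    /* Raw byte 0x80..0xFF, mapped above every Unicode character.  */
    return 0x3FFF00 + (0x80 | ((c & 1) << 6) | (s[(*i)++] & 0x3F));
  assert (c >= 0xC2 && c < 0xF8);
  int extra = c >= 0xF0 ? 3 : c >= 0xE0 ? 2 : 1;
  c &= 0x3F >> extra;
  while (extra-- > 0)
    c = (c << 6) | (s[(*i)++] & 0x3F);
  return c;
}

/* Return <0, 0 or >0 as A sorts before, equal to, or after B.  */
static int
compare_multibyte (const unsigned char *a, size_t alen,
                   const unsigned char *b, size_t blen)
{
  if (!has_raw_byte (a, alen) && !has_raw_byte (b, blen))
    {
      /* Fast path: without raw bytes, byte order is character order.  */
      size_t n = alen < blen ? alen : blen;
      int r = memcmp (a, b, n);
      return r != 0 ? r : (alen > blen) - (alen < blen);
    }

  /* Slow path: compare decoded character codes one at a time.  */
  size_t i = 0, j = 0;
  while (i < alen && j < blen)
    {
      unsigned ca = next_char (a, &i);
      unsigned cb = next_char (b, &j);
      if (ca != cb)
        return ca < cb ? -1 : 1;
    }
  return (i < alen) - (j < blen);
}
```

With this sketch, the two-byte sequence C3 A9 (U+00E9) sorts before the raw byte 0x80 (C0 80), whereas a plain memcmp would order them the other way round.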

Alternatively, we could introduce a new primitive which could assume
multibyte or plain-ASCII unibyte strings without checking, and then
code which is sure raw-bytes cannot happen, and needs to compare long
strings, could use that for speed.

> While we could probably bring down the comparison cost slightly by clever hand-coding, it's unlikely to be even nearly as fast as a memcmp and much messier. Since users are unlikely to care much about the ordering between raw bytes and something else (as long as there is an order), it would be a cheap way to improve performance while at the same time fixing the string< / string= mismatch.

The assumption that "users are unlikely to care" is a pure conjecture,
and we have no way of validating it.  So I don't want us to act on
such an assumption.

What about one of the alternatives above instead?

> > You can compare under the assumption that a unibyte string is
> > pure-ASCII until you bump into the first non-ASCII one.  If that
> > happens, abandon the comparison, convert the unibyte string to its
> > multibyte representation, and compare again.
> 
> I don't quite see how that would improve performance but may be missing something.

Then maybe I didn't understand the performance problems that you had
in mind.  Suppose you describe them in more detail, preferably with
examples?  E.g., are you saying that unibyte strings that are
pure-ASCII also cause performance problems?




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 3 Oct 2022 19:48:23 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 03 15:48:23 2022
Received: from localhost ([127.0.0.1]:51762 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofRQM-0001bO-Np
	for submit <at> debbugs.gnu.org; Mon, 03 Oct 2022 15:48:23 -0400
Received: from mail-lf1-f49.google.com ([209.85.167.49]:35420)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1ofRQL-0001b1-HG
 for 58168 <at> debbugs.gnu.org; Mon, 03 Oct 2022 15:48:22 -0400
Received: by mail-lf1-f49.google.com with SMTP id z4so18113052lft.2
 for <58168 <at> debbugs.gnu.org>; Mon, 03 Oct 2022 12:48:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:sender:from:to:cc:subject
 :date; bh=YS+G3ocWKVPaQrctHWiKh6OCpsDKjffkzwW+NKA2JYw=;
 b=J80zG99Jq2Bt3a3KhSHRQLdPVwwF011Q5LsHDE4TUO6HkjmMXA4HJBPw8n17Rf7Hom
 0971uDqlgWAoeiWJTFreLjm2J6NWpo5CcXOg7UGACZDpdVbuwrdlmT6daN/JJ+IgRVCN
 HCP1RWA5zA5JyEpYb7NSSChVqOuiluZEv/jnEMHDAhAWScNy5pfeY0CjjD3gxTRI76CD
 KRMkl9JhOTHlHedM9usAlv3E7WMQgnBtPg//Fucx0Y/n0OZK/RRegzMiVQGbyzwkmgFB
 iQp2ZFu+8z04Z1F9Xdgcdz6NpddkkLmO+S5ldnnxq1zMfrRC6HjgJrSjACC18/2OTmNb
 sktA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:sender:x-gm-message-state
 :from:to:cc:subject:date;
 bh=YS+G3ocWKVPaQrctHWiKh6OCpsDKjffkzwW+NKA2JYw=;
 b=h41ThII3hQ5KYHp/kMa8fnsqIsxJUzhfeQuQsrs1kfZv3KQzRWE+xFwbJRfuXZdJWa
 DlqYu0EQtbpZvm6nMgkPTzfZYAxrO6ChoWsX67TTi1dIB2xIkHwHLSolg14U6bIH0ZxD
 etKWDEXyahlolygH1w2hOk7fLc/idHDbr4dwo9HWXIP9H7csUQF6bkIZkyc7RZDf/h3n
 YBokB1gW9JyvMTsKINB6lRm9YlMB0Jzosn3P7iMaNSA8d7RwfkW6QHIzyKQVpf0ZBWP5
 HolAk3bnTxotCbtEYqEws/qwoQw/mWAnWzTuB5Xhz2xwN3xSevNQaVZDyr/GE+QTOzaw
 x2DQ==
X-Gm-Message-State: ACrzQf3FlGwL7x2xXcNHSf9xeFgS37sDr0wnilElswBmRbP6+RCHTzQ7
 DnFVVraQNyAbfBxEQSZxWHY=
X-Google-Smtp-Source: AMsMyM45iRm37xJgile7jzoQryolp11GSgqGU1LnZDCT6DqMOkSn3lws52XnrgoF9kUaqTk0hHxeuQ==
X-Received: by 2002:ac2:4a78:0:b0:4a2:2974:c86d with SMTP id
 q24-20020ac24a78000000b004a22974c86dmr3803458lfp.514.1664826495601; 
 Mon, 03 Oct 2022 12:48:15 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 o15-20020a05651c050f00b0026de7597bffsm116993ljp.10.2022.10.03.12.48.14
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 03 Oct 2022 12:48:14 -0700 (PDT)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <83tu4mais1.fsf@HIDDEN>
Date: Mon, 3 Oct 2022 21:48:14 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <BC625893-642A-4B8B-9309-1DCC5E4594B3@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
 <83tu4mais1.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-Spam-Score: -1.0 (-)

On 2 Oct 2022, at 07:36, Eli Zaretskii <eliz@HIDDEN> wrote:

>> Comparison between objects is not only useful when someone cares about their order, as in presenting a sorted list to the user. Often what is important is an ability to impose an order, preferably total, for use in building and searching data structures. I came across this bug when implementing a string set.
>
> Always converting to multibyte handles this case, doesn't it?

I don't think it does -- string= treats raw bytes in unibyte and multibyte strings as distinct; converting to multibyte does not preserve (in)equality.

>> Actually I was talking about multibyte-multibyte comparisons.
>
> Then why did you mention raw bytes? their multibyte representation
> presents no performance problems

In a way they do -- the way raw bytes are represented (they start with C0 or C1) causes memcmp to sort them between U+007F and U+0080. If we accept that then comparisons are fast since memcmp will compare many characters per data-dependent branch. The current code requires several data-dependent branches for each character.

While we could probably bring down the comparison cost slightly by clever hand-coding, it's unlikely to be even nearly as fast as a memcmp and much messier. Since users are unlikely to care much about the ordering between raw bytes and something else (as long as there is an order), it would be a cheap way to improve performance while at the same time fixing the string< / string= mismatch.

> You can compare under the assumption that a unibyte string is
> pure-ASCII until you bump into the first non-ASCII one.  If that
> happens, abandon the comparison, convert the unibyte string to its
> multibyte representation, and compare again.

I don't quite see how that would improve performance but may be missing something.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 3 Oct 2022 19:48:20 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 03 15:48:20 2022
Received: from localhost ([127.0.0.1]:51759 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ofRQK-0001bA-B7
	for submit <at> debbugs.gnu.org; Mon, 03 Oct 2022 15:48:20 -0400
Received: from mail-lf1-f44.google.com ([209.85.167.44]:36661)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1ofRQI-0001ax-Fz
 for 58168 <at> debbugs.gnu.org; Mon, 03 Oct 2022 15:48:19 -0400
Received: by mail-lf1-f44.google.com with SMTP id bu25so18103860lfb.3
 for <58168 <at> debbugs.gnu.org>; Mon, 03 Oct 2022 12:48:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=references:to:cc:in-reply-to:date:subject:mime-version:message-id
 :from:sender:from:to:cc:subject:date;
 bh=+QfvMHPkz7KgvefOzTxOgiRMMH9lGXewvks6VHiyHac=;
 b=h56QuqQ/x0JSnOQCKnIhKLTVuh7iRxDBryqQgvKWQcebtpvaJabONTZB9+MnG5S1C9
 c8O/r6mbcsdJIAtqCFXNdS0LDMpZmSvE/VUarYKQ3+kn4M8Mgnhbagfrbm8r1fM4gml6
 NFxlZ4Ioaa04T7mXN2MHLdy+eHbPLwO0/weEjm4RMWf2oCE6QTcf/NbYqFvAQ1UeE49G
 mBwPspd1xfPaE6evJWGqdrc2NCflWcZkt3fS/xsi/mUDm57pWNThIPvOBVG7G0hXxlAC
 s9gprpg8y66YjNN61Vb61adHKtbBvbadHt073RtRiMPSEYRUbzIh2jyoSUztKONRSjho
 Enng==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=references:to:cc:in-reply-to:date:subject:mime-version:message-id
 :from:sender:x-gm-message-state:from:to:cc:subject:date;
 bh=+QfvMHPkz7KgvefOzTxOgiRMMH9lGXewvks6VHiyHac=;
 b=GesnTQtdO3O6/ABaiMHySHTcrujYv1qwYty7+Q/lyNE54Ta0/9iYWNtDptcLiwJpzW
 pBUFEtxvwPgDOMrH+HiG/z79RuwkPiPOoZNspVBvspWHJknHVUAHRq4SAMt43XfdRg/1
 bgarWMBJIgjE5xGy945T+AOj9V2PKcC9xQhU60D0I3nbiHOIg9IL3mXKmo1l6H85NaUv
 eqjFZuqSm9ISyxHo4TpgOyotJ8ARLP1zC1nYecifGIyMoCO0GtEeo97BHT4+ywNzk/mT
 bNoVEue/FCKi4IguEPzU/NmWtGrgoYI5eWiLWMZXNc0OuRWKrFmdbeu4wBdHG4tmrKUL
 YlrA==
X-Gm-Message-State: ACrzQf2fh6EdKGdKx/7jxWr44Zp8kxJ5OSBQMhGHhANZE9ivUG+v7KXp
 Yyf0fOgqupH2Qh51PQ5DyOY=
X-Google-Smtp-Source: AMsMyM6pzNp3aIZpiJnBSXC3olu0YmstozJBg/PqUMyvdzDqkAoaR7ufrQSJD8xtU0hKRMpcZogU/A==
X-Received: by 2002:a05:6512:230f:b0:499:dcd:2fd2 with SMTP id
 o15-20020a056512230f00b004990dcd2fd2mr8605302lfu.677.1664826492119; 
 Mon, 03 Oct 2022 12:48:12 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 o15-20020a05651c050f00b0026de7597bffsm116993ljp.10.2022.10.03.12.48.11
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 03 Oct 2022 12:48:11 -0700 (PDT)
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
Message-Id: <017DAAA2-0383-4B47-855E-28348B2E9F06@HIDDEN>
Content-Type: multipart/mixed;
 boundary="Apple-Mail=_71F96A86-5E70-4A3D-805B-3E204C4443BC"
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
Date: Mon, 3 Oct 2022 21:48:10 +0200
In-Reply-To: <878rlzfylg.fsf@HIDDEN>
To: Lars Ingebrigtsen <larsi@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
 <878rlzfylg.fsf@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, Eli Zaretskii <eliz@HIDDEN>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_71F96A86-5E70-4A3D-805B-3E204C4443BC
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 1 Oct 2022, at 15:43, Lars Ingebrigtsen <larsi@HIDDEN> wrote:

> There was a very long thread about changing this output in a bug report
> somewhere, and we decided not to, because all the alternatives were
> worse than what we have.

I have no wish to reopen old wounds (and it sounds as if the debate isn't one that you are keen to revisit), but I'm a bit surprised -- it all arises because of old code that still treats raw bytes as Latin-1, and I know that neither of you are very fond of that either.

On 1 Oct 2022, at 15:51, Eli Zaretskii <eliz@HIDDEN> wrote:

> I think the variable is a misnomer of sorts: the request back when it
> was introduced was to display hex where we usually display octal \nnn
> escapes.  And the latter happens not only for raw bytes.

Fair enough. Maybe the documentation should reflect that, but I'm still holding out for a change to the C1 presentation in the long term, so...

I'm not going to pursue this little digression any further except that while looking at it I found a few inaccuracies and a likely bug in redisplay-testsuite.el. I'm attaching a patch which un-muddles the test and adds a display of unprintable Unicode chars such as C1 controls, in addition to raw bytes. I'd like to adorn the commit with the correct bug number so if you remember that of the original discussion that would be useful (I never found it very easy to search debbugs).


--Apple-Mail=_71F96A86-5E70-4A3D-805B-3E204C4443BC
Content-Disposition: attachment;
	filename=redisplay-testsuite.diff
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name="redisplay-testsuite.diff"
Content-Transfer-Encoding: 7bit

diff --git a/test/manual/redisplay-testsuite.el b/test/manual/redisplay-testsuite.el
index 01b0a895a4..5495146b87 100644
--- a/test/manual/redisplay-testsuite.el
+++ b/test/manual/redisplay-testsuite.el
@@ -305,7 +305,7 @@ test-redisplay-5-toggle
   (let ((label (if display-raw-bytes-as-hex "\\x80" "\\200")))
     (overlay-put test-redisplay-5a-expected-overlay 'display
                  (propertize label 'face 'escape-glyph)))
-  (let ((label (if display-raw-bytes-as-hex "\\x3fffc" "\\777774")))
+  (let ((label (if display-raw-bytes-as-hex "\\xfc" "\\374")))
     (overlay-put test-redisplay-5b-expected-overlay 'display
                  (propertize label 'face 'escape-glyph))))
 
@@ -320,18 +320,36 @@ test-redisplay-5
         (test-insert-overlay " " 'display "\200"))
   (insert "\n\n")
   (insert "  Expected: ")
-  ;; This tests a large codepoint, to make sure the internal buffer we
-  ;; use to produce the representation is large enough.
-  (aset printable-chars #x3fffc nil)
   (setq test-redisplay-5b-expected-overlay
         (test-insert-overlay " " 'display
-                             (propertize "\\777774" 'face 'escape-glyph)))
+                             (propertize "\\374" 'face 'escape-glyph)))
   (insert "\n    Result: ")
   (setq test-redisplay-5b-result-overlay
-        (test-insert-overlay " " 'display (char-to-string #x3fffc)))
+        (test-insert-overlay " " 'display (char-to-string #x3ffffc)))
+  (insert "\n\n")
+  (insert-button "Toggle between octal and hex display for raw bytes"
+                 'action 'test-redisplay-5-toggle)
+  (insert "\n\n"))
+
+(defun test-redisplay-6 ()
+  (insert "Test 6: Display of unprintable Unicode chars:\n\n")
+  (insert "  Expected: ")
+  (test-insert-overlay " " 'display
+                       (propertize "\\200" 'face 'escape-glyph))
+  (insert "  (representing U+0080)")
+  (insert "\n    Result: ")
+  (test-insert-overlay " " 'display "\u0080")
   (insert "\n\n")
-  (insert-button "Toggle between octal and hex display"
-                 'action 'test-redisplay-5-toggle))
+  ;; This tests a large codepoint, to make sure the internal buffer we
+  ;; use to produce the representation is large enough.
+  (insert "  Expected: ")
+  (aset printable-chars #x10abcd nil)
+  (test-insert-overlay " " 'display
+                       (propertize "\\4125715" 'face 'escape-glyph))
+  (insert "  (representing U+0010ABCD)")
+  (insert "\n    Result: ")
+  (test-insert-overlay " " 'display "\U0010ABCD")
+  (insert "\n\n"))
 
 (defun test-redisplay ()
   (interactive)
@@ -349,6 +367,7 @@ test-redisplay
     (test-redisplay-3)
     (test-redisplay-4)
     (test-redisplay-5)
+    (test-redisplay-6)
     (goto-char (point-min))))
 
 ;;; redisplay-testsuite.el ends here

--Apple-Mail=_71F96A86-5E70-4A3D-805B-3E204C4443BC--
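
The labels the patch expects ("\374" or "\xfc" for the raw byte 0xFC, "\200" for U+0080, "\4125715" for U+10ABCD) follow from simple arithmetic, which the C sketch below reproduces. This is an illustration only, not the redisplay code: format_escape and RAW_BYTE_OFFSET are hypothetical names, and the zero-padded "%03o" / "%02x" formats are an assumption chosen to match the strings in the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Raw bytes 0x80..0xFF occupy character codes 0x3FFF80..0x3FFFFF.  */
#define RAW_BYTE_OFFSET 0x3FFF00u

/* Write the backslash escape for character code C into BUF, in hex if
   AS_HEX is true and in octal otherwise.  Returns the length that
   snprintf reports.  */
static int
format_escape (unsigned c, bool as_hex, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  /* Show a raw byte as its byte value: \374 rather than \17777774.  */
  if (c >= 0x3FFF80u && c <= 0x3FFFFFu)
    c -= RAW_BYTE_OFFSET;
  return snprintf (buf, size, as_hex ? "\\x%02x" : "\\%03o", c);
}
```

For example, format_escape (0x3FFFFC, false, buf, sizeof buf) leaves the four characters \374 in buf, and passing true instead gives \xfc.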




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 2 Oct 2022 05:37:05 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Oct 02 01:37:05 2022
Received: from localhost ([127.0.0.1]:46087 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oerez-0006bM-0P
	for submit <at> debbugs.gnu.org; Sun, 02 Oct 2022 01:37:05 -0400
Received: from eggs.gnu.org ([209.51.188.92]:48266)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1oerew-0006as-Ea
 for 58168 <at> debbugs.gnu.org; Sun, 02 Oct 2022 01:37:03 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:52750)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oerer-0007Kv-5d; Sun, 02 Oct 2022 01:36:57 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=IrJkYNYEzLpDptghp5C90nCDV5y0ZK9grcgeuoSyrCU=; b=Kacmbtmsu+97rOE1raRN
 qnVnt8wIAqsbztagwOqEP6830kbFZkslAXmfMrQKMM0lOi07O0ajpCISnaMYn8plO0OALXmlxq6Xk
 h4myK1lRwcit79P5sgIV06YYnv1JyyLhDu7lHqa+xQZa38zR7rgRuK7r2EtayQIrfOvrnPyz4W054
 hlx7YYTnhuyTV53GJGXQ6BxUDX/xSXi8fOXYH8KkWaD3HVrheTHdZHyRHwG0ONA5vl3ZWDlr1K2qZ
 R3FlwWLlLft/0ZREH9X0UKtI59Ik7TXXAZSmiIiuqh2QzRSZ2/72yv5UefknawZhX5HSUso9fJ3+n
 lepYrIEZWokUQg==;
Received: from [87.69.77.57] (port=2226 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oerep-0005JQ-SN; Sun, 02 Oct 2022 01:36:56 -0400
Date: Sun, 02 Oct 2022 08:36:46 +0300
Message-Id: <83tu4mais1.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN> (message from
 Mattias =?utf-8?Q?Engdeg=C3=A5rd?= on Sat, 1 Oct 2022 21:57:45 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN> <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Sat, 1 Oct 2022 21:57:45 +0200
> Cc: 58168 <at> debbugs.gnu.org
> 
> On 1 Oct 2022, at 07:22, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> > It depends on the use case, but in general I see no problem with
> > signaling errors when we cannot produce reasonably correct results.
> > For example, string-to-unibyte does signal an error in some cases.
> 
> That's fine because that function is documented to do so and always has, but making previously possible comparisons raise errors shouldn't be done lightly.

I didn't say "lightly", nor do I think so.  We need to discuss
specific use cases.

An alternative is to always convert unibyte non-ASCII strings to their
multibyte representation before comparing.

> Comparison between objects is not only useful when someone cares about their order, as in presenting a sorted list to the user. Often what is important is an ability to impose an order, preferably total, for use in building and searching data structures. I came across this bug when implementing a string set.

Always converting to multibyte handles this case, doesn't it?

> >> It's also a matter of performance -- string< has been improved recently but currently we compare text in Latin and Swahili much faster than French and Arabic; it would be nice to close that gap. UTF-8 is designed so that comparing strings by scalar values can be done byte-wise, but the way we encode raw bytes make them sort right between ASCII and Latin-1. Given that the specific order doesn't matter much, we could just run with that.
> > 
> > I see no reason to make comparison of unibyte and multibyte strings
> > perform better.
> 
> Actually I was talking about multibyte-multibyte comparisons.

Then why did you mention raw bytes? their multibyte representation
presents no performance problems, AFAIU.
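
For reference, the byte-order property mentioned in the quoted text can be checked with a small C sketch. The function names are hypothetical, and the raw-byte encoding shown (a C0 or C1 lead byte followed by one trailing byte) is the two-byte form described in this thread, written out here as an assumption rather than copied from the Emacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode Unicode scalar value C as UTF-8 into OUT; return the length.  */
static size_t
encode_utf8 (unsigned c, unsigned char *out)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Encode raw byte B (0x80..0xFF) as a C0/C1 lead byte plus one
   trailing byte; return the length.  */
static size_t
encode_raw_byte (unsigned b, unsigned char *out)
{
  assert (b >= 0x80 && b <= 0xFF);
  out[0] = (unsigned char) (0xC0 | ((b >> 6) & 1));
  out[1] = (unsigned char) (0x80 | (b & 0x3F));
  return 2;
}

/* Plain byte-wise ordering, with the shorter string first on a tie.  */
static int
bytewise_compare (const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  int r = memcmp (a, b, n);
  return r != 0 ? r : (alen > blen) - (alen < blen);
}
```

Under these assumptions U+007F encodes as 7F, the raw bytes 0x80 and 0xFF as C0 80 and C1 BF, and U+0080 as C2 80, so a byte-wise comparison places every raw byte between ASCII and the first non-ASCII Unicode character.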

> You were probably thinking about comparisons between unibyte strings that contain raw bytes and multibyte strings, and those are indeed not very performance-sensitive. However there is no way to detect whether a unibyte string contains non-ASCII chars without looking at every byte, and comparing unibyte ASCII with multibyte is definitely of interest. Strings are still unibyte by default.

You can compare under the assumption that a unibyte string is
pure-ASCII until you bump into the first non-ASCII one.  If that
happens, abandon the comparison, convert the unibyte string to its
multibyte representation, and compare again.
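
A minimal C sketch of that strategy, under stated assumptions, could look as follows. The function names are hypothetical, the decoder handles only 1-4 byte sequences plus C0/C1 raw-byte pairs, and the fallback orders raw bytes after every Unicode character; none of this is taken from the Emacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Decode the character at S[*I] and advance *I past it.  Assumes
   well-formed input: 1-4 byte UTF-8 or a C0/C1 raw-byte pair.  */
static unsigned
next_char (const unsigned char *s, size_t *i)
{
  unsigned c = s[(*i)++];
  if (c < 0x80)
    return c;
  if (c == 0xC0 || c == 0xC1)
    return 0x3FFF00 + (0x80 | ((c & 1) << 6) | (s[(*i)++] & 0x3F));
  assert (c >= 0xC2 && c < 0xF8);
  int extra = c >= 0xF0 ? 3 : c >= 0xE0 ? 2 : 1;
  c &= 0x3F >> extra;
  while (extra-- > 0)
    c = (c << 6) | (s[(*i)++] & 0x3F);
  return c;
}

/* Compare two multibyte strings by decoded character code.  */
static int
compare_chars (const unsigned char *a, size_t alen,
               const unsigned char *b, size_t blen)
{
  size_t i = 0, j = 0;
  while (i < alen && j < blen)
    {
      unsigned ca = next_char (a, &i);
      unsigned cb = next_char (b, &j);
      if (ca != cb)
        return ca < cb ? -1 : 1;
    }
  return (i < alen) - (j < blen);
}

/* Compare unibyte U with multibyte M; return <0, 0 or >0.  */
static int
compare_unibyte_multibyte (const unsigned char *u, size_t ulen,
                           const unsigned char *m, size_t mlen)
{
  /* Optimistic pass: treat U as ASCII until a byte proves otherwise.  */
  size_t n = ulen < mlen ? ulen : mlen;
  size_t i;
  for (i = 0; i < n && u[i] < 0x80; i++)
    if (u[i] != m[i])
      return u[i] < m[i] ? -1 : 1;
  if (i == n)
    return (ulen > mlen) - (ulen < mlen);

  /* Non-ASCII byte found: convert U to multibyte and start over.  */
  unsigned char *conv = malloc (2 * ulen);
  if (conv == NULL)
    abort ();
  size_t clen = 0;
  for (size_t k = 0; k < ulen; k++)
    if (u[k] < 0x80)
      conv[clen++] = u[k];
    else
      {
        conv[clen++] = (unsigned char) (0xC0 | ((u[k] >> 6) & 1));
        conv[clen++] = (unsigned char) (0x80 | (u[k] & 0x3F));
      }
  int r = compare_chars (conv, clen, m, mlen);
  free (conv);
  return r;
}
```

The optimistic loop never allocates and stops at the first difference, so the conversion cost is paid only when the unibyte string has a non-ASCII byte inside the common prefix.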




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 19:57:55 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 15:57:55 2022
Received: from localhost ([127.0.0.1]:45737 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oeicV-0000aT-4N
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 15:57:55 -0400
Received: from mail-lj1-f173.google.com ([209.85.208.173]:34502)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1oeicT-0000aF-1T
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 15:57:53 -0400
Received: by mail-lj1-f173.google.com with SMTP id bs18so6481103ljb.1
 for <58168 <at> debbugs.gnu.org>; Sat, 01 Oct 2022 12:57:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:sender:from:to:cc:subject
 :date; bh=zI6vJUvs0K2UptNH6QvI3pe9mrg0gcYFWfTnVLyxcYw=;
 b=LlMWSCfis4ycINfkCoXrAgD1KhGTGFqTPpzFF1gHM0nRueDaXNMi3jPyto0gpOftu/
 3bSzLIQbTOkzm2Ot5Sv7NILgC6WzYSfKGqYzpBQUjkvWhY7wXhlqpbYJkqd97JsP8T/w
 6RWOk/p2EV6yAZSJ3C1VvhvymXazhWqKlfPuzEQtd2WJgNtAOZGdHO0dTbrZp6JAQACK
 T+kJWOHJwWFLBkZcnKUqORNh4CsBMIel92yDVfv12RSgIRpxMN5IF16h8M/WnAuZP13V
 76yGvR8lbaO8toHM8jYZZ8CSA3GPBiM53ddQZqgB8EM3QzVsRzkJQEFdJMcwXFa6z3iD
 K5vg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:sender:x-gm-message-state
 :from:to:cc:subject:date;
 bh=zI6vJUvs0K2UptNH6QvI3pe9mrg0gcYFWfTnVLyxcYw=;
 b=W4mzHsoC/rcmA11b2e/DUGncfvkAGkxdhuqwCjRFr63Kr4tEULS7EaVZTRPDFcwtDF
 464mjpnuysWyeYnzJPvBgI5SUW24Tk8ZvGybz3SE6l1jYc7HB48n9GEm1r1OnKmEzwCa
 XyhgYrvEbA4Zy7Mncw77hZYEebwvNhCR0wwupvPQ/GHsQNsKRNOceec6GTAVVXlVWun+
 ZVVzgwa75mQPG8yMMgmQ+Hdm0PCaHzQNhMswPmgIqVnqDbFXpBZk0u1svbvJzXA0wcRM
 BywmWQIJO6BuTyjiUfFvt/l1xCKrGVmlUsVJNF39E1Mt3sa7OCG6/V6vsU2iExILeKPl
 9twA==
X-Gm-Message-State: ACrzQf3vYcu8Aqf15YuFqey4L6C3lgxIjAR3CI2AMCzVS/NGqpm7JETv
 cGShwjvPqn0dnM9JG1jPQ0E=
X-Google-Smtp-Source: AMsMyM5hoCkld9Nu+nbsNKbvwiTVShVoTFJy3+p3DljljrYqPQ65IwN1rRZx7Ifw5VEuH8F9kz3mMg==
X-Received: by 2002:a2e:3211:0:b0:261:c5c8:3403 with SMTP id
 y17-20020a2e3211000000b00261c5c83403mr4684364ljy.86.1664654266724; 
 Sat, 01 Oct 2022 12:57:46 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 bi9-20020a0565120e8900b0048b08e25979sm842407lfb.199.2022.10.01.12.57.45
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 01 Oct 2022 12:57:46 -0700 (PDT)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <83h70oce4k.fsf@HIDDEN>
Date: Sat, 1 Oct 2022 21:57:45 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <B56DE6FE-732D-432D-B2C2-1B54FC8472B1@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
 <83h70oce4k.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

1 okt. 2022 kl. 07.22 skrev Eli Zaretskii <eliz@HIDDEN>:

> It depends on the use case, but in general I see no problem with
> signaling errors when we cannot produce reasonably correct results.
> For example, string-to-unibyte does signal an error in some cases.

That's fine because that function is documented to do so and always has, but making previously possible comparisons raise errors shouldn't be done lightly.

Comparison between objects is not only useful when someone cares about their order, as in presenting a sorted list to the user. Often what is important is an ability to impose an order, preferably total, for use in building and searching data structures. I came across this bug when implementing a string set.

>> It's also a matter of performance -- string< has been improved recently but currently we compare text in Latin and Swahili much faster than French and Arabic; it would be nice to close that gap. UTF-8 is designed so that comparing strings by scalar values can be done byte-wise, but the way we encode raw bytes make them sort right between ASCII and Latin-1. Given that the specific order doesn't matter much, we could just run with that.
>
> I see no reason to make comparison of unibyte and multibyte strings
> perform better.

Actually I was talking about multibyte-multibyte comparisons.

You were probably thinking about comparisons between unibyte strings that contain raw bytes and multibyte strings, and those are indeed not very performance-sensitive. However there is no way to detect whether a unibyte string contains non-ASCII chars without looking at every byte, and comparing unibyte ASCII with multibyte is definitely of interest. Strings are still unibyte by default.
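
The point quoted above, that UTF-8 lets strings be ordered by scalar value using a plain byte-wise comparison, can be illustrated with a small C sketch. The function names are hypothetical, the encoder covers only standard UTF-8 (it does not model Emacs's extended internal encoding or reject surrogates), and none of this is taken from the Emacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode code point C (at most 0x10FFFF) as UTF-8 into BUF, which must
   have room for 4 bytes.  Return the number of bytes written.  */
size_t
utf8_encode (uint32_t c, unsigned char *buf)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  buf[0] = (unsigned char) (0xF0 | (c >> 18));
  buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  buf[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Compare two code points using only the bytes of their UTF-8
   encodings.  The result is negative, zero, or positive, and has the
   same sign as comparing A and B numerically.  */
int
utf8_bytewise_compare (uint32_t a, uint32_t b)
{
  unsigned char ba[4] = { 0 }, bb[4] = { 0 };
  size_t la = utf8_encode (a, ba);
  size_t lb = utf8_encode (b, bb);
  size_t n = la < lb ? la : lb;
  int c = memcmp (ba, bb, n);
  if (c != 0)
    return c;
  return (la > lb) - (la < lb);
}
```

Because longer encodings always start with a larger lead byte, the first differing byte decides the order, so a comparison loop never needs to decode characters at all.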





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 13:51:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 09:51:30 2022
Received: from localhost ([127.0.0.1]:44034 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oectu-0005qF-Dr
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 09:51:30 -0400
Received: from eggs.gnu.org ([209.51.188.92]:57274)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1oects-0005q2-9Q
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 09:51:28 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:54978)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oectn-0001M8-2d; Sat, 01 Oct 2022 09:51:23 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=8ie0vxKHLyeYXLGJeerXXNGU1frA7VpbEIu1MhN60i0=; b=B1bhwS93YR7tNqps4u7C
 8jb2U/SYLiwO/MsEbztkua08tg5tF9Ov82jBrfcxpwvJE0ADv2gSvfodLAc0s7xrZa3Q6EUr4aah/
 ie2Y4nc0+YlnwsEVkcngr60c9m1TfIXp6e58DFdxV4ian/ym2jl3Y6d5wffB6r0fB7qtZi8dizODV
 0oBK3iPkv36ujdowz89jQwhbBLY7uWeeazyhgV45yfZfjqi2d+l/imlcRO1qNPQvi3qcV+K0y2qdx
 eek+IgQ1dVlDhJux+ILtJ6vwmdp3dXScIdWIx9OQQ25EUKkYpQT/4i64gNW4DLcHWmQ1d6pozNuw1
 MOoArzsOlH4yaA==;
Received: from [87.69.77.57] (port=4176 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oectm-00089o-HR; Sat, 01 Oct 2022 09:51:22 -0400
Date: Sat, 01 Oct 2022 16:51:11 +0300
Message-Id: <83ill3bqk0.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN> (message from
 Mattias =?utf-8?Q?Engdeg=C3=A5rd?= on Sat, 1 Oct 2022 15:37:25 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN> <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Sat, 1 Oct 2022 15:37:25 +0200
> Cc: 58168 <at> debbugs.gnu.org,
>  Eli Zaretskii <eliz@HIDDEN>
> 
> > Funnily enough, the latter displays in a different way for me, which may
> > or may not be a bug:
> > 
> > This is with `display-raw-bytes-as-hex' t.
> 
> You are right, that is completely broken -- display-raw-bytes-as-hex shouldn't affect the display of C1 controls.

I think the variable is a misnomer of sorts: the request back when it
was introduced was to display hex where we usually display octal \nnn
escapes.  And the latter happens not only for raw bytes.

> It seems to be a relic from the pre-Unicode days of Emacs: the code responsible muddles the display of raw bytes and unicode controls.

No, I think we decided to keep the display of C1 characters as octal
escapes.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 13:44:08 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 09:44:08 2022
Received: from localhost ([127.0.0.1]:44007 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oecml-0005bT-Qm
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 09:44:08 -0400
Received: from quimby.gnus.org ([95.216.78.240]:39318)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <larsi@HIDDEN>) id 1oecmk-0005b2-Rx
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 09:44:07 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org;
 s=20200322; h=Content-Transfer-Encoding:Content-Type:MIME-Version:Message-ID
 :Date:References:In-Reply-To:Subject:Cc:To:From:Sender:Reply-To:Content-ID:
 Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc
 :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe:
 List-Post:List-Owner:List-Archive;
 bh=plo9ZQRE6cfR8A4ip0wNvdo/tQEmCaoeorTB885clfw=; b=TEAV/oldWUHm4ti5sGhT70paob
 xr+A1i4rhyl3Wg+M3AU7MAIyyPbmdWq45VG5wf7n+ZiDq76ExJndRh2YaDVkCWSu5jcjO39JEwNjp
 8eR18/DQWJAqa9xEWwUz9AjA3jNpv2aHpfOGI+zfTVcNFPRv7TPXA2khKQ5vFkVv4WTY=;
Received: from [84.212.220.105] (helo=downe)
 by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from <larsi@HIDDEN>)
 id 1oecmc-0004aF-9D; Sat, 01 Oct 2022 15:44:00 +0200
From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
In-Reply-To: <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN> ("Mattias
 =?utf-8?Q?Engdeg=C3=A5rd=22's?= message of "Sat, 1 Oct 2022 15:37:25
 +0200")
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN>
 <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN>
 <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwAgMAAAAqbBEUAAAABGdBTUEAALGPC/xhBQAAACBj
 SFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAADFBMVEVCODWPWVXVz7//
 //8bs5b3AAAAAWJLR0QDEQxM8gAAAAd0SU1FB+YKAQ0qMyamE5YAAAGOSURBVCjPRdLBauMwEAbg
 XyY2rU5msaHJaQ89FD+FtCSh7MkNUg4+hdLC1k+hhhDYnNxSFzYnH5yA5yk7kklXhxEfM0KjsRGL
 vXA5gLh+4dhqxxF5/cFxIwKiY8ZxMkJUZdhc5KaYVhmotevX9/4Of6RHRVvqb+IXOQFB6T01maSD
 wtJ0eksOBR0ctFF6W9x7NPgVT/R+dqWSgbFE/rrHtH+kQ4ejyWkHMTTPvz2sbypykB0+OcPtRPNU
 NqCn5OwQz6Ty4OUgzIihJiegJKTChziRuwauIec4aVc0CABW3JYHl6WAEjQwkgAkdFL8XPDjqjKh
 M4RVkAbHNK4/lbAl5A5/kdT/AGsgHdpc/hi0yhm+nYrozDOyVQNrfXfUrok2YbDts9WMEnxFRMcG
 1QWOQ+ERykSGmkiNHyNq/oOE8mWMmKFuJXmUKVake485YsUZ3XlMxjOqmzGyEe5nccGK3m/qAAOs
 27dbn0nDRNvd05I3jJibxTf8cPiQ9agX1j4weh78wv8XFC4dFnRZ5gu0PLNQPKOFAwAAACV0RVh0
 ZGF0ZTpjcmVhdGUAMjAyMi0xMC0wMVQxMzo0Mjo1MSswMDowMJO9LnIAAAAldEVYdGRhdGU6bW9k
 aWZ5ADIwMjItMTAtMDFUMTM6NDI6NTErMDA6MDDi4JbOAAAAAElFTkSuQmCC
X-Now-Playing: Strawberry Switchblade's _Make More Noise (4)_: "Go Away"
Date: Sat, 01 Oct 2022 15:43:55 +0200
Message-ID: <878rlzfylg.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org",
 has NOT identified this incoming email as spam.  The original
 message has been attached to this so you can view it or label
 similar future email.  If you have any questions, see
 @@CONTACT_ADDRESS@@ for details.
 
 Content preview:  Mattias Engdegård <mattias.engdegard@HIDDEN> writes: >
    depending on display-raw-bytes-as-hex. With the patch, we get > > € € €C1:
    \u0080 raw: \200. > or > C1: \u0080 raw: \x80. > > which should satisfy everyone.
    What about it? 
 
 Content analysis details:   (-2.9 points, 5.0 required)
 
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, Eli Zaretskii <eliz@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Mattias Engdegård <mattias.engdegard@HIDDEN> writes:

> depending on display-raw-bytes-as-hex. With the patch, we get
>
>    C1: \u0080 raw: \200.
> or
>    C1: \u0080 raw: \x80.
>
> which should satisfy everyone. What about it?

There was a very long thread about changing this output in a bug report
somewhere, and we decided not to, because all the alternatives were
worse than what we have.

*handwaves at bug tracker*




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 13:37:35 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 09:37:35 2022
Received: from localhost ([127.0.0.1]:43993 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oecgR-0005Qh-Bz
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 09:37:35 -0400
Received: from mail-lf1-f49.google.com ([209.85.167.49]:33472)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1oecgP-0005QR-NJ
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 09:37:34 -0400
Received: by mail-lf1-f49.google.com with SMTP id d42so10781663lfv.0
 for <58168 <at> debbugs.gnu.org>; Sat, 01 Oct 2022 06:37:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=references:to:cc:in-reply-to:date:subject:mime-version:message-id
 :from:sender:from:to:cc:subject:date;
 bh=YxQy48H3j6v9FqLEl+lmYX7+PPFSt4w15rOI/aXZpq8=;
 b=RtWKn8np6mwM1p12Zsdo8UKPgNTFA8gYTffuTlwoNUF5GfINHkhFHVwSTS8j2PSSEs
 hesJTkzJ66bE/Io3+gdX8mZNQmhLFN++tOAnhVJFgrNcGvJJL1JNrDQWvYkdbbV2AbCp
 20otuZmn6jgiNTB4O5rAM0wUG3z0Y2e9iIeRXdz4m8K8QKCadYt4RQ2xnBpnIqXRkz0w
 BOvas9c6xOxpK2gHrxyLMepkV3JTGD3mc72ihqiJ+DnGR9H7RbPm27ufZD2KF7T6SSa8
 dkuVh0nHPpgEjgpO+s6fTyTcPwys/0bpGBP0DUA7DGnhCjk2wrAAIZTAaIc1kthQFJ3v
 L6hQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=references:to:cc:in-reply-to:date:subject:mime-version:message-id
 :from:sender:x-gm-message-state:from:to:cc:subject:date;
 bh=YxQy48H3j6v9FqLEl+lmYX7+PPFSt4w15rOI/aXZpq8=;
 b=UBA8YIC8BePtiLFUtDxPucSm3QbniPV9lu7Z3c3pNpPA1T6phUlqtuoDGubHseddAj
 r3yZX39XM1xE17wFWrDU/2cGLNzH6SuxsodPO2GzaeR2dtAPuEK6UkeNLxKuUBlD3g7z
 7hhtQI72eW4Qhq7HU7edlNiDLzdbsWai1V9u7HLpvizW0OMX4OwhpQjr8NtJRGQuCQEq
 //VixaHLd3bGR71EPIsMdBvG1uppOHxfVuWZQvX4ZX2aJmW+j6T0pT39pbvxYkptZnsE
 mr5B9AEIHFCLVAaCnxypsWAj+nhVhrIjpE9MiF5AlveEkiTyDI7++6rc5uu1fezzMlp2
 oUQA==
X-Gm-Message-State: ACrzQf23QjPg31VUpTWW2d8I00UiJJ60AmUzFW8qELoz6IY4TP5zhXKc
 gqC0c4Ik828oWJCAgJNrcOY=
X-Google-Smtp-Source: AMsMyM7gZP2+eo7ZWgCin+ax2uPRXStY4YJbmzmrRK2QhQSIYm2iWy0lFvuEVMK+R4tIivv772roxA==
X-Received: by 2002:ac2:548a:0:b0:4a2:2b8f:7990 with SMTP id
 t10-20020ac2548a000000b004a22b8f7990mr423031lfk.402.1664631447464; 
 Sat, 01 Oct 2022 06:37:27 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 z13-20020a056512308d00b00494a1b242dasm772427lfd.14.2022.10.01.06.37.26
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 01 Oct 2022 06:37:26 -0700 (PDT)
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
Message-Id: <E3917799-028F-46CF-BD7B-060CEEDE37BD@HIDDEN>
Content-Type: multipart/mixed;
 boundary="Apple-Mail=_A3D1196E-9913-4565-A797-EED953C94F96"
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
Date: Sat, 1 Oct 2022 15:37:25 +0200
In-Reply-To: <878rlzj1zv.fsf@HIDDEN>
To: Lars Ingebrigtsen <larsi@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <878rlzj1zv.fsf@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, Eli Zaretskii <eliz@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


--Apple-Mail=_A3D1196E-9913-4565-A797-EED953C94F96
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

1 okt. 2022 kl. 12.02 skrev Lars Ingebrigtsen <larsi@HIDDEN>:

> Funnily enough, the latter displays in a different way for me, which may
> or may not be a bug:
>
> This is with `display-raw-bytes-as-hex' t.

You are right, that is completely broken -- display-raw-bytes-as-hex shouldn't affect the display of C1 controls.
Whether (string 128) displays "\200" or "\x80", however tarted up in a fancy face, it's still a lie. Only something like "\u0080" would actually be correct.

It seems to be a relic from the pre-Unicode days of Emacs: the code responsible muddles the display of raw bytes and unicode controls.
The attached patch untangles the two somewhat and lets display-raw-bytes-as-hex do what its name and documentation suggest, while using a non-confusing display for C1 controls.

The command

  (insert "C1: " (string 128) " raw: " (unibyte-string 128) ".\n")

currently displays

   C1: \200 raw: \200.
or
   C1: \x80 raw: \x80.

depending on display-raw-bytes-as-hex. With the patch, we get

   C1: \u0080 raw: \200.
or
   C1: \u0080 raw: \x80.

which should satisfy everyone. What about it?


--Apple-Mail=_A3D1196E-9913-4565-A797-EED953C94F96
Content-Disposition: attachment;
	filename=unicode-escape-display.diff
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name="unicode-escape-display.diff"
Content-Transfer-Encoding: 7bit

diff --git a/src/xdisp.c b/src/xdisp.c
index 55e74a3603..fa4fc2319e 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -8179,12 +8179,20 @@ get_next_display_element (struct it *it)
 		char str[10];
 		int len, i;
 
+		const char *format_string;
 		if (CHAR_BYTE8_P (c))
-		  /* Display \200 or \x80 instead of \17777600.  */
-		  c = CHAR_TO_BYTE8 (c);
-		const char *format_string = display_raw_bytes_as_hex
-					    ? "x%02x"
-					    : "%03o";
+		  {
+		    /* A raw byte: display using an octal or hex escape which
+		       would produce this byte in a Lisp string literal.  */
+		    c = CHAR_TO_BYTE8 (c);
+		    format_string = display_raw_bytes_as_hex ? "x%02x" : "%03o";
+		  }
+		else
+		  {
+		    /* A Unicode character not displayed in any other way:
+		       use a Unicode escape.  */
+		    format_string = c <= 0xffff ? "u%04X" : "U%08X";
+		  }
 		len = sprintf (str, format_string, c + 0u);
 
 		XSETINT (it->ctl_chars[0], escape_glyph);

--Apple-Mail=_A3D1196E-9913-4565-A797-EED953C94F96
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii



I see that the redisplay-testsuite.el needs amending too; it actually looks buggy in this respect. If the above approach is deemed acceptable, I'll submit a patch that includes that file as well.
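
For readers who want to see the effect of the format-string selection in the patch above without building Emacs, here is a standalone C sketch. It is an approximation only: format_escape, is_raw_byte and raw_as_hex are hypothetical stand-ins for the surrounding display code, CHAR_BYTE8_P and display_raw_bytes_as_hex, and the character passed for a raw byte is assumed to be the byte value already converted (as CHAR_TO_BYTE8 does in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Write the escape text for character C into BUF, without the leading
   backslash, choosing the format the same way the patch does:
   raw bytes get an octal or hex escape, any other character gets a
   Unicode escape.  BUFSIZE must be at least 10, matching the
   `char str[10]' buffer in the patched function.  Return the number
   of characters written, not counting the terminating NUL.  */
int
format_escape (unsigned c, bool is_raw_byte, bool raw_as_hex,
               char *buf, size_t bufsize)
{
  assert (buf != NULL && bufsize >= 10);
  const char *format_string;
  if (is_raw_byte)
    /* A raw byte: octal or hex, as in a Lisp string literal.  */
    format_string = raw_as_hex ? "x%02x" : "%03o";
  else
    /* A Unicode character: 4 hex digits if it fits, else 8.  */
    format_string = c <= 0xffff ? "u%04X" : "U%08X";
  return snprintf (buf, bufsize, format_string, c);
}
```

Under these assumptions, the raw byte 0x80 yields "200" or "x80" depending on the flag, while the C1 control U+0080 yields "u0080" regardless of it, which is the distinction the proposed display is meant to make visible.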


--Apple-Mail=_A3D1196E-9913-4565-A797-EED953C94F96--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 11:51:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 07:51:12 2022
Received: from localhost ([127.0.0.1]:43822 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oeb1T-0000CO-Tr
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 07:51:12 -0400
Received: from mail-lj1-f173.google.com ([209.85.208.173]:43612)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mattias.engdegard@HIDDEN>) id 1oeb1S-0000CB-Lq
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 07:51:11 -0400
Received: by mail-lj1-f173.google.com with SMTP id b6so7293658ljr.10
 for <58168 <at> debbugs.gnu.org>; Sat, 01 Oct 2022 04:51:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:sender:from:to:cc:subject
 :date; bh=F3GPe8QV/ViC4zUgJ4xE1hTcG0F1xAE06U3x93OL1X0=;
 b=Px/hGc+VvFYFYiq/cZJBjbLcnjrNVcxKIxzGfPrGHE22VnTuHuJj5irSFTOY/c+1A8
 FaHpe2g12RLdJiAzwNJxPxN1wf8cwrd6vJnBGmBOBh996VgOMYJNnot/N6u6P8j4h9AF
 ktLHYOFCqTDAKlzoOmTLfrizKmUyhxkKuGppy2TrRE72dXe9wAuCCe6F6U4LI+wusdWi
 eAMUlAfMAxvEBdm7HvnMtEuVlVfa6U3IuQGHNXeHatuUnqpb/YQ/SabJy9NQDg/c8ZRF
 GfLg8NCDVU15lJrMkaP3M9vOpvr0myjqUzGvhfH2kgr0NDACnlWcCndLbrntgckYKdBc
 3dxQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=to:references:message-id:content-transfer-encoding:cc:date
 :in-reply-to:from:subject:mime-version:sender:x-gm-message-state
 :from:to:cc:subject:date;
 bh=F3GPe8QV/ViC4zUgJ4xE1hTcG0F1xAE06U3x93OL1X0=;
 b=hg3QMDN9r9UnSdwCk08T5QItzwSD+gUxcTS7fNZmhzct6tSNZqyTINgAoZx0t3rIFY
 OBXnFp8vFwPMx1HOmaBX1+P/1lT7Y17/aILXm7XkPV6wV5b6NyOMObpOOElGsbPMhnMg
 Rvcz/q2ZvEbifAtbRqSzdePdzexIHxVcK4KThpGcDIM5s5mgdd5oTENCC8mrO7/hiowr
 +cH3ciymRk7TLHTNZVyvyfotkzv7tRu7Wa3JD36biwt9FKLbJRBk5ZdFxD2L1sDezhNC
 F1zTvb+71PhL97ICfODl0D+NMidRD7dqRLlSGsghcqZFX2gnx5L9AhAknULAYmijk+8b
 2Sig==
X-Gm-Message-State: ACrzQf2vA6KTjU3pfka5nvgVlJfvY3I3J7tNaHdrWzR5FaHFqp79Z/CK
 KQUIaF7+9ntx9MtL//GSfR0=
X-Google-Smtp-Source: AMsMyM695+nbBpNkojeO19MuaG4v+yk8i21LCz0x8nFglnPUI0gipMzs5zmXcELeyFYxAdfpJGPU5g==
X-Received: by 2002:a05:651c:1542:b0:26d:bf29:8cd5 with SMTP id
 y2-20020a05651c154200b0026dbf298cd5mr2469181ljp.304.1664625064426; 
 Sat, 01 Oct 2022 04:51:04 -0700 (PDT)
Received: from smtpclient.apple (c188-150-171-209.bredband.tele2.se.
 [188.150.171.209]) by smtp.gmail.com with ESMTPSA id
 bf12-20020a2eaa0c000000b0026c603169aesm423513ljb.0.2022.10.01.04.51.03
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 01 Oct 2022 04:51:03 -0700 (PDT)
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <83edvscdk8.fsf@HIDDEN>
Date: Sat, 1 Oct 2022 13:51:02 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <F5783C06-97E2-43DC-B573-6F2D9F06EE8B@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
 <83edvscdk8.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

1 okt. 2022 kl. 07.34 skrev Eli Zaretskii <eliz@HIDDEN>:

>> (What about printing it as "\u0080" instead?
>
> NO!!  \u0080 is something entirely different.

Actually not -- (string 128) returns the multibyte string consisting of the single char U+0080, which is exactly what you get by typing "\u0080".
It confuses me, too. That's a C1 control char and Emacs doesn't escape it when printing, but it's displayed as `\200`.

(I don't think changing that display to \u0080 would break any compatibility; we should consider doing that for all C1 controls, U+0080..U+009F.)

Even more confusing is that "\x0080" means the same as "\u0080" but not the same as "\x80", which is a unibyte string of a single raw byte.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 10:12:40 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 06:12:40 2022
Received: from localhost ([127.0.0.1]:43715 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oeZU8-0001WW-Il
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 06:12:40 -0400
Received: from eggs.gnu.org ([209.51.188.92]:43584)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1oeZU4-0001WG-QB
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 06:12:39 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:40190)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oeZTz-0001QN-Jg; Sat, 01 Oct 2022 06:12:31 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=QOcRBlRXVIAob5nNsQf29+WATDkbrES+jbQUOb0R3ag=; b=VDaIK6bByazQerIX38kG
 Cdd9cBzDdZ5Ps6gaqR7E6e3oNIV346J+3+zOmRYT6XScjJcPuPONb2EOdXsAlQl4sCMpsmulaiSn6
 Jz7QkrT3gSk4GOqoxScgwIXrUEJu5Jm/lfMsZlZRNQ8ef1twaY73MDs+lfNx5qhKpilXjfGZ2+by8
 wh1FRnI8S2B94jF9bEU6Y+/nOX8/creCVte5zX/K+x8jUSmJjC9IliKnyjAVG5f9g8VfJRKLQPftR
 pj+ZL7b7AikQwPEgGWn0UqRVwUy1JxaaFmZDg5NIzcl1xnadO2uLS5XkFSunO5hJDLWH83EtnvhKk
 JRlW5pr5zEjLMw==;
Received: from [87.69.77.57] (port=2733 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oeZTy-0005FJ-T6; Sat, 01 Oct 2022 06:12:31 -0400
Date: Sat, 01 Oct 2022 13:12:19 +0300
Message-Id: <83o7uvc0os.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Lars Ingebrigtsen <larsi@HIDDEN>
In-Reply-To: <878rlzj1zv.fsf@HIDDEN> (message from Lars Ingebrigtsen on Sat, 
 01 Oct 2022 12:02:12 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN>
 <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN> <878rlzj1zv.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org, mattias.engdegard@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Cc: 58168 <at> debbugs.gnu.org
> From: Lars Ingebrigtsen <larsi@HIDDEN>
> Date: Sat, 01 Oct 2022 12:02:12 +0200
> 
> Mattias Engdegård <mattias.engdegard@HIDDEN> writes:
> 
> >> (string 4194176)
> >> => "\200"
> >> "\x80"
> >> => "\200"
> >> 
> >> which are kinda equal in some ways, and not in other ways.
> >
> > And (string 128)
> > => "\200"
> 
> Funnily enough, the latter displays in a different way for me, which may
> or may not be a bug:

The former is a string of 4 characters, the latter is a string of just
one.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 10:02:24 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 06:02:24 2022
Received: from localhost ([127.0.0.1]:43704 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oeZKB-0001IA-Rw
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 06:02:24 -0400
Received: from quimby.gnus.org ([95.216.78.240]:37492)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <larsi@HIDDEN>) id 1oeZK9-0001Hu-El
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 06:02:21 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org;
 s=20200322; h=Content-Type:MIME-Version:Message-ID:Date:References:
 In-Reply-To:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:
 Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender:
 Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:
 List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=fPnUwFIjUdJAqjyDMYi+JIfgYuxDQTO3Xt0WeVc+0Fo=; b=GZhM2IQyk3UZYE6BONKDj8HxPg
 aGYsCR8Ru1o3bwYOLjuMbaXjxvE9X/OIQMz4tXYovr4XWG5EpR5djmdeVBZqKbwAyIuKFLdgJCPI6
 b6fctt8zapPDQUB4VdszmnE7H5EmDzFWlwMAU6yMQJ1/PIQDBrfvRDPthENzf09egB5s=;
Received: from [84.212.220.105] (helo=downe)
 by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.92) (envelope-from <larsi@HIDDEN>)
 id 1oeZK1-0002v6-Dx; Sat, 01 Oct 2022 12:02:15 +0200
From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
In-Reply-To: <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN> ("Mattias
 =?utf-8?Q?Engdeg=C3=A5rd=22's?= message of "Fri, 30 Sep 2022 22:12:24
 +0200")
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN>
 <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
X-Now-Playing: Submarine's _Still in a Dream (4)_: "Chemical Tester"
Date: Sat, 01 Oct 2022 12:02:12 +0200
Message-ID: <878rlzj1zv.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org",
 has NOT identified this incoming email as spam.  The original
 message has been attached to this so you can view it or label
 similar future email.  If you have any questions, see
 @@CONTACT_ADDRESS@@ for details.
 
 Content preview:  Mattias Engdegård <mattias.engdegard@HIDDEN> writes: >>
    (string 4194176) >> => "\200" >> "\x80" >> => "\200" >> >> which are kinda
    equal in some ways, and not in other ways. > > And (string 128) > => "\200"
    
 
 Content analysis details:   (-2.9 points, 5.0 required)
 
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 58168
Cc: 58168 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Mattias Engdegård <mattias.engdegard@HIDDEN> writes:

>> (string 4194176)
>> => "\200"
>> "\x80"
>> => "\200"
>>
>> which are kinda equal in some ways, and not in other ways.
>
> And (string 128)
> => "\200"

Funnily enough, the latter displays in a different way for me, which may
or may not be a bug:


--=-=-=
Content-Type: image/png
Content-Disposition: inline
Content-Transfer-Encoding: base64

[Inline PNG image attachment; base64 data omitted.]
--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


This is with `display-raw-bytes-as-hex' t.

> (What about printing it as "\u0080" instead? Only affects the C1 controls...)

\u0080 is a character, so that's something different than \200.

> Can't say my intuition is any more reliable. On the other hand,
> intuition says that counter-intuitive solutions are very attractive,
> but I think it's just trying to confuse us.

😀

--=-=-=--




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at 58168 <at> debbugs.gnu.org:


Received: (at 58168) by debbugs.gnu.org; 1 Oct 2022 05:34:39 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Oct 01 01:34:39 2022
Received: from localhost ([127.0.0.1]:43431 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1oeV94-0000sZ-NK
	for submit <at> debbugs.gnu.org; Sat, 01 Oct 2022 01:34:38 -0400
Received: from eggs.gnu.org ([209.51.188.92]:58348)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eliz@HIDDEN>) id 1oeV92-0000sL-PL
 for 58168 <at> debbugs.gnu.org; Sat, 01 Oct 2022 01:34:37 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:38064)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oeV8u-0006uP-PZ; Sat, 01 Oct 2022 01:34:31 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
 Date; bh=B5qh+A2afNXJgbubwcL+5lO7SZTNmwEKtSj8HVT1ajg=; b=mkgdoyUDsTGB+fZo7npA
 KbHvr+yUKT87FqH+KZucXa/xFQtiG1qeLRD80WUD/Z4z15YExIL9r87SCrVl7AaoqzJsKRzfIPU7x
 H+I1yhOjsRN5UyEGgC0OH0WzzO+zbcDabCeEpLDCUu6cK4w56XrT64hYsPiQAD0k2aIqtaX50IAdS
 2nYy2uDUeKUMIDTvKoosiMdMVK/GU6wJeLlKwDFnBM1ay9OIpuHfLO6DW0V6NbIar4CFFjhsavGJ9
 5EfYB5/6sewPUzl9G3jQ438aRkXnCNUn5q2krvdapCiEWokIT7LFdQ/qMraj+o2OjUQiNvbNcr6m5
 3scEbTXfT8/YbQ==;
Received: from [87.69.77.57] (port=4681 helo=home-c4e4a596f7)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
 id 1oeV8s-0001wH-PD; Sat, 01 Oct 2022 01:34:27 -0400
Date: Sat, 01 Oct 2022 08:34:15 +0300
Message-Id: <83edvscdk8.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN> (message from
 Mattias =?utf-8?Q?Engdeg=C3=A5rd?= on Fri, 30 Sep 2022 22:12:24 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN> <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: 58168 <at> debbugs.gnu.org, larsi@HIDDEN

> Cc: 58168 <at> debbugs.gnu.org
> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Fri, 30 Sep 2022 22:12:24 +0200
> 
> On 30 Sep 2022, at 15:52, Lars Ingebrigtsen <larsi@HIDDEN> wrote:
> 
> > (string 4194176)
> > => "\200"
> > "\x80"
> > => "\200"
> > 
> > which are kinda equal in some ways, and not in other ways.
> 
> And (string 128)
> => "\200"
> 
> but painted in a different colour

It's the same color as the result of '(string 4194176)'.

> so that the alert reader suspects that something odd is going on...

Which it is.

> (What about printing it as "\u0080" instead?

NO!!  \u0080 is something entirely different.




Message received at 58168 <at> debbugs.gnu.org:

Date: Sat, 01 Oct 2022 08:30:56 +0300
Message-Id: <83fsg8cdpr.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Lars Ingebrigtsen <larsi@HIDDEN>
In-Reply-To: <877d1l55rn.fsf@HIDDEN> (message from Lars Ingebrigtsen on Fri, 
 30 Sep 2022 15:52:12 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: 58168 <at> debbugs.gnu.org, mattias.engdegard@HIDDEN

> Cc: 58168 <at> debbugs.gnu.org
> From: Lars Ingebrigtsen <larsi@HIDDEN>
> Date: Fri, 30 Sep 2022 15:52:12 +0200
> 
> You also have
> 
> (string 4194176)
> => "\200"
> "\x80"
> => "\200"
> 
> which are kinda equal in some ways, and not in other ways.

So are "A" and "𝙰", or "א" and "ℵ": kinda equal in some ways, but not
in others.  That doesn't mean they should compare equal.
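
The "kinda equal" relationship between such pairs can be made concrete with Unicode compatibility normalization. The following is only a sketch in Python, not Emacs code; it assumes the two look-alikes are U+1D670 (MATHEMATICAL MONOSPACE CAPITAL A) and U+2135 (ALEF SYMBOL), and the helper name compat_equal is hypothetical.

```python
import unicodedata

# Each pair is (plain character, compatibility look-alike).
# Assumption: these are the code points meant in the message above.
pairs = [
    ("A", "\U0001D670"),    # LATIN CAPITAL LETTER A vs MATHEMATICAL MONOSPACE CAPITAL A
    ("\u05D0", "\u2135"),   # HEBREW LETTER ALEF vs ALEF SYMBOL
]

def compat_equal(x, y):
    """Return True if x and y are identical after NFKC normalization."""
    return unicodedata.normalize("NFKC", x) == unicodedata.normalize("NFKC", y)

for plain, lookalike in pairs:
    # Different code points, so an exact comparison says "not equal" ...
    assert plain != lookalike
    # ... yet they fold together under compatibility normalization.
    assert compat_equal(plain, lookalike)
```

Exact comparison and compatibility folding answer different questions, which is the sense in which the pairs are equal "in some ways, but not in others".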




Message received at 58168 <at> debbugs.gnu.org:

Date: Sat, 01 Oct 2022 08:22:03 +0300
Message-Id: <83h70oce4k.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN> (message from
 Mattias =?utf-8?Q?Engdeg=C3=A5rd?= on Fri, 30 Sep 2022 22:04:47 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN> <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: 58168 <at> debbugs.gnu.org

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Fri, 30 Sep 2022 22:04:47 +0200
> Cc: 58168 <at> debbugs.gnu.org
> 
> On 29 Sep 2022, at 19:11, Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> > Unibyte strings should never be compared with
> > multibyte, unless they are both pure-ASCII.
> 
> It's perfectly fine to compare "Madrid" (unibyte) with "Málaga" (non-ASCII multibyte).

Not relevant: I meant unibyte non-ASCII strings.  The ASCII case is
easy and un-problematic, and is really just a straw-man here.

> If you mean that all strings (literals in particular) should be multibyte by default then I agree and at some point we should take that step, but it would be quite a breaking change. Perhaps less in practice than we fear, though...

That's not what I meant.  I think unibyte strings are with us for the
foreseeable future.

> > Unibyte characters don't belong to this order.  They
> > should be converted to multibyte representation to be sensibly
> > comparable.
> 
> Oh I agree to some extent but we can't really raise an error if someone tries so we might as well return something reasonable and coherent.

It depends on the use case, but in general I see no problem with
signaling errors when we cannot produce reasonably correct results.
For example, string-to-unibyte does signal an error in some cases.

> Besides, there are more good reasons for ordering strings (both multibyte and unibyte) than might be apparent at first.

Examples, please.

> Working from the assumption that we can't change string= to equate raw bytes in unibyte and multibyte strings, we need to invent an order between normally incommensurate values

I don't agree with the conclusion.  It is not the only possible
conclusion.  Signaling an error is another one, and I'm sure we could
think of more.

> It's also a matter of performance -- string< has been improved recently but currently we compare text in Latin and Swahili much faster than French and Arabic; it would be nice to close that gap. UTF-8 is designed so that comparing strings by scalar values can be done byte-wise, but the way we encode raw bytes makes them sort right between ASCII and Latin-1. Given that the specific order doesn't matter much, we could just run with that.

I see no reason to make comparison of unibyte and multibyte strings
perform better.




Message received at 58168 <at> debbugs.gnu.org:

Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <877d1l55rn.fsf@HIDDEN>
Date: Fri, 30 Sep 2022 22:12:24 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <469814C2-197A-4BCA-8E2A-245577340C1E@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <877d1l55rn.fsf@HIDDEN>
To: Lars Ingebrigtsen <larsi@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
Cc: 58168 <at> debbugs.gnu.org

On 30 Sep 2022, at 15:52, Lars Ingebrigtsen <larsi@HIDDEN> wrote:

> (string 4194176)
> => "\200"
> "\x80"
> => "\200"
>
> which are kinda equal in some ways, and not in other ways.

And (string 128)
=> "\200"

but painted in a different colour so that the alert reader suspects that something odd is going on...
(What about printing it as "\u0080" instead? Only affects the C1 controls...)

> I think A makes the most intuitive sense, somehow.  But perhaps my
> intuition is off.

Can't say my intuition is any more reliable. On the other hand, intuition says that counter-intuitive solutions are very attractive, but I think it's just trying to confuse us.





Message received at 58168 <at> debbugs.gnu.org:

Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <83czbef6le.fsf@HIDDEN>
Date: Fri, 30 Sep 2022 22:04:47 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <6CB805F6-89EE-4D7C-A398-F29698733A42@HIDDEN>
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
 <83czbef6le.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
Cc: 58168 <at> debbugs.gnu.org

On 29 Sep 2022, at 19:11, Eli Zaretskii <eliz@HIDDEN> wrote:

> Unibyte strings should never be compared with
> multibyte, unless they are both pure-ASCII.

It's perfectly fine to compare "Madrid" (unibyte) with "Málaga" (non-ASCII multibyte).
If you mean that all strings (literals in particular) should be multibyte by default then I agree and at some point we should take that step, but it would be quite a breaking change. Perhaps less in practice than we fear, though...

>> So, what can be done? The current string< implementation uses the character order
>>
>> ASCII < ub raw 80..FF = mb U+0080..U+00FF < U+0100..10FFFF < mb raw 80..FF
>>
>> in conflict with string= which unifies unibyte and multibyte ASCII but not raw bytes and Latin-1.
>
> It would be unimaginable to unify raw bytes with Latin-1.  Raw bytes
> are not Latin-1 characters, they can stand for any characters, or for
> no characters at all.

Completely agreed! Let's try to fix that, then.

> Unibyte characters don't belong to this order.  They
> should be converted to multibyte representation to be sensibly
> comparable.

Oh, I agree to some extent, but we can't really raise an error if someone tries, so we might as well return something reasonable and coherent. Besides, there are more good reasons for ordering strings (both multibyte and unibyte) than might be apparent at first.

Working from the assumption that we can't change string= to equate raw bytes in unibyte and multibyte strings, we need to invent an order between normally incommensurate values, which sounds odd but is actually fine; this is occasionally done and can be quite useful.

It's also a matter of performance -- string< has been improved recently but currently we compare text in Latin and Swahili much faster than French and Arabic; it would be nice to close that gap. UTF-8 is designed so that comparing strings by scalar values can be done byte-wise, but the way we encode raw bytes makes them sort right between ASCII and Latin-1. Given that the specific order doesn't matter much, we could just run with that.
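
The byte-wise argument can be illustrated with a small Python model. This is a sketch, not Emacs code. It assumes (this is not stated in the thread) that the internal multibyte representation stores a raw byte 0x80..0xFF as a two-byte sequence with lead byte C0 or C1, and stores ordinary characters as UTF-8. The helper names are hypothetical.

```python
def encode_char(kind, value):
    """Encode one character into the assumed internal multibyte form.

    kind is "char" for a Unicode scalar value, or "raw" for a raw byte
    in the range 0x80..0xFF.
    """
    if kind == "char":
        return chr(value).encode("utf-8")
    if kind == "raw":
        if not 0x80 <= value <= 0xFF:
            raise ValueError("raw bytes are 0x80..0xFF")
        # Assumption: lead byte is C0 or C1 (bit 6 of the raw byte),
        # trailing byte carries the low six bits.
        return bytes([0xC0 | ((value >> 6) & 1), 0x80 | (value & 0x3F)])
    raise ValueError(kind)

def encode_string(chars):
    """Encode a sequence of (kind, value) pairs into one bytes object."""
    return b"".join(encode_char(kind, value) for kind, value in chars)

ascii_z = encode_string([("char", 0x7A)])        # "z"
raw_80 = encode_string([("raw", 0x80)])          # lowest raw byte
raw_ff = encode_string([("raw", 0xFF)])          # highest raw byte
latin1_80 = encode_string([("char", 0x80)])      # U+0080, first non-ASCII char
u_10ffff = encode_string([("char", 0x10FFFF)])   # last Unicode scalar value

# Python orders bytes objects lexicographically, i.e. memcmp on the common
# prefix with the shorter string first on a tie.
assert ascii_z < raw_80 < raw_ff < latin1_80 < u_10ffff
```

Under that assumption a plain byte-wise comparison of two multibyte strings already places raw bytes between ASCII and U+0080, with no decoding loop.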





Message received at 58168 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
In-Reply-To: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN> ("Mattias
 =?utf-8?Q?Engdeg=C3=A5rd=22's?= message of "Thu, 29 Sep 2022 18:24:04
 +0200")
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
X-Now-Playing: Nilotika Cultural Ensemble's _L'Esprit de Nyege 2020_: "We
 Love Nilotika"
Date: Fri, 30 Sep 2022 15:52:12 +0200
Message-ID: <877d1l55rn.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: 58168 <at> debbugs.gnu.org

Mattias Engdegård <mattias.engdegard@HIDDEN> writes:

> We really want string< to be consistent with string= and itself since this is fundamental for string ordering in searching and sorting applications.
> This means that for any pair of strings A and B, we should either have A<B, B<A or A=B.
>
> Unfortunately:
>
>   (let* ((a "ü")
>          (b "\xfc"))
>     (list (string= a b)
>           (string< a b)
>           (string< b a)))
> => (nil nil nil)
>
> because string< considers the unibyte raw byte 0xFC and the multibyte char U+00FC to be the same, but string= thinks they are different.

You also have

(string 4194176)
=> "\200"
"\x80"
=> "\200"

which are kinda equal in some ways, and not in other ways.

> It suggests the following alternative collation orders:
>
> A. ASCII < ub raw 80..FF < mb U+0080..10FFFF < mb raw 80..FF
>
> which puts all non-ASCII multibyte chars after unibyte.
>
> B. ASCII < ub raw 80..FF < mb raw 80..FF < mb U+0080..10FFFF
>
> which inserts multibyte raw bytes after the unibyte ones, permitting any ub-ub and mb-mb comparisons to be made using memcmp, and a slow decoding loop only required for unibyte against non-ASCII multibyte strings.
>
> C. ASCII < mb U+0080..10FFFF < mb raw 80..FF < ub raw 80..FF
>
> which instead moves unibyte raw bytes to after the multibyte raw range. This has the same memcmp benefit as alternative B, but may be slightly faster for ub-mb comparisons since only unibyte 80..FF need to be remapped.

I think A makes the most intuitive sense, somehow.  But perhaps my
intuition is off.
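
For comparison, the current order and the three proposed orders quoted above can be modelled as sort keys. This is a Python sketch and not the Emacs implementation: characters are modelled as (class, value) pairs, string= is modelled as plain equality of those pairs, and all names are hypothetical.

```python
# Character classes used by the model:
#   "ascii"  - 0x00..0x7F, identical in unibyte and multibyte strings
#   "ub_raw" - raw byte 0x80..0xFF in a unibyte string
#   "mb_chr" - non-ASCII character U+0080..U+10FFFF in a multibyte string
#   "mb_raw" - raw byte 0x80..0xFF in a multibyte string
ORDER_A = ("ascii", "ub_raw", "mb_chr", "mb_raw")
ORDER_B = ("ascii", "ub_raw", "mb_raw", "mb_chr")
ORDER_C = ("ascii", "mb_chr", "mb_raw", "ub_raw")

def make_key(order):
    """Build a per-character sort key: class rank first, then value."""
    rank = {cls: i for i, cls in enumerate(order)}
    return lambda c: (rank[c[0]], c[1])

def key_current(c):
    """Model of the current order: ub raw 80..FF ties with U+0080..U+00FF."""
    cls, value = c
    # Multibyte raw bytes sort after every character; everything else
    # sorts by its plain numeric value, so classes can collide.
    return 0x110000 + value if cls == "mb_raw" else value

def string_less(s, t, key):
    """Lexicographic 'less than' on strings modelled as lists of characters."""
    return [key(c) for c in s] < [key(c) for c in t]

def trichotomy_holds(s, t, key):
    """True if exactly one of s = t, s < t, t < s holds."""
    outcomes = [s == t, string_less(s, t, key), string_less(t, s, key)]
    return outcomes.count(True) == 1

a = [("mb_chr", 0xFC)]   # multibyte "ü"
b = [("ub_raw", 0xFC)]   # unibyte "\xfc"

assert not trichotomy_holds(a, b, key_current)
for order in (ORDER_A, ORDER_B, ORDER_C):
    assert trichotomy_holds(a, b, make_key(order))
```

In this model the current order reproduces the (nil nil nil) result from the report, while each of A, B and C gives the pair a definite order. The three differ only in which class ranks first.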




Message received at 58168 <at> debbugs.gnu.org:

Date: Thu, 29 Sep 2022 20:11:57 +0300
Message-Id: <83czbef6le.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN> (message from
 Mattias =?utf-8?Q?Engdeg=C3=A5rd?= on Thu, 29 Sep 2022 18:24:04 +0200)
Subject: Re: bug#58168: string-lessp glitches and inconsistencies
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: 58168 <at> debbugs.gnu.org

> From: Mattias Engdegård <mattias.engdegard@HIDDEN>
> Date: Thu, 29 Sep 2022 18:24:04 +0200
> 
> We really want string< to be consistent with string= and itself since this is fundamental for string ordering in searching and sorting applications.
> This means that for any pair of strings A and B, we should either have A<B, B<A or A=B.
> 
> Unfortunately:
> 
>   (let* ((a "ü")
>          (b "\xfc"))
>     (list (string= a b)
>           (string< a b)
>           (string< b a)))
> => (nil nil nil)
> 
> because string< considers the unibyte raw byte 0xFC and the multibyte char U+00FC to be the same, but string= thinks they are different.

Why do we care?  Unibyte strings should never be compared with
multibyte, unless they are both pure-ASCII.

> So, what can be done? The current string< implementation uses the character order
> 
>  ASCII < ub raw 80..FF = mb U+0080..U+00FF < U+0100..10FFFF < mb raw 80..FF
> 
> in conflict with string= which unifies unibyte and multibyte ASCII but not raw bytes and Latin-1.

It would be unimaginable to unify raw bytes with Latin-1.  Raw bytes
are not Latin-1 characters, they can stand for any characters, or for
no characters at all.

> It suggests the following alternative collation orders:
> 
> A. ASCII < ub raw 80..FF < mb U+0080..10FFFF < mb raw 80..FF
> 
> which puts all non-ASCII multibyte chars after unibyte.
> 
> B. ASCII < ub raw 80..FF < mb raw 80..FF < mb U+0080..10FFFF
> 
> which inserts multibyte raw bytes after the unibyte ones, permitting any ub-ub and mb-mb comparisons to be made using memcmp, and a slow decoding loop only required for unibyte against non-ASCII multibyte strings.
> 
> C. ASCII < mb U+0080..10FFFF < mb raw 80..FF < ub raw 80..FF

Neither, IMNSHO.  Unibyte characters don't belong to this order.  They
should be converted to multibyte representation to be sensibly
comparable.

> Otherwise, I'll go with B or C, depending on what the resulting code looks like.

Please don't.  Let's first decide whether we want to change this, and
what the reasons for that are.  Theoretical "impurity" doesn't count,
IMO.




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 29 Sep 2022 17:01:19 +0000
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
Subject: Re: string-lessp glitches and inconsistencies
Date: Thu, 29 Sep 2022 19:00:56 +0200
References: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
To: Emacs Bug Report <bug-gnu-emacs@HIDDEN>
In-Reply-To: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
Message-Id: <FF3BD1C6-A19C-404A-AFFB-D446FE5C4BBE@HIDDEN>

On 29 Sep 2022, at 18:24, Mattias Engdegård <mattias.engdegard@HIDDEN> wrote:

> C. ASCII < mb U+0080..10FFFF < mb raw 80..FF < ub raw 80..FF
> 
> which instead moves unibyte raw bytes to after the multibyte raw range. This has the same memcmp benefit as alternative B

Actually it doesn't -- editing mistake, sorry. Using memcmp for arbitrary multibyte strings requires collating raw bytes between ASCII and other Unicode.





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 29 Sep 2022 16:24:16 +0000
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
Subject: string-lessp glitches and inconsistencies
Message-Id: <7824372D-8002-4639-8AEE-E80A6D5FEFC6@HIDDEN>
Date: Thu, 29 Sep 2022 18:24:04 +0200
To: Emacs Bug Report <bug-gnu-emacs@HIDDEN>

We really want string< to be consistent with string= and itself since this is fundamental for string ordering in searching and sorting applications.
This means that for any pair of strings A and B, we should either have A<B, B<A or A=B.

Unfortunately:

  (let* ((a "ü")
         (b "\xfc"))
    (list (string= a b)
          (string< a b)
          (string< b a)))
=> (nil nil nil)

because string< considers the unibyte raw byte 0xFC and the multibyte char U+00FC to be the same, but string= thinks they are different.
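
An illustrative sketch, not from the original message, of one way to see why the two strings look alike to a comparison based on character codes: both yield the same code for their first element and differ only in multibyteness. The helper name `my-first-char-info` is hypothetical, and the multibyte string is written as "\u00fc" so that the example does not depend on the file's encoding.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-first-char-info (s)
  "Return (MULTIBYTE-P FIRST-CHAR-CODE) for the non-empty string S."
  (list (multibyte-string-p s) (aref s 0)))

(my-first-char-info "\u00fc")  ; multibyte U+00FC   => (t 252)
(my-first-char-info "\xfc")    ; unibyte raw byte   => (nil 252)
```
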
We also distinguish raw bytes by multibyte-ness:

  (let* ((u "\x80")
         (m (string-to-multibyte u)))
    (list (string= u m)
          (string< u m)
          (string< m u)))
=> (nil t nil)

but this is a minor annoyance that we can live with: we strongly want string= to remain consistent with `equal` for strings.
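
An illustrative sketch, not from the original message, of how the consistency requirement could be checked for a given pair: exactly one of A<B, B<A and A=B should hold. The helper name `my-string-order-relations` is hypothetical.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-string-order-relations (a b)
  "Return how many of A<B, B<A and A=B hold for strings A and B.
A consistent ordering gives exactly 1 for every pair."
  (+ (if (string< a b) 1 0)
     (if (string< b a) 1 0)
     (if (string= a b) 1 0)))

(my-string-order-relations "abc" "abd")  ; plain ASCII => 1
```

Going by the results reported above, the pair "ü" and "\xfc" would give 0 and the pair "\x80" and its multibyte conversion would give 1; other Emacs versions may behave differently.
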
So, what can be done? The current string< implementation uses the character order

 ASCII < ub raw 80..FF = mb U+0080..U+00FF < U+0100..10FFFF < mb raw 80..FF

in conflict with string= which unifies unibyte and multibyte ASCII but not raw bytes and Latin-1.
It suggests the following alternative collation orders:

A. ASCII < ub raw 80..FF < mb U+0080..10FFFF < mb raw 80..FF

which puts all non-ASCII multibyte chars after unibyte.

B. ASCII < ub raw 80..FF < mb raw 80..FF < mb U+0080..10FFFF

which inserts multibyte raw bytes after the unibyte ones, permitting any ub-ub and mb-mb comparisons to be made using memcmp, and a slow decoding loop only required for unibyte against non-ASCII multibyte strings.
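
An illustrative sketch, not from the original message, that models the order of alternative B in Lisp by mapping every character to an integer rank and comparing the rank lists. It shows only the ordering, not the proposed memcmp-based implementation; the names `my-order-b-key` and `my-order-b-lessp` and the rank offsets are assumptions made for this example.

```elisp
;; Hypothetical helpers for illustration; these names do not exist in Emacs.
(defun my-order-b-key (string)
  "Return a list of integer ranks for STRING under alternative B."
  (let ((multibyte (multibyte-string-p string)))
    (mapcar (lambda (c)
              (cond ((< c #x80) c)               ; ASCII, unibyte or multibyte
                    ((not multibyte) c)          ; unibyte raw byte: 128..255
                    ((>= c #x3FFF80)             ; multibyte raw byte: 256..383
                     (+ 256 (- c #x3FFF80)))
                    (t (+ 384 (- c #x80)))))     ; other multibyte chars: 384 and up
            string)))

(defun my-order-b-lessp (a b)
  "Return non-nil if string A sorts before string B under alternative B."
  (let ((ka (my-order-b-key a))
        (kb (my-order-b-key b)))
    ;; Skip the common prefix of the two rank lists.
    (while (and ka kb (= (car ka) (car kb)))
      (setq ka (cdr ka) kb (cdr kb)))
    (cond ((null kb) nil)              ; B exhausted: A is not less
          ((null ka) t)                ; A is a proper prefix of B
          (t (< (car ka) (car kb))))))

(my-order-b-lessp "\xfc" "\u00fc")  ; unibyte raw FC before U+00FC => t
(my-order-b-lessp "\u00fc" "\xfc")  ; => nil
```

In this model two strings receive equal rank lists only when they agree in ASCII content and in every raw byte and non-ASCII character including multibyteness, which is the distinction the message attributes to string=.
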

C. ASCII < mb U+0080..10FFFF < mb raw 80..FF < ub raw 80..FF

which instead moves unibyte raw bytes to after the multibyte raw range. This has the same memcmp benefit as alternative B, but may be slightly faster for ub-mb comparisons since only unibyte 80..FF need to be remapped.

Any particular preference? Otherwise, I'll go with B or C, depending on what the resulting code looks like.





Acknowledgement sent to Mattias Engdegård <mattias.engdegard@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#58168; Package emacs. Full text available.
Last modified: Thu, 6 Oct 2022 12:45:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.