X-Loop: help-debbugs@HIDDEN Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Resent-From: Ihor Radchenko <yantar92@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-gnu-emacs@HIDDEN Resent-Date: Sun, 23 Apr 2023 19:40:01 +0000 Resent-Message-ID: <handler.63040.B.168227874729375 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: report 63040 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: 63040 <at> debbugs.gnu.org X-Debbugs-Original-To: bug-gnu-emacs@HIDDEN Received: via spool by submit <at> debbugs.gnu.org id=B.168227874729375 (code B ref -1); Sun, 23 Apr 2023 19:40:01 +0000 Received: (at submit) by debbugs.gnu.org; 23 Apr 2023 19:39:07 +0000 Received: from localhost ([127.0.0.1]:47009 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pqfYA-0007di-Vw for submit <at> debbugs.gnu.org; Sun, 23 Apr 2023 15:39:07 -0400 Received: from lists.gnu.org ([209.51.188.17]:43514) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <yantar92@HIDDEN>) id 1pqfY8-0007dZ-TI for submit <at> debbugs.gnu.org; Sun, 23 Apr 2023 15:39:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <yantar92@HIDDEN>) id 1pqfY8-0005du-Ir for bug-gnu-emacs@HIDDEN; Sun, 23 Apr 2023 15:39:04 -0400 Received: from mout02.posteo.de ([185.67.36.66]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <yantar92@HIDDEN>) id 1pqfY4-0001lE-GG for bug-gnu-emacs@HIDDEN; Sun, 23 Apr 2023 15:39:03 -0400 Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id A4540240170 for <bug-gnu-emacs@HIDDEN>; Sun, 23 Apr 2023 21:38:57 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1682278737; bh=WDvJvoFOGMe+96eBGGhd7uvnO1J4m45uPwnG/vE/I0U=; h=From:To:Subject:Date:From; b=HbeythQTKRxu18ay9A6R+d5uUZ7V0PSL/Dn3Mv/C/C/J3nklrlKNt7JArjRp9VKed XP4WZwEkXrdn7R/gf+ZGk1XHNRA9TfLuP/3axKyFazSy3A880VPp7ZCNsJhqH6s3i+ OegGEBugvheoDmcIQQ+xF2PCvUkfNRHv9g7UCSKSKgciRWFKZ9lJLwI+laaboCOr7q s/cFmYns1C9vTzVWDD8HUzb2SPDqSLLixty5PF8h5EAzosJkRQcwOw6zE3bvxPpGAt Q099LGJz6ufPlp+tJ1z7xzD5btUgkllMg0y3yhfRbV8W8uVPqy+K2t5DUHmQQsvGN5 jJlUOoRDb0JJw== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4Q4JV139Lyz9rxH for <bug-gnu-emacs@HIDDEN>; Sun, 23 Apr 2023 21:38:49 +0200 (CEST) From: Ihor Radchenko <yantar92@HIDDEN> Date: Sun, 23 Apr 2023 19:41:40 +0000 Message-ID: <878reiwfm3.fsf@localhost> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Received-SPF: pass client-ip=185.67.36.66; envelope-from=yantar92@HIDDEN; helo=mout02.posteo.de X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -2.3 (--) --=-=-= Content-Type: text/plain Hi, When investigating `re-search-forward' performance in https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58558 (bug#58558), I noticed that buf_bytepos_to_charpos is taking most of the CPU time, according to perf stats. This was partially caused by `parse-sexp-lookup-properties', but even after working around the text property issue, buf_bytepos_to_charpos still shows up on top of the perf profile. Since one of the apparent bottlenecks in buf_bytepos_to_charpos is for (tail = BUF_MARKERS (b); tail; tail = tail->next) which obviously scales with the number of markers in buffer, I decided to add a cut-off parameter, as in the attached patch (number 50 has no particular motivation underneath). Surprisingly, this simple change reduced my Org agenda generation times from 20 seconds down to 3-4 seconds! I am sure that my dumb approach is not the best way to improve the performance, but this place in buf_bytepos_to_charpos is clearly something that can be optimized. --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-src-marker.c-buf_bytepos_to_charpos-Limit-marker-sea.patch From a6ff6268bdc42a7dfedc6729d4232a2ae149da56 Mon Sep 17 00:00:00 2001 Message-Id: <a6ff6268bdc42a7dfedc6729d4232a2ae149da56.1682278830.git.yantar92@HIDDEN> From: Ihor Radchenko <yantar92@HIDDEN> Date: Sun, 23 Apr 2023 21:31:46 +0200 Subject: [PATCH] * src/marker.c (buf_bytepos_to_charpos): Limit marker search Limit searching across buffer markers to first 50 markers and thus avoid performance scaling with the number of markers. I got 5x `re-search-forward' speed improvement in my setup with this dumb change. --- src/marker.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/marker.c b/src/marker.c index e42c49a5434..008a76c49e6 100644 --- a/src/marker.c +++ b/src/marker.c @@ -348,8 +348,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff) CONSIDER (cached_bytepos, cached_charpos); - for (tail = BUF_MARKERS (b); tail; tail = tail->next) + int i = 0; + for (tail = BUF_MARKERS (b); tail && i < 50; tail = tail->next) { + i++; CONSIDER (tail->bytepos, tail->charpos); /* If we are down to a range of 50 chars, -- 2.40.0 --=-=-= Content-Type: text/plain In GNU Emacs 30.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.37, cairo version 1.17.8) of 2023-04-23 built on localhost Repository revision: ca875e3947e29d222554a05583068c49a56ed8ca Repository branch: master Windowing system distributor 'The X.Org Foundation', version 11.0.12101008 System Description: Gentoo Linux Configured using: 'configure --with-native-compilation' -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> --=-=-=--
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) Content-Type: text/plain; charset=utf-8 X-Loop: help-debbugs@HIDDEN From: help-debbugs@HIDDEN (GNU bug Tracking System) To: Ihor Radchenko <yantar92@HIDDEN> Subject: bug#63040: Acknowledgement (30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers) Message-ID: <handler.63040.B.168227874729375.ack <at> debbugs.gnu.org> References: <878reiwfm3.fsf@localhost> X-Gnu-PR-Message: ack 63040 X-Gnu-PR-Package: emacs Reply-To: 63040 <at> debbugs.gnu.org Date: Sun, 23 Apr 2023 19:40:02 +0000 Thank you for filing a new bug report with debbugs.gnu.org. This is an automatically generated reply to let you know your message has been received. Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course. Your message has been sent to the package maintainer(s): bug-gnu-emacs@HIDDEN If you wish to submit further information on this problem, please send it to 63040 <at> debbugs.gnu.org. Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system. --=20 63040: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D63040 GNU Bug Tracking System Contact help-debbugs@HIDDEN with problems
X-Loop: help-debbugs@HIDDEN Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Resent-From: Eli Zaretskii <eliz@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-gnu-emacs@HIDDEN Resent-Date: Mon, 24 Apr 2023 02:25:01 +0000 Resent-Message-ID: <handler.63040.B63040.16823030548622 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 63040 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Ihor Radchenko <yantar92@HIDDEN> Cc: 63040 <at> debbugs.gnu.org Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.16823030548622 (code B ref 63040); Mon, 24 Apr 2023 02:25:01 +0000 Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 02:24:14 +0000 Received: from localhost ([127.0.0.1]:47243 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pqlsD-0002Ez-MG for submit <at> debbugs.gnu.org; Sun, 23 Apr 2023 22:24:13 -0400 Received: from eggs.gnu.org ([209.51.188.92]:52892) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1pqlsA-0002Ei-K7 for 63040 <at> debbugs.gnu.org; Sun, 23 Apr 2023 22:24:12 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pqls4-0008MO-6O; Sun, 23 Apr 2023 22:24:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=uCMuouM3gMsSktLW5BgIsVMewcAuvrz0IBEpWuc6mr0=; b=iQ1a7NXkY3UM 8JDMmU80R2QqP5uQOAL5//ZUJte88hBx8hAAe3728OmyDD7CbZnMhQuckRddrSIHSYJ8zS8B9o+Bb CFxA0xNIWo+U1IRtPkXQocFwGoqUQwSW8lHbp8s+Dpb5thu7DIUdaruok3XT5ETIz99d5HIBA52AU ebIRrXZyM6HejzgKq2v7DtCuiqHgZHlVAOM5kaGyQe0pj8sUeW01E292dXOCAmd1BOFtZ1/hkfmMC xKtY7x1g3O6zqu/fHKqH97B+LNvI21b7e5RJfVuhcUs2nZo5AnDLKcepx6tcI2ROP3ODyK2QtaM6P s0UXkFCccn4Tqb2e5mdqbw==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pqls3-0006vg-HO; Sun, 23 Apr 2023 22:24:03 -0400 Date: Mon, 24 Apr 2023 05:24:26 +0300 Message-Id: <83wn22xbj9.fsf@HIDDEN> From: Eli Zaretskii <eliz@HIDDEN> In-Reply-To: <878reiwfm3.fsf@localhost> (message from Ihor Radchenko on Sun, 23 Apr 2023 19:41:40 +0000) References: <878reiwfm3.fsf@localhost> X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) > From: Ihor Radchenko <yantar92@HIDDEN> > Date: Sun, 23 Apr 2023 19:41:40 +0000 > > When investigating `re-search-forward' performance in > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=58558 (bug#58558), I > noticed that buf_bytepos_to_charpos is taking most of the CPU time, > according to perf stats. > > This was partially caused by `parse-sexp-lookup-properties', but even > after working around the text property issue, buf_bytepos_to_charpos > still shows up on top of the perf profile. > > Since one of the apparent bottlenecks in buf_bytepos_to_charpos is > > for (tail = BUF_MARKERS (b); tail; tail = tail->next) > > which obviously scales with the number of markers in buffer, I decided > to add a cut-off parameter, as in the attached patch (number 50 has no > particular motivation underneath). > > Surprisingly, this simple change reduced my Org agenda generation times > from 20 seconds down to 3-4 seconds! > > I am sure that my dumb approach is not the best way to improve the > performance, but this place in buf_bytepos_to_charpos is clearly > something that can be optimized. Interesting. Would it be possible to show the effect of different values of the cut-off on the performance, so we could decide which value to use? Thanks.
X-Loop: help-debbugs@HIDDEN Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Resent-From: Ihor Radchenko <yantar92@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-gnu-emacs@HIDDEN Resent-Date: Mon, 24 Apr 2023 06:34:02 +0000 Resent-Message-ID: <handler.63040.B63040.16823180044340 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 63040 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Eli Zaretskii <eliz@HIDDEN> Cc: 63040 <at> debbugs.gnu.org Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.16823180044340 (code B ref 63040); Mon, 24 Apr 2023 06:34:02 +0000 Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 06:33:24 +0000 Received: from localhost ([127.0.0.1]:47341 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pqplM-00017w-A1 for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 02:33:24 -0400 Received: from mout01.posteo.de ([185.67.36.65]:35165) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <yantar92@HIDDEN>) id 1pqplK-00017g-Cq for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 02:33:23 -0400 Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 28C46240139 for <63040 <at> debbugs.gnu.org>; Mon, 24 Apr 2023 08:33:16 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1682317996; bh=+FZ8PE6bjUdMApN9r/BX9qrXrQO9ZmGzwVb1V9lRiVo=; h=From:To:Cc:Subject:Date:From; b=VpVnP32kbeH4PQK/NYoDMA4VAbx3EKTovrHiXMHOLprySXVx3KMKkBeRIpbA+f7ud O3vqOfRtqHcm5S1pNseuVZgsEjNZJOaFWZVTq1vYyi4ITV46FXK9HEoB8BDZ6Utz1X yB0Hx6FP1FdtJL1QEYnmlJ593F0cPmKgjsVB8m7+wr0q47BimD6/o4qAdQ2mc0XRcJ 3BwyG8kb8sXMaBZgDXoQS9VjtAEkaOGhOL3xQJ8rRcahE0hAY8zvZWSmS/Yt+0QZkq WxKZ9FAId5MQcwPwA8WDxxRH0gT2wx6LqbPSWIXpsFLsKhON4FQFXHKv4wjlpnOBFs 5Rmyyk8DgLOow== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4Q4b175Htnz6tyb; Mon, 24 Apr 2023 08:33:15 +0200 (CEST) From: Ihor Radchenko <yantar92@HIDDEN> In-Reply-To: <83wn22xbj9.fsf@HIDDEN> References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN> Date: Mon, 24 Apr 2023 06:36:07 +0000 Message-ID: <87bkjdzt0o.fsf@localhost> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) Eli Zaretskii <eliz@HIDDEN> writes: >> I am sure that my dumb approach is not the best way to improve the >> performance, but this place in buf_bytepos_to_charpos is clearly >> something that can be optimized. > > Interesting. Would it be possible to show the effect of different > values of the cut-off on the performance, so we could decide which > value to use? I can do such test, but I do not think that playing with cut off is the best approach here. The full code in question is below and there is already existing condition to cut the marker loop early based on the distance from best_above to the requested bytepos. So, another approach could be playing with BYTECHAR_DISTANCE_INCREMENT. Now, it is clearly not efficient enough for my large file. (Note that I expressed similar idea in another discussion and was pointed exactly to this existing condition) Further, the later code creates markers to cache recent results and cutting too early may waste this cache. Another idea could be moving the cache markers into a separate array, so that we can examine them without mixing with all other buffer markers. I am not sure if it is feasible though. int i = 0; for (tail = BUF_MARKERS (b); tail && i < 50; tail = tail->next) { i++; CONSIDER (tail->bytepos, tail->charpos); /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ if (best_above - bytepos < distance || bytepos - best_below < distance) break; else distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. We have one known above and one known below. Scan, counting characters, from whichever one is closer. */ if (bytepos - best_below_byte < best_above_byte - bytepos) { ... /* If this position is quite far from the nearest known position, cache the correspondence by creating a marker here. It will last until the next GC. But don't do it if BUF_MARKERS is nil; that is a signal from Fset_buffer_multibyte. */ if (record && BUF_MARKERS (b)) build_marker (b, best_below, best_below_byte); ... return best_below; } else { ... if (record && BUF_MARKERS (b)) build_marker (b, best_above, best_above_byte); ... return best_above; } } -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>
X-Loop: help-debbugs@HIDDEN Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Resent-From: Eli Zaretskii <eliz@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-gnu-emacs@HIDDEN Resent-Date: Mon, 24 Apr 2023 11:03:01 +0000 Resent-Message-ID: <handler.63040.B63040.168233416120942 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 63040 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Ihor Radchenko <yantar92@HIDDEN> Cc: 63040 <at> debbugs.gnu.org Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.168233416120942 (code B ref 63040); Mon, 24 Apr 2023 11:03:01 +0000 Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 11:02:41 +0000 Received: from localhost ([127.0.0.1]:47666 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pqtxw-0005Rh-V1 for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:02:41 -0400 Received: from eggs.gnu.org ([209.51.188.92]:51598) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1pqtxi-0005RD-J6 for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:02:39 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pqtxc-0000Ag-41; Mon, 24 Apr 2023 07:02:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=QzwZFK3AB0F6/ZbByAoPBVJSfCHTEW+7PrN0ySIU40k=; b=KRy/vYiIzHt+ atpBNTF3MZ42ZASVtlm21X4lRPRGk71rSZvr9kNVOmWTJXPcm3fpvcSQCGRDRO61DL6pllGbpyEt/ syA4vSrMxigqrSQqlep8MUsqZqrQ7uFzZW3wrjz+fo9fCOkEm3IkE/zAZq+CRCkU+kSrlUToI7izB okug3DkSpW7WFxqzanfJBZY19AH62He0+fmOp8Xx3xVOoRBmTIpxmTM2fDT+NM6sHDXOWIvx6Us3D OQz2DIT6RjcNoo04bVkPrCrmp0s2+13YLSOpVbdF5PzqJ2L6c8AJxpKZVPOcKPnzMl4LJeWZDp8/T gfrqvUrRV7HXpPPQ2OwkGA==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pqtxa-000103-Tp; Mon, 24 Apr 2023 07:02:19 -0400 Date: Mon, 24 Apr 2023 14:02:42 +0300 Message-Id: <83sfcpy23x.fsf@HIDDEN> From: Eli Zaretskii <eliz@HIDDEN> In-Reply-To: <87bkjdzt0o.fsf@localhost> (message from Ihor Radchenko on Mon, 24 Apr 2023 06:36:07 +0000) References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN> <87bkjdzt0o.fsf@localhost> X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) > From: Ihor Radchenko <yantar92@HIDDEN> > Cc: 63040 <at> debbugs.gnu.org > Date: Mon, 24 Apr 2023 06:36:07 +0000 > > > Interesting. Would it be possible to show the effect of different > > values of the cut-off on the performance, so we could decide which > > value to use? > > I can do such test, but I do not think that playing with cut off is the > best approach here. > > The full code in question is below and there is already existing > condition to cut the marker loop early based on the distance from > best_above to the requested bytepos. So, another approach could be > playing with BYTECHAR_DISTANCE_INCREMENT. Yes, that would be an even better idea, IMO. > Now, it is clearly not efficient enough for my large file. Why do you say that? Did you try something and the results were unsatisfactory? And what is not efficient enough -- the cutoff based on the number of markers tested or based on the distance? > Further, the later code creates markers to cache recent results and > cutting too early may waste this cache. And the technique that you tried doesn't waste the cache? > Another idea could be moving the cache markers into a separate > array, so that we can examine them without mixing with all other > buffer markers. Why would that separation be useful? Thanks.
X-Loop: help-debbugs@HIDDEN Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Resent-From: Eli Zaretskii <eliz@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-gnu-emacs@HIDDEN Resent-Date: Mon, 24 Apr 2023 11:04:02 +0000 Resent-Message-ID: <handler.63040.B63040.168233419521013 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 63040 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Ihor Radchenko <yantar92@HIDDEN>, Stefan Monnier <monnier@HIDDEN> Cc: 63040 <at> debbugs.gnu.org Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.168233419521013 (code B ref 63040); Mon, 24 Apr 2023 11:04:02 +0000 Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 11:03:15 +0000 Received: from localhost ([127.0.0.1]:47671 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pqtyU-0005Sr-Fv for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:03:15 -0400 Received: from eggs.gnu.org ([209.51.188.92]:49856) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1pqtyF-0005S2-FI for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:03:14 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pqty9-0000Fx-Bm; Mon, 24 Apr 2023 07:02:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=bC5ZIbE/4+//Gw9dWOGAe+LhO5tVMRDoscD5hWxmj9I=; b=hjud5E/49lSK XsHGcVN0declldmqy95XLNjwBrK/0cUMe2u5T+5krpMNfPCO78cJ8vGTkDRJHOUqhbjogTDvcPTAi 2nEHy6xAM/As40c+JAgMqeonWsReXy/Mtu+bZuyEhzZJutdnkfAMQpkWHGfaHvyHpFLBiKu2eyUba 6alKUDY96lxrqS/N+wzdcA+pMmRhN7+o841nZGVWCN0dngf6upzLPmk+HHMm3WwKr4cQoK+6VeN/B OZr9DfHDvAqTE8CarKyecVLwL3qeBPU4urHiyfPp6PRIhbLUaINqG3BZpymrmPyjiIz4sH3Weh3Dv xkarHgFtu+zwsbVGXAWB9A==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1pqty8-00039a-Ll; Mon, 24 Apr 2023 07:02:52 -0400 Date: Mon, 24 Apr 2023 14:03:17 +0300 Message-Id: <83r0s9y22y.fsf@HIDDEN> From: Eli Zaretskii <eliz@HIDDEN> In-Reply-To: <87bkjdzt0o.fsf@localhost> (message from Ihor Radchenko on Mon, 24 Apr 2023 06:36:07 +0000) References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN> <87bkjdzt0o.fsf@localhost> X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) [Resending with Stefan added.] > From: Ihor Radchenko <yantar92@HIDDEN> > Cc: 63040 <at> debbugs.gnu.org > Date: Mon, 24 Apr 2023 06:36:07 +0000 > > > Interesting. Would it be possible to show the effect of different > > values of the cut-off on the performance, so we could decide which > > value to use? > > I can do such test, but I do not think that playing with cut off is the > best approach here. > > The full code in question is below and there is already existing > condition to cut the marker loop early based on the distance from > best_above to the requested bytepos. So, another approach could be > playing with BYTECHAR_DISTANCE_INCREMENT. Yes, that would be an even better idea, IMO. > Now, it is clearly not efficient enough for my large file. Why do you say that? Did you try something and the results were unsatisfactory? And what is not efficient enough -- the cutoff based on the number of markers tested or based on the distance? > Further, the later code creates markers to cache recent results and > cutting too early may waste this cache. And the technique that you tried doesn't waste the cache? > Another idea could be moving the cache markers into a separate > array, so that we can examine them without mixing with all other > buffer markers. Why would that separation be useful? Thanks.
X-Loop: help-debbugs@HIDDEN Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Resent-From: Ihor Radchenko <yantar92@HIDDEN> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> Resent-CC: bug-gnu-emacs@HIDDEN Resent-Date: Mon, 24 Apr 2023 11:16:02 +0000 Resent-Message-ID: <handler.63040.B63040.168233492222331 <at> debbugs.gnu.org> Resent-Sender: help-debbugs@HIDDEN X-GNU-PR-Message: followup 63040 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Eli Zaretskii <eliz@HIDDEN> Cc: 63040 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN> Received: via spool by 63040-submit <at> debbugs.gnu.org id=B63040.168233492222331 (code B ref 63040); Mon, 24 Apr 2023 11:16:02 +0000 Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 11:15:22 +0000 Received: from localhost ([127.0.0.1]:47682 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1pquAE-0005o6-20 for submit <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:15:22 -0400 Received: from mout02.posteo.de ([185.67.36.66]:35967) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <yantar92@HIDDEN>) id 1pquAC-0005nt-0C for 63040 <at> debbugs.gnu.org; Mon, 24 Apr 2023 07:15:20 -0400 Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id C9F38240228 for <63040 <at> debbugs.gnu.org>; Mon, 24 Apr 2023 13:15:13 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1682334913; bh=dgctnNN88xqCvIDk1fUULel8YCKpMB48bXRnTtWQH2g=; h=From:To:Cc:Subject:Date:From; b=NcFmOJgiiJ5DRg2YWiQydOqJ/ZFx8SyEliFcT9Ijcv7nqg9pppZWUVSqkqvel0n70 zd0AsZUebzJE0h04pOay+m0vEjrSNnBXnU73ILpUWqLUb4ipuYrIK18D0q1NAVWw/r 5yfW4WNzFtD8eK37v61u581zLLccqnl8mhDpSsBKUq3if6wLGJyJuggdc1iAEuSYiV AYK9/rwjJIk/e5HMREvCbOlr1XZE/aUAPyWMSVZ7Elo8YADkyUtps5ENs8HvoIlZXc szXYBhdyHoxkJ1BD9IjXPBUmqo8ref8Wasy0B3K0BsXxmXcWWWh5EZ865fxJVqCQeP 34uINqyB4SUBQ== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4Q4jGS51Yqz9rxT; Mon, 24 Apr 2023 13:15:12 +0200 (CEST) From: Ihor Radchenko <yantar92@HIDDEN> In-Reply-To: <83r0s9y22y.fsf@HIDDEN> References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@HIDDEN> <87bkjdzt0o.fsf@localhost> <83r0s9y22y.fsf@HIDDEN> Date: Mon, 24 Apr 2023 11:17:59 +0000 Message-ID: <87ildlttp4.fsf@localhost> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) Eli Zaretskii <eliz@HIDDEN> writes: >> Now, it is clearly not efficient enough for my large file. > > Why do you say that? Did you try something and the results were > unsatisfactory? And what is not efficient enough -- the cutoff based > on the number of markers tested or based on the distance? Sorry for not being clear. I was referring to the existing code. "BYTECHAR_DISTANCE_INCREMENT" alone is clearly not efficient in my use case because introducing an addition 50 cut-off improved the performance significantly. Hence, there is some room for improvement in this area. >> Further, the later code creates markers to cache recent results and >> cutting too early may waste this cache. > > And the technique that you tried doesn't waste the cache? I was talking about my technique. It is wasting the cache, and it is the reason why I think that we should find a better approach; not the one I used in the patch. >> Another idea could be moving the cache markers into a separate >> array, so that we can examine them without mixing with all other >> buffer markers. > > Why would that separation be useful? Because the markers created by buf_bytepos_to_charpos are at least 5000 bytes apart. There is no such guarantee for other buffer markers. The while loops "while (best_below_byte < bytepos)" used as fallback (when no nearby marker is found) traverse the buffer char-by-char "("best_below_byte += buf_next_char_len (b, best_below_byte);" and should be strictly inferior compared to well-spaced marker list. However, when the marker list is not well-spaced, looping over all the buffer markers can be a waste. And it looks like I hit exactly such scenario in my setup. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.