GNU bug report logs - #69188
30.0.50; project-files + project-find-file is slow in large repositories

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: emacs; Reported by: Spencer Baugh <sbaugh@HIDDEN>; merged with #69233; dated Sun, 18 Feb 2024 18:22:02 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 69188 <at> debbugs.gnu.org:


From: Spencer Baugh <sbaugh@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#69188: 30.0.50; project-files + project-find-file is slow
 in large repositories
In-Reply-To: <86bk8dr0g1.fsf@HIDDEN> (Eli Zaretskii's message of "Sun, 18 Feb
 2024 22:18:06 +0200")
References: <iera5o11gnh.fsf@HIDDEN>
 <86y1bhr47o.fsf@HIDDEN>
 <f95199e0-c585-4dea-beb1-305b9cac83f5@HIDDEN>
 <86frxpr1yl.fsf@HIDDEN>
 <391ea08d-9d52-4f03-a602-045b76ac862c@HIDDEN>
 <86bk8dr0g1.fsf@HIDDEN>
Date: Fri, 23 Feb 2024 16:34:38 -0500
Message-ID: <ierh6hy6f0x.fsf_-_@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: Dmitry Gutov <dmitry@HIDDEN>, 69233 <at> debbugs.gnu.org,
 69188 <at> debbugs.gnu.org

Eli Zaretskii <eliz@HIDDEN> writes:
>> Date: Sun, 18 Feb 2024 22:11:43 +0200
>> Cc: sbaugh@HIDDEN, 69233 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@HIDDEN>
>> 
>> On 18/02/2024 21:45, Eli Zaretskii wrote:
>> >> Date: Sun, 18 Feb 2024 21:42:37 +0200
>> >> Cc: 69233 <at> debbugs.gnu.org
>> >> From: Dmitry Gutov <dmitry@HIDDEN>
>> >>
>> >> On 18/02/2024 20:56, Eli Zaretskii wrote:
>> >>> This is a duplicate of another bug report you submitted not long ago.
>> >> Any reason I didn't receive the first one to my inbox?
>> > I don't have the foggiest, sorry.
>> 
>> It seems Spencer didn't get the confirmation email either, or he 
>> wouldn't resubmit.
>
> One can know if debbugs received a report via the Web interface.

Yes, it seems that all my email was backed up for a day or so, for
whatever reason.  Sorry for the noise.

(Or maybe I just think this is such an important bug that I submitted it
twice :) )




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#69188; Package emacs.
Merged 69188 69233. Request was from Eli Zaretskii <eliz@HIDDEN> to control <at> debbugs.gnu.org.

Message received at submit <at> debbugs.gnu.org:


From: Spencer Baugh <sbaugh@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 30.0.50; project-files + project-find-file is slow in large
 repositories
Date: Thu, 15 Feb 2024 17:55:46 -0500
Message-ID: <iera5o11gnh.fsf@HIDDEN>
Cc: Dmitry Gutov <dmitry@HIDDEN>


(project-files (project-current)) takes around 1 second in the Linux
kernel repository (80k files) and 7 seconds in my larger (500k-file)
repository.

With this patch:
diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index c7c07c3d34c..037beaa835a 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -667,12 +667,15 @@
                                               (setq i (concat i "**"))))
                                         i)))
                                    extra-ignores)))))
-       (setq files
-             (mapcar
-              (lambda (file) (concat default-directory file))
-              (split-string
-               (apply #'vc-git--run-command-string nil "ls-files" args)
-               "\0" t)))
+       (with-temp-buffer
+         (let ((ok (apply #'vc-git--out-ok "ls-files" args))
+               (pt (point-min)))
+           (unless ok
+             (error "File listing failed: %s" (buffer-string)))
+           (goto-char pt)
+           (while (search-forward "\0" nil t)
+             (push (concat default-directory (buffer-substring-no-properties pt (1- (point)))) files)
+             (setq pt (point)))))
        (when (project--vc-merge-submodules-p default-directory)
          ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
          (let* ((submodules (project--git-submodules))

With the patch applied, project-files in the Linux repository takes
around 0.75 seconds.

If I further remove the (concat default-directory ...) around each file,
it speeds up to 0.5 seconds.

(Note that git ls-files itself takes only around 20 milliseconds.)

My large repository (which uses Mercurial) has a custom project-files
which is basically:

(with-temp-buffer
  ;; `rhg files' prints one file name per line into this buffer.
  (unless (zerop (call-process "rhg" nil t nil "files"))
    (error "File listing failed: %s" (buffer-string)))
  (goto-char (point-min))
  (let ((pt (point))
        res)
    ;; Turn each newline-terminated name into an absolute file name.
    (while (search-forward "\n" nil t)
      (push (file-name-concat default-directory
                              (buffer-substring-no-properties pt (1- (point))))
            res)
      (setq pt (point)))
    res))

Likewise, removing the (concat default-directory ...) speeds my
project-files up from 7 seconds to 4.5 seconds.

This is especially silly because project-find-file then just strips
default-directory from all the files again, which adds yet more
overhead.
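Both the patch and the rhg snippet above follow the same pattern: scan the
process output for a separator and build one string per file, prepending
the root to each.  A minimal sketch of that loop as one function, with the
prefix made optional so that callers happy with relative names skip the
per-file concat.  The name my/project--read-names is hypothetical (not part
of project.el), and it assumes the listing output is already in the current
buffer.

```elisp
;;; -*- lexical-binding: t -*-

(defun my/project--read-names (separator &optional prefix)
  "Return the SEPARATOR-terminated names in the current buffer, in order.
If PREFIX is non-nil, prepend it to each name.  When PREFIX is nil the
names are returned as-is, saving one string allocation per file.
Hypothetical helper, not part of project.el."
  (let ((pt (point-min))
        (names nil))
    (goto-char pt)
    ;; SEPARATOR is searched for literally, e.g. "\0" for
    ;; `git ls-files -z' output or "\n" for line-based output.
    (while (search-forward separator nil t)
      (let ((name (buffer-substring-no-properties pt (match-beginning 0))))
        (push (if prefix (concat prefix name) name) names))
      (setq pt (point)))
    ;; A final name without a trailing separator still counts.
    (when (< pt (point-max))
      (let ((name (buffer-substring-no-properties pt (point-max))))
        (push (if prefix (concat prefix name) name) names)))
    (nreverse names)))
```

Called with only a separator it returns root-relative names; passing
default-directory as PREFIX reproduces the current absolute-name behavior.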

My proposal: Could we find a way to make prepending default-directory
unnecessary for the files returned from project-files?

Perhaps project-files could be allowed to return file names relative to
the project root.  Then, in the common case where all the files are
within the project root, project-find-file would be way faster.  Happy
to implement this, if it makes sense.

Another optimization I've considered: we could run the process
asynchronously so that parsing in project-files can overlap with the
process.  But the process is usually very fast anyway and is not where
most of the overhead is, so that alone won't be a big win.

However, that would make it easy for project-files as a whole to be
asynchronous.  That in turn would allow project-find-file to start the
listing in the background, and then we'd write a completion table which
completes only over whatever files we've already read into Emacs.  I
think this would be a lot nicer for most use cases, and I'd again be
happy to implement this.
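A rough sketch of the asynchronous variant, assuming the listing command
writes separator-terminated names to stdout.  The helpers
my/project-files-start and my/project-files-table are hypothetical.  The
sketch uses make-process with a filter that carries any partial name over
to the next chunk, and complete-with-action so that the table always
reflects the names received so far.  It does not refresh candidates
already displayed in an open minibuffer, which a real implementation would
need to handle.

```elisp
;;; -*- lexical-binding: t -*-

(defun my/project-files-start (command separator)
  "Start COMMAND (a list: program and args) that lists project files.
Names terminated by SEPARATOR are collected into the process property
`files' (newest first) as output arrives; the property `done' becomes
non-nil once the process has exited.  Hypothetical helper."
  (let ((proc (make-process :name "my-project-files"
                            :command command
                            :connection-type 'pipe
                            :noquery t)))
    (process-put proc 'files nil)
    (process-put proc 'pending "")
    (set-process-filter
     proc
     (lambda (p chunk)
       ;; A chunk may end in the middle of a name, so prepend the
       ;; leftover from the previous chunk before splitting.
       (let ((parts (split-string (concat (process-get p 'pending) chunk)
                                  (regexp-quote separator))))
         ;; The last element is "" or an incomplete name; keep it.
         (process-put p 'pending (car (last parts)))
         (dolist (name (butlast parts))
           (unless (equal name "")
             (process-put p 'files (cons name (process-get p 'files))))))))
    (set-process-sentinel
     proc
     (lambda (p _event)
       (unless (process-live-p p)
         ;; Output is read before the sentinel runs, so any leftover is
         ;; a final name that lacked a trailing separator.
         (let ((rest (process-get p 'pending)))
           (unless (equal rest "")
             (process-put p 'files (cons rest (process-get p 'files)))))
         (process-put p 'pending "")
         (process-put p 'done t))))
    proc))

(defun my/project-files-table (proc)
  "Completion table over the file names PROC has delivered so far."
  (lambda (string pred action)
    (complete-with-action action (process-get proc 'files) string pred)))
```

For a Git project this could be started with
(my/project-files-start '("git" "ls-files" "-z") "\0"), and the table
handed to completing-read immediately, before the listing has finished.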

Also happy to implement any other optimizations you think might make
sense.


In GNU Emacs 30.0.50 (build 37, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2024-02-13 built on
 igm-qws-u22796a
Repository revision: a24a2b1ceb12f11c9d345190fbf554f27c4ec186
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.9 (Green Obsidian)

Configured using:
 'configure -C --with-x-toolkit=lucid 'CFLAGS=-O0 -g3'
 --without-native-compilation --without-gif'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBSYSTEMD LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG
SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM
XINPUT2 XPM LUCID ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-regexp-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd touch-screen tool-bar dnd fontset
image regexp-opt fringe tabulated-list replace newcomment text-mode
lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch
easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting cairo x-toolkit
xinput2 x multi-tty move-toolbar make-network-process emacs)

Memory information:
((conses 16 65052 9318) (symbols 48 9539 0) (strings 32 22452 1449)
 (string-bytes 1 659675) (vectors 16 9245)
 (vector-slots 8 111110 9295) (floats 8 40 17) (intervals 56 262 0)
 (buffers 976 10))




Acknowledgement sent to Spencer Baugh <sbaugh@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs@HIDDEN.
Report forwarded to bug-gnu-emacs@HIDDEN:
bug#69188; Package emacs.
Last modified: Fri, 23 Feb 2024 22:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.