From: Taco van Dijk <taco@HIDDEN>
To: bug-diffutils@HIDDEN
Subject: bug#22108: diff wrapper script for very large files, low memory
Date: Mon, 7 Dec 2015 12:45:50 +0100 (CET)

Hi,

For our current project we faced the following problem: when trying to compare two large files (2 × 4+ GB) that exceeded the machine's RAM, the machine would become unresponsive. We found a solution, based around xxHash, that might be worth sharing. For anyone interested, you can find it here:

https://github.com/waagsociety/hashed-diff

Kind regards,
Taco van Dijk & Lodewijk Loos
Waag Society
--
PGP: 82EDF574
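The message does not describe how hashed-diff works internally. As a rough illustration of why hashing can help, here is a minimal Python sketch that assumes a per-line approach: each line is reduced to a fixed-size digest, and the two digest sequences are compared instead of the full lines. It is not taken from the hashed-diff repository. blake2b with an 8-byte digest stands in for xxHash (which is not in the Python standard library), difflib.SequenceMatcher stands in for GNU diff's own algorithm, and the names line_hashes and hashed_diff are hypothetical.

```python
import difflib
import hashlib
from typing import Iterator, List, Tuple


def line_hashes(path: str) -> List[bytes]:
    """Return one 8-byte digest per line of `path`, reading a line at a time.

    blake2b is used here as a stand-in for xxHash.
    """
    hashes: List[bytes] = []
    with open(path, "rb") as f:
        for line in f:
            hashes.append(hashlib.blake2b(line, digest_size=8).digest())
    return hashes


def hashed_diff(path_a: str, path_b: str) -> Iterator[Tuple[str, int, int, int, int]]:
    """Yield (tag, a_start, a_end, b_start, b_end) for each differing region.

    The comparison sees only the digests, never the line text. Two different
    lines whose digests collide are treated as equal, so the result is
    probabilistic.
    """
    a = line_hashes(path_a)
    b = line_hashes(path_b)
    # autojunk=False: do not silently ignore frequently repeated lines.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield tag, i1, i2, j1, j2
```

For example, `for op in hashed_diff("old.txt", "new.txt"): print(*op)` prints one opcode per differing region; the line indices can then be used to re-read only those lines for display. Memory use still grows with the number of lines, but no longer with their length.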
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Taco van Dijk <taco@HIDDEN>
Subject: bug#22108: Acknowledgement (diff wrapper script for very large files, low memory)
Date: Mon, 07 Dec 2015 16:17:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message has been received.

Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-diffutils@HIDDEN

If you wish to submit further information on this problem, please send it to 22108 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system.

--
22108: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22108
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems
From: Jim Meyering <jim@HIDDEN>
To: 22108 <at> debbugs.gnu.org, taco@HIDDEN
Subject: bug#22108: diff wrapper script for very large files, low memory
Date: Sun, 1 May 2016 19:00:06 -0700

tags 22108 wishlist done

Thanks for the suggestion and the pointer.
FYI, your problem is very similar to the one described at
http://bugs.gnu.org/21665

I'm marking this auto-created issue as "wishlist".
A combination of this approach and mmap may be profitable
when input files are too large for available RAM.
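The mmap suggestion could be combined with line hashing roughly as follows. This is a minimal Python sketch under stated assumptions (blake2b standing in for xxHash, hypothetical function name mmap_line_index), not code from diffutils or hashed-diff. Mapping the file read-only lets the operating system page the input in and out on demand rather than the program copying it into its own buffers, and recording each line's starting offset means only the lines that actually differ need to be fetched again for output.

```python
import hashlib
import mmap
import os
from typing import List, Tuple


def mmap_line_index(path: str) -> Tuple[List[bytes], List[int]]:
    """Hash each line of `path` through a read-only memory map.

    Returns (digests, offsets): one 8-byte blake2b digest per line (standing
    in for xxHash) and the byte offset at which each line starts.
    """
    digests: List[bytes] = []
    offsets: List[int] = []
    with open(path, "rb") as f:
        # mmap refuses to map an empty file, so return early in that case.
        if os.fstat(f.fileno()).st_size == 0:
            return digests, offsets
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            pos = 0
            while pos < size:
                nl = mm.find(b"\n", pos)
                # Include the trailing newline; the last line may lack one.
                end = size if nl == -1 else nl + 1
                digests.append(hashlib.blake2b(mm[pos:end], digest_size=8).digest())
                offsets.append(pos)
                pos = end
    return digests, offsets
```

The digest lists from two files can then be handed to any sequence-diff routine; reopening a file and seeking to `offsets[i]` recovers line i when it needs to be printed.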