From: "Garreau, Alexandre" <galex-713@HIDDEN>
To: bug-gzip@HIDDEN
Subject: bug#30719: Progressively compressing piped input
Date: Mon, 05 Mar 2018 22:18:53 +0100
Message-ID: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>

Hi,

I have a script whose logged output is very repetitive text (mostly the output of ping and date). To minimize disk usage, I thought of piping it to gzip -9. Then I realized that the log, unlike before, remained empty, and recalled the GNU policy of “reading all input and only then outputting” to maximize overall speed at the expense of ever-cheaper memory.

Yet I want to run that script all the time and be able to kill it abruptly, or just shut down, without losing all its output. (Nor am I sure it is good practice to keep everything in RAM until shutdown, even though I suppose gzip only keeps the compressed output in memory and discards the input once it is no longer needed.) I also want to be able to “tail -f” the files it writes.

I guess piping the whole output is the way to achieve optimal compression, since gzipping each line or each command’s output separately would not compress as well (the repetition occurs between the lines, not inside them). Yet is there a way to obtain this maximal compression while having gzip write its output each time I stop giving it input (as I do every 30 seconds or so), without having to save the uncompressed file or recompress the whole file several times?

I mean, it seems to me a good thing to wait until everything is compressed before writing output, rather than writing as soon as possible, but isn’t there a way to trigger the output each time the input has been processed and no more input has arrived for a certain amount of time (that is, ~30 s)?

Am I looking at something like this:

[sample.sh: an example of what I am trying to do, where I’d like regular output]

#!/bin/bash
while ping -c1 gnu.org ; do
    date --rfc-3339=seconds
    sleep 30
done | gzip -9 -f | tee sample.log | zcat

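The behaviour asked for here (one compressed stream, pushed out whenever the input pauses) can be sketched with zlib rather than the gzip command. The following is only an illustration under assumptions: the function name `gzip_with_idle_flush` and its parameters are made up for this sketch, the 30-second default simply mirrors the `sleep 30` in the script, and `select` on a pipe is assumed to be available (Unix-like systems).

```python
import os
import select
import zlib


def gzip_with_idle_flush(in_fd, out, idle_seconds=30.0):
    """Copy in_fd to `out` as one gzip stream, flushing when input goes idle.

    Hypothetical helper for illustration; not part of gzip or zlib.
    """
    comp = zlib.compressobj(9, zlib.DEFLATED, 31)  # wbits=31 selects gzip framing
    pending = False  # True while zlib may still hold unflushed data
    while True:
        ready, _, _ = select.select([in_fd], [], [], idle_seconds)
        if not ready:
            # No input for idle_seconds: emit everything seen so far.
            if pending:
                out.write(comp.flush(zlib.Z_SYNC_FLUSH))
                out.flush()
                pending = False
            continue
        chunk = os.read(in_fd, 65536)
        if not chunk:  # end of input
            break
        out.write(comp.compress(chunk))
        pending = True
    out.write(comp.flush())  # finish the stream and write the gzip trailer
    out.flush()
```

As a filter it could be called as `gzip_with_idle_flush(sys.stdin.fileno(), sys.stdout.buffer)`. If the process is killed, a reader should be able to decode the data up to the last flush, although tools will likely report the stream as truncated because the gzip trailer is missing.
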
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: "Garreau, Alexandre" <galex-713@HIDDEN>
Subject: bug#30719: Acknowledgement (Progressively compressing piped input)
Date: Mon, 05 Mar 2018 21:20:02 +0000
Message-ID: <handler.30719.B.152028476623090.ack <at> debbugs.gnu.org>
References: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>
Reply-To: 30719 <at> debbugs.gnu.org

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message has been received.

Your message is being forwarded to the package maintainers and other interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gzip@HIDDEN

If you wish to submit further information on this problem, please send it to 30719 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish to report a problem with the Bug-tracking system.

--
30719: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=30719
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems

From: Mark Adler <madler@HIDDEN>
To: "Garreau, Alexandre" <galex-713@HIDDEN>
Cc: 30719 <at> debbugs.gnu.org
Subject: bug#30719: Progressively compressing piped input
Date: Mon, 5 Mar 2018 14:54:21 -0800
Message-Id: <54783A3B-7CB5-4CCB-BD3A-1828894750D4@HIDDEN>
In-Reply-To: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>

deflate has an inherent latency: it accumulates enough data to emit each deflate block efficiently. You can deliberately flush (with zlib, not gzip), but if you do that too frequently, e.g. each line, then you will get lousy compression or even expansion.

I wrote something called gzlog (https://github.com/madler/zlib/blob/master/examples/gzlog.h), intended to solve this problem. It can take a small amount of input, e.g. a line, and update the output gzip file to be complete and valid after each line, yet also get good compression in the long run. It does this by writing the lines to the log.gz file effectively uncompressed (deflate has a “stored” block type), until it has accumulated, say, 1 MB of data. Then it goes back and compresses that uncompressed 1 MB, again always leaving the gzip file in a valid state. gzlog also maintains something like a journal, which allows gzlog to repair the gzip file if the last operation was interrupted, e.g. by a power failure.

> On Mar 5, 2018, at 1:18 PM, Garreau, Alexandre <galex-713@HIDDEN> wrote:
>
> Hi,
>
> I have a script whose logged output is very repetitive text (mostly the output of ping and date). To minimize disk usage, I thought of piping it to gzip -9. Then I realized that the log, unlike before, remained empty, and recalled the GNU policy of “reading all input and only then outputting” to maximize overall speed at the expense of ever-cheaper memory.
>
> Yet I want to run that script all the time and be able to kill it abruptly, or just shut down, without losing all its output. (Nor am I sure it is good practice to keep everything in RAM until shutdown, even though I suppose gzip only keeps the compressed output in memory and discards the input once it is no longer needed.) I also want to be able to “tail -f” the files it writes.
>
> I guess piping the whole output is the way to achieve optimal compression, since gzipping each line or each command’s output separately would not compress as well (the repetition occurs between the lines, not inside them). Yet is there a way to obtain this maximal compression while having gzip write its output each time I stop giving it input (as I do every 30 seconds or so), without having to save the uncompressed file or recompress the whole file several times?
>
> I mean, it seems to me a good thing to wait until everything is compressed before writing output, rather than writing as soon as possible, but isn’t there a way to trigger the output each time the input has been processed and no more input has arrived for a certain amount of time (that is, ~30 s)?
>
> Am I looking at something like this:
> #!/bin/bash
> while ping -c1 gnu.org ; do
>     date --rfc-3339=seconds
>     sleep 30
> done | gzip -9 -f | tee sample.log | zcat

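The deliberate flush mentioned in this reply can be tried from Python's `zlib` module. The sketch below is an illustration, not anything posted in the thread: the log lines are invented stand-ins for ping and date output, and the helper names `sample_lines` and `gzip_stream` are hypothetical. It compares one gzip stream written without intermediate flushes against the same stream with a `Z_SYNC_FLUSH` after every line, and checks that data written before a flush can already be decoded.

```python
import zlib


def sample_lines(n):
    """Invented ping/date-like log lines; only a few digits vary."""
    return [
        f"64 bytes from gnu.org: icmp_seq=1 ttl=52 time={90 + i % 9}.{i % 10} ms\n"
        f"2018-03-05 22:{i // 2 % 60:02d}:{i % 2 * 30:02d}+01:00\n".encode()
        for i in range(n)
    ]


def gzip_stream(lines, flush_each_line):
    """Compress lines into one gzip stream, optionally sync-flushing per line."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 31)  # wbits=31 selects gzip framing
    out = bytearray()
    for line in lines:
        out += comp.compress(line)
        if flush_each_line:
            out += comp.flush(zlib.Z_SYNC_FLUSH)
    out += comp.flush()  # finish the stream
    return bytes(out)


lines = sample_lines(2000)
raw_size = sum(len(line) for line in lines)
unflushed = gzip_stream(lines, flush_each_line=False)
flushed = gzip_stream(lines, flush_each_line=True)
print("raw:", raw_size, "no flush:", len(unflushed), "flush per line:", len(flushed))

# After a sync flush, everything written so far can be decoded,
# even though the gzip stream has not been finished.
comp = zlib.compressobj(9, zlib.DEFLATED, 31)
prefix = comp.compress(lines[0]) + comp.flush(zlib.Z_SYNC_FLUSH)
first_line = zlib.decompressobj(31).decompress(prefix)
```

On input like this the per-line variant should come out clearly larger than the unflushed one, which is the cost the reply warns about; the exact numbers depend on the data and the zlib build.
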
From: "Garreau, Alexandre" <galex-713@HIDDEN>
To: Mark Adler <madler@HIDDEN>
Cc: 30719 <at> debbugs.gnu.org
Subject: bug#30719: Progressively compressing piped input
Date: Tue, 06 Mar 2018 22:58:56 +0100
Message-ID: <3hbuveh2vhln.fbs.xxuns.g6.gal@HIDDEN>
In-Reply-To: <54783A3B-7CB5-4CCB-BD3A-1828894750D4@HIDDEN> (Mark Adler's message of "Mon, 5 Mar 2018 14:54:21 -0800")

On 05/03/2018 at 14:54, Mark Adler wrote:
> deflate has an inherent latency that accumulates enough data in order
> to efficiently emit each deflate block. You can deliberately flush
> (with zlib, not gzip), but if you do that too frequently, e.g. each
> line, then you will get lousy compression or even expansion.

Even if the main repetition is between the lines? For instance, if 80% of half of the lines and 70% of the other half are the same, as in a while loop running only ping and date? I thought of it as a very lazy way of not having to remove all the redundant output caused by the use of ASCII and by the same words or similar patterns occurring over and over.

> I wrote something called gzlog
> (https://github.com/madler/zlib/blob/master/examples/gzlog.h),
> intended to solve this problem. It can take a small amount of input,
> e.g. a line, and update the output gzip file to be complete and valid
> after each line, yet also get good compression in the long run. It
> does this by writing the lines to the log.gz file effectively
> uncompressed (deflate has a “stored” block type), until it has
> accumulated, say, 1 MB of data. Then it goes back and compresses that
> uncompressed 1 MB, again always leaving the gzip file in a valid
> state. gzlog also maintains something like a journal, which allows
> gzlog to repair the gzip file if the last operation was interrupted,
> e.g. by a power failure.

I was rather looking for a tool that could be used as a utility (since this is for a quick-and-dirty, high-level, low-frequency, medium-term task) rather than a C library, yet that is quite interesting, at least in demonstrating the flexibility of gzip…

>> #!/bin/bash
>> while ping -c1 gnu.org ; do
>>     date --rfc-3339=seconds
>>     sleep 30
>> done | gzip -9 -f | tee sample.log | zcat

Maybe the only way to go is just to gzip everything each time a log is rotated, the standard way, if that pipe approach cannot be done even with each line being almost the same…

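The fallback mentioned at the end of this message (keep writing a plain log and compress it only on rotation) could look roughly like the sketch below. The function name `rotate_and_compress` and the `.1.gz` suffix are assumptions made for illustration; they loosely imitate common log-rotation naming and are not taken from the thread.

```python
import gzip
import os
import shutil


def rotate_and_compress(log_path):
    """Move log_path aside and gzip the moved copy at level 9.

    Returns the path of the compressed file. Assumes the writer reopens
    log_path afterwards (for example by appending with `>>` on each loop
    iteration); that assumption is not from the thread.
    """
    rotated = log_path + ".1"
    os.replace(log_path, rotated)  # atomic rename on the same filesystem
    with open(rotated, "rb") as src, gzip.open(rotated + ".gz", "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    os.remove(rotated)  # keep only the compressed copy
    return rotated + ".gz"
```

With this approach the live log stays readable with `tail -f` and survives an abrupt kill, at the price of keeping the current, not yet rotated part uncompressed on disk.
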
From: Mark Adler <madler@HIDDEN>
To: "Garreau, Alexandre" <galex-713@HIDDEN>
Cc: 30719 <at> debbugs.gnu.org
Subject: bug#30719: Progressively compressing piped input
Date: Tue, 6 Mar 2018 18:11:51 -0800
Message-Id: <00FE6CBA-74BC-43E3-A120-F44951F87AF7@HIDDEN>
In-Reply-To: <3hbuveh2vhln.fbs.xxuns.g6.gal@HIDDEN>

> On Mar 6, 2018, at 1:58 PM, Garreau, Alexandre <galex-713@HIDDEN> wrote:
>
> On 05/03/2018 at 14:54, Mark Adler wrote:
>> deflate has an inherent latency that accumulates enough data in order
>> to efficiently emit each deflate block. You can deliberately flush
>> (with zlib, not gzip), but if you do that too frequently, e.g. each
>> line, then you will get lousy compression or even expansion.
>
> Even if the main repetition is between the lines? For instance, if 80% of half of the lines and 70% of the other half are the same, as in a while loop running only ping and date? I thought of it as a very lazy way of not having to remove all the redundant output caused by the use of ASCII and by the same words or similar patterns occurring over and over.

Alexandre,

It has nothing to do with how much or how little or how often there is repetition. It has to do with the overhead of the header of a dynamic block that is required to describe the Huffman codes used therein. You need several thousand symbols in order to pay for the bits required for the header.

Mark

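The amortization argument in this reply can be observed by varying how often the stream is flushed. The sketch below is an illustration with invented log lines and hypothetical helper names (`make_lines`, `deflate_size`). Note that zlib picks the block type itself (stored, fixed or dynamic Huffman), so the measurement shows the combined cost of ending a block early, not the dynamic-block header in isolation.

```python
import zlib


def make_lines(n):
    """Invented ping-like log lines (not real data); only a few digits vary."""
    return [
        f"64 bytes from gnu.org: icmp_seq=1 ttl=52 time={90 + i % 9}.{i % 10} ms\n".encode()
        for i in range(n)
    ]


def deflate_size(lines, flush_every):
    """Total raw-deflate size when a sync flush follows every `flush_every` lines."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # wbits=-15: raw deflate, no gzip header
    total = 0
    for count, line in enumerate(lines, start=1):
        total += len(comp.compress(line))
        if count % flush_every == 0:
            total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    total += len(comp.flush())  # finish the stream
    return total


lines = make_lines(3000)
sizes = {k: deflate_size(lines, k) for k in (1, 10, 100, 1000)}
for k, size in sizes.items():
    print(f"flush every {k:4d} lines: {size:6d} bytes, {size / len(lines):.2f} bytes/line")
```

The bytes-per-line figure should drop as the flush interval grows, because each block's fixed cost is spread over more symbols; the repetition between lines is the same in every run.
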
From: Paul Eggert <eggert@HIDDEN>
To: control <at> debbugs.gnu.org
Organization: UCLA Computer Science Department
Subject: gzip bug report maintenance
Date: Wed, 30 Mar 2022 11:36:58 -0700
Message-ID: <a034a5ba-9c8e-7d58-e63b-96396312fd38@HIDDEN>

severity 30719 wishlist

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.