Commit 75a6693

EdSchouten authored and copybara-github committed
Use a larger buffer size for java.util.zip.*Stream classes
`DeflaterInputStream`, `GZIPInputStream`, `GZIPOutputStream`, and `InflaterInputStream` all use an internal byte buffer of 512 bytes by default. Whenever the wrapped stream exceeds this size, a full copy to a new buffer will occur, and the buffer grows in increments of the same size. For example, a stream of length 2K will be copied four times.

Increasing the size of the buffer we use can result in significant reductions in CPU usage (read: copies).

Examples in the repository
--------------------------

There are already two places where we increase the default size of these buffers:

- `//src/main/java/com/google/devtools/build/lib/bazel/repository/TarGzFunction.java`
- `//src/main/java/com/google/devtools/build/lib/bazel/repository/downloader/HttpStream.java`

Prior art
---------

There is an open enhancement issue in the OpenJDK tracker on this, which contains a benchmark for `InflaterOutputStream`:

> Increase the default, internal buffer size of the Streams in `java.util.zip`
> https://bugs.openjdk.org/browse/JDK-8242864

A similar change was merged for JDK15+ in 2020:

> Improve performance of `InflaterOutputStream.write()`
> https://bugs.openjdk.org/browse/JDK-8242848

Providing a simple benchmark
----------------------------

I'm inlining a simple `jmh` benchmark, and the results underneath it, for one `GZIPInputStream` case.

The benchmark:

```
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)
@Threads(1)
@Warmup(iterations = 2)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class GZIPInputStreamBenchmark {

  @Param({"1024", "3072", "9216"})
  long inputLength;

  @Param({"512", "1024", "4096", "8192"})
  int bufferSize;

  private byte[] content;

  @Setup(Level.Iteration)
  public void setup() throws IOException {
    var baos = new ByteArrayOutputStream();
    // No need to set the buffer size on this as it's a one-time cost for setup and not counted
    // in the result.
    var gzip = new GZIPOutputStream(baos);
    var inputBytes = generateRandomByteArrayOfLength(inputLength);
    gzip.write(inputBytes);
    gzip.finish();
    this.content = baos.toByteArray();
  }

  @Benchmark
  @BenchmarkMode(Mode.AverageTime)
  public void getGzipInputStream(Blackhole bh) throws IOException {
    try (var is = new ByteArrayInputStream(this.content);
        var gzip = new GZIPInputStream(is, bufferSize)) {
      bh.consume(gzip.readAllBytes());
    }
  }

  byte[] generateRandomByteArrayOfLength(long length) {
    var random = new Random();
    var intStream = random.ints(0, 5000).limit(length).boxed();
    return intStream
        .collect(
            ByteArrayOutputStream::new,
            (baos, i) -> baos.write(i.intValue()),
            (baos1, baos2) -> baos1.write(baos2.toByteArray(), 0, baos2.size()))
        .toByteArray();
  }
}
```

The results:

```
Benchmark                                    (bufferSize)  (inputLength)  Mode  Cnt      Score    Error  Units
GZIPInputStreamBenchmark.getGzipInputStream           512           1024  avgt    5   3207.217 ± 24.919  ns/op
GZIPInputStreamBenchmark.getGzipInputStream           512           3072  avgt    5   5874.191 ±  5.827  ns/op
GZIPInputStreamBenchmark.getGzipInputStream           512           9216  avgt    5  15567.345 ± 93.281  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           1024  avgt    5   2580.566 ± 14.566  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           3072  avgt    5   4154.582 ± 16.016  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          1024           9216  avgt    5   9942.521 ± 61.215  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           1024  avgt    5   2150.255 ± 52.770  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           3072  avgt    5   2289.185 ± 71.396  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          4096           9216  avgt    5   5656.891 ± 28.499  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           1024  avgt    5   2177.427 ± 30.896  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           3072  avgt    5   2517.390 ± 21.296  ns/op
GZIPInputStreamBenchmark.getGzipInputStream          8192           9216  avgt    5   5227.932 ± 55.525  ns/op
```

Co-authored-by: Kushal Pisavadia <[email protected]>

Closes #20316.

PiperOrigin-RevId: 588444920
Change-Id: I1fb47f0b08dcb8d72f3e2c43534c33d60efb87f2
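For readers unfamiliar with the constructor overloads this change relies on, here is a minimal, self-contained sketch of a GZIP round trip that passes an explicit buffer size to both `GZIPOutputStream` and `GZIPInputStream`. The class name `GzipBufferSizeExample`, its helper methods, and the constant `GZIP_BUFFER_BYTES` are hypothetical names chosen for illustration and do not appear in the commit; only the value 8192 is taken from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBufferSizeExample {
  // Hypothetical constant for this sketch; 8192 mirrors the value used in the commit.
  private static final int GZIP_BUFFER_BYTES = 8192;

  /** Compresses data with a GZIPOutputStream that uses an explicit internal buffer size. */
  static byte[] compress(byte[] data) {
    ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
    try (GZIPOutputStream gzipOut = new GZIPOutputStream(bytesOut, GZIP_BUFFER_BYTES)) {
      gzipOut.write(data);
    } catch (IOException e) {
      // Not expected when writing to an in-memory byte array.
      throw new UncheckedIOException(e);
    }
    return bytesOut.toByteArray();
  }

  /** Decompresses data with a GZIPInputStream that uses an explicit internal buffer size. */
  static byte[] decompress(byte[] compressed) {
    try (GZIPInputStream gzipIn =
        new GZIPInputStream(new ByteArrayInputStream(compressed), GZIP_BUFFER_BYTES)) {
      return gzipIn.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] original = "hello ".repeat(1000).getBytes(StandardCharsets.UTF_8);
    byte[] roundTripped = decompress(compress(original));
    System.out.println(Arrays.equals(original, roundTripped)); // prints "true"
  }
}
```

The buffer size only affects how much data the stream hands to the underlying `Deflater` or `Inflater` at a time, so the compressed bytes remain readable by a stream constructed with any other buffer size.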
1 parent 020b85e commit 75a6693

4 files changed: +23 -12 lines changed

src/java_tools/singlejar/java/com/google/devtools/build/singlejar/ZipCombiner.java

Lines changed: 13 additions & 5 deletions

@@ -66,6 +66,7 @@
  * <a href="http://www.pkware.com/documents/casestudies/APPNOTE.TXT">ZIP format</a>
  */
 public class ZipCombiner implements AutoCloseable {
+  private static final int INFLATER_BUFFER_BYTES = 8192;
   public static final Date DOS_EPOCH = new Date(ZipUtil.DOS_EPOCH);
   /**
    * Whether to compress or decompress entries.
@@ -440,7 +441,7 @@ public void addZip(File zipFile) throws IOException {
             entries.put(filename, null);
             InputStream in = zip.getRawInputStream(entry);
             if (entry.getMethod() == Compression.DEFLATED) {
-              in = new InflaterInputStream(in, getInflater());
+              in = new InflaterInputStream(in, getInflater(), INFLATER_BUFFER_BYTES);
             }
             action.getStrategy().merge(in, action.getMergeBuffer());
             break;
@@ -492,7 +493,9 @@ private void writeEntryFromBuffer(ZipFileEntry entry, byte[] uncompressed) throw
       writeEntry(entry, new ByteArrayInputStream(uncompressed));
     } else {
       ByteArrayOutputStream compressed = new ByteArrayOutputStream();
-      copyStream(new DeflaterInputStream(new ByteArrayInputStream(uncompressed), getDeflater()),
+      copyStream(
+          new DeflaterInputStream(
+              new ByteArrayInputStream(uncompressed), getDeflater(), INFLATER_BUFFER_BYTES),
           compressed);
       entry.setMethod(Compression.DEFLATED);
       entry.setCompressedSize(compressed.size());
@@ -529,14 +532,19 @@ private void writeEntry(ZipReader zip, ZipFileEntry entry, EntryAction action)
       // from the raw file data and deflate to a temporary byte array to determine the deflated
       // size. Then use this byte array as the input stream for writing the entry.
       ByteArrayOutputStream tmp = new ByteArrayOutputStream();
-      copyStream(new DeflaterInputStream(zip.getRawInputStream(entry), getDeflater()), tmp);
+      copyStream(
+          new DeflaterInputStream(
+              zip.getRawInputStream(entry), getDeflater(), INFLATER_BUFFER_BYTES),
+          tmp);
       data = new ByteArrayInputStream(tmp.toByteArray());
       outEntry.setMethod(Compression.DEFLATED);
       outEntry.setCompressedSize(tmp.size());
     } else if (mode == OutputMode.FORCE_STORED && entry.getMethod() != Compression.STORED) {
       // The output mode is stored, but the entry compression is not; create an inflater stream
-      // from the raw file data.
-      data = new InflaterInputStream(zip.getRawInputStream(entry), getInflater());
+      // from the raw file data.
+      data =
+          new InflaterInputStream(
+              zip.getRawInputStream(entry), getInflater(), INFLATER_BUFFER_BYTES);
       outEntry.setMethod(Compression.STORED);
       outEntry.setCompressedSize(entry.getSize());
     } else {

src/java_tools/singlejar/java/com/google/devtools/build/zip/ZipEntryInputStream.java

Lines changed: 2 additions & 2 deletions

@@ -15,7 +15,6 @@
 package com.google.devtools.build.zip;
 
 import com.google.devtools.build.zip.ZipFileEntry.Compression;
-
 import java.io.IOException;
 import java.io.InputStream;
 import java.util.zip.Inflater;
@@ -24,6 +23,7 @@
 
 /** An input stream for reading the file data of a ZIP file entry. */
 class ZipEntryInputStream extends InputStream {
+  private static final int INFLATER_BUFFER_BYTES = 8192;
   private InputStream stream;
   private long rem;
 
@@ -61,7 +61,7 @@ class ZipEntryInputStream extends InputStream {
       rem = zipEntry.getSize();
     }
     if (!raw && zipEntry.getMethod() == Compression.DEFLATED) {
-      stream = new InflaterInputStream(stream, new Inflater(true));
+      stream = new InflaterInputStream(stream, new Inflater(true), INFLATER_BUFFER_BYTES);
     }
   }
 

src/main/java/com/google/devtools/build/lib/analysis/actions/FileWriteAction.java

Lines changed: 4 additions & 3 deletions

@@ -231,6 +231,7 @@ protected void computeKey(
 
   private static final class CompressedFileWriteAction extends FileWriteAction {
     private static final String GUID = "5bfba914-2251-11ee-be56-0242ac120002";
+    private static final int GZIP_BYTES_BUFFER = 8192;
 
     private final byte[] compressedBytes;
     private final int uncompressedSize;
@@ -252,7 +253,7 @@ private static final class CompressedFileWriteAction extends FileWriteAction {
       // Presize on the small end to avoid over-allocating memory.
       ByteArrayOutputStream byteStream = new ByteArrayOutputStream(dataToCompress.length / 100);
 
-      try (GZIPOutputStream zipStream = new GZIPOutputStream(byteStream)) {
+      try (GZIPOutputStream zipStream = new GZIPOutputStream(byteStream, GZIP_BYTES_BUFFER)) {
         zipStream.write(dataToCompress);
       } catch (IOException e) {
         // This should be impossible since we're writing to a byte array.
@@ -268,7 +269,7 @@ private static final class CompressedFileWriteAction extends FileWriteAction {
     public String getFileContents() {
       byte[] uncompressedBytes = new byte[uncompressedSize];
       try (GZIPInputStream zipStream =
-          new GZIPInputStream(new ByteArrayInputStream(compressedBytes))) {
+          new GZIPInputStream(new ByteArrayInputStream(compressedBytes), GZIP_BYTES_BUFFER)) {
         int read;
         int totalRead = 0;
         while (totalRead < uncompressedSize
@@ -293,7 +294,7 @@ public String getFileContents() {
     public DeterministicWriter newDeterministicWriter(ActionExecutionContext ctx) {
       return out -> {
         try (GZIPInputStream gzipIn =
-            new GZIPInputStream(new ByteArrayInputStream(compressedBytes))) {
+            new GZIPInputStream(new ByteArrayInputStream(compressedBytes), GZIP_BYTES_BUFFER)) {
           ByteStreams.copy(gzipIn, out);
         }
       };

src/main/java/com/google/devtools/build/lib/rules/genquery/GenQueryOutputStream.java

Lines changed: 4 additions & 2 deletions

@@ -43,6 +43,8 @@ class GenQueryOutputStream extends OutputStream {
    */
   private static final int COMPRESSION_THRESHOLD = 1 << 20;
 
+  private static final int GZIP_BYTES_BUFFER = 8192;
+
   /**
    * Encapsulates the output of a {@link GenQuery}'s query. CPU and memory overhead of individual
    * methods depends on the underlying content and settings.
@@ -83,7 +85,7 @@ interface GenQueryResult {
   GenQueryOutputStream(boolean compressedOutputRequested) throws IOException {
     this.compressedOutputRequested = compressedOutputRequested;
     if (compressedOutputRequested) {
-      this.out = new GZIPOutputStream(bytesOut);
+      this.out = new GZIPOutputStream(bytesOut, GZIP_BYTES_BUFFER);
       this.outputWasCompressed = true;
     } else {
       this.out = bytesOut;
@@ -138,7 +140,7 @@ private void maybeStartCompression(int additionalBytes) throws IOException {
     }
 
     ByteString.Output compressedBytesOut = ByteString.newOutput();
-    GZIPOutputStream gzipOut = new GZIPOutputStream(compressedBytesOut);
+    GZIPOutputStream gzipOut = new GZIPOutputStream(compressedBytesOut, GZIP_BYTES_BUFFER);
     bytesOut.writeTo(gzipOut);
     bytesOut = compressedBytesOut;
     out = gzipOut;
