Add ability to limit output size for zip unarchiver#115
Add ability to limit output size for zip unarchiver#115satamas wants to merge 3 commits intocodehaus-plexus:masterfrom
Conversation
…ction against zip bombs.
| import org.codehaus.plexus.util.IOUtil; | ||
| import org.codehaus.plexus.util.StringUtils; | ||
|
|
||
| import java.io.*; |
There was a problem hiding this comment.
we usually not use stars import (please avoid changing the order of import)
|
Hi, Thanks for contributing. The Zip bomb method you shared is quite clever and interesting. I just wonder if Apache Commons Compress is more suitable place to address this issue? It have knowledge about the Zip file internals so it is probably better equipped to deal with zip bombs (and other archive bombs for that matter). And solving it there will protect wider audience (plexus archiver including). |
|
Protection agains concrete way of creating zip bomb is task for Apache Commons Compress for sure, but the only way of protection agains all kind of such bombs is to limit produced output size.
And zip stream from Apache Commons implements InputStreamStatistics for the same purposes. I didn't used it in order to make method re-usable by other unarchivers in future. |
| private String encoding = "UTF8"; | ||
|
|
||
| @Nullable | ||
| private Long maxOutputSize; |
There was a problem hiding this comment.
Would it make sense to use long with -1 meaning unlimited in order to avoid NPEs?
There was a problem hiding this comment.
Would it makes sense to have it for other archives (such as tar)?
plamentotev
left a comment
There was a problem hiding this comment.
In general looks good. I have a couple of comments. But I think it would really help if you do the refactoring (using try-with-resources, inlining extractFileIfIncluded, etc) in separate pull request. I'll merge it and you can rebase your changes on top of it. This would make this pull request a lot easier to read.
Also there are some places where spaces are missing around braces. Not a biggie I can fix them myself before merging.
| org.apache.commons.compress.archivers.zip.ZipFile zf = null; | ||
| InputStream in = null; | ||
| try | ||
| try(ZipFile zipFile = new ZipFile( getSourceFile(), encoding, true )) |
There was a problem hiding this comment.
Would you please keep the diff minimal. I think this change makes the diff difficult to read. Using try-with-resource is better, but it would be better if this refactoring is made in separate merge request in order to make the commit easier to understand.
| final Enumeration e = zf.getEntriesInPhysicalOrder(); | ||
| getLogger().debug( "Expanding: " + getSourceFile() + " into " + getDestDirectory() ); | ||
| Long remainingSpace = maxOutputSize; | ||
| final Enumeration e = zipFile.getEntriesInPhysicalOrder(); |
There was a problem hiding this comment.
The same here. Initially I thought the order in which the entries are iterated is changed :) Refactoring unrelated to the change(like renaming variables, changing the lines order, etc) is best left in separate merge request.
|
|
||
| in.close(); | ||
| in = null; | ||
| try (InputStream in = zipFile.getInputStream(ze)) { |
| protected void extractFile( final File srcF, final File dir, final InputStream compressedInputStream, | ||
| String entryName, final Date entryDate, final boolean isDirectory, | ||
| final Integer mode, String symlinkDestination, final FileMapper[] fileMappers ) | ||
| protected File extractFile(final File srcF, final File dir, final InputStream compressedInputStream, |
There was a problem hiding this comment.
Please follow the coding style. There are spaces around the braces.
Please correct me if I'm wrong, but I think that would break existing code (if somebody have overridden the protected method). Would it make sense to keep this method void. If you don't have limit set then you don't care about the file anyway.
| } | ||
|
|
||
| protected File extractFile(final File srcF, final File dir, final InputStream compressedInputStream, | ||
| Long remainingSpace, String entryName, final Date entryDate, final boolean isDirectory, |
There was a problem hiding this comment.
I think sizeLimit or something among those lines makes more sense as this is what this variable is used for. It is the remaining size for the client of the method. For the method it is the size limit it should impose to the output stream.
There was a problem hiding this comment.
Also would it make sense to have different name for this method (as the other extractFile does not return File) to avoid confusion. I'm not saying it is. I'm just wondering.
| private String encoding = "UTF8"; | ||
|
|
||
| @Nullable | ||
| private Long maxOutputSize; |
There was a problem hiding this comment.
Would it makes sense to have it for other archives (such as tar)?
| finally | ||
| } | ||
|
|
||
| private String resolveSymlink( ZipFile zf, ZipArchiveEntry ze ) |
| if ( !isOverwrite() && f.exists() && ( f.lastModified() >= entryDate.getTime() ) ) | ||
| { | ||
| return; | ||
| return f; |
There was a problem hiding this comment.
The file is not really extracted. Then should the file be returned?
| resolveSymlink(zipFile, ze), getFileMappers()); | ||
| if (file != null) | ||
| { | ||
| remainingSpace -= file.length(); |
There was a problem hiding this comment.
Is file.length() reliable way to get the file size. What I mean is (quoting the File class API docs):
Where it is required to distinguish an I/O exception from the case that
0Lis returned, or where several attributes of the same file are required at the same time, then thejava.nio.file.Files#readAttributesmethod may be used.
|
I've asked because Apache Commons Compress already uses |
|
Hi @satamas, I've merged your changes for the try-with-resource into master. Just wanted to ping you to check if there is anything else I can do to help with moving this merge request forward. |
|
I'm on long vacation now. I've asked @serejke to take care of this PR |
|
Closing because it was implemented in #117 |
This is needed as a protection against zip bombs.
(Recently found bomb example).