Skip to content

Comments

refactored/cleaned up lib/files.php and switched get() to ZipStreamer#3439

Closed
McNetic wants to merge 5 commits intoowncloud:masterfrom
McNetic:zipstream
Closed

refactored/cleaned up lib/files.php and switched get() to ZipStreamer#3439
McNetic wants to merge 5 commits intoowncloud:masterfrom
McNetic:zipstream

Conversation

@McNetic
Copy link
Contributor

@McNetic McNetic commented May 21, 2013

This commit mainly adds the ability for owncloud to use ZipStreamer instead of the builtin ZipArchive. This reduces disk i/o greatly, as the current implementation creates one temp file for each file to be added to the zip, and then again a temp file of the complete zip, before starting to send the temp zip to the client.
The ZipStreamer creates the zip file on the fly in memory and directly sends the data to the client, no temp file needed at all. The ZipStreamer class was added in a second pull request to 3rdparty.

cleaned up get() logic
fixed get() to only send headers if requested (xsendfile could get in the way)
switched zip file creation to ZipStreamer to create zip files directly in memory instead of creating (multiple) temporary files which caused heavy i/o
@ghost
Copy link

ghost commented May 21, 2013

Can one of the admins verify this patch?

@McNetic
Copy link
Contributor Author

McNetic commented May 21, 2013

Oh, I forgot to mention that the current implementation of ZipStreamer does not compress the zip file. It is certainly possible, but I found no easy way to do chunk-wise compression in php.

@McNetic McNetic closed this May 21, 2013
@McNetic McNetic reopened this May 21, 2013
@DeepDiver1975
Copy link
Member

@owncloud-bot this is okay to test

@DeepDiver1975
Copy link
Member

whitespaces are totally off - please fix this - THX

Conceptual question:
How does this work together with xsendfile/X-accel?
As far as I can tell these modules require a physical file to work properly.

@karlitschek
Copy link
Contributor

Isn't the fact that archives are no longer compressed a huge drawback? In which scenarios is Disk I/O an issue?

@DeepDiver1975
Copy link
Member

@owncloud-bot retest this please

@DeepDiver1975
Copy link
Member

Oh, I forgot to mention that the current implementation of ZipStreamer does not compress the zip file

Are you telling us that the content is not zipped at all anymore?

@karlitschek
Copy link
Contributor

I would also like to understand the memory impact especially with huge files and a lot of users.

@McNetic
Copy link
Contributor Author

McNetic commented May 21, 2013

This definitely needs discussion on your side. It certainly has some drawbacks. Maybe this should also be made optional.

I'll explain why I did this:
I used owncloud to share around 1.5 GB of files (one big movie, and a lot of photo jpegs) with a group of people. Most of them want to download the whole package. Even a single download of this made i/o my server go quite mad (the data and temp are on the same sw raid 1). It would take minutes before the download of the zip file would even start. The user does not see any progress in this time. I found that it is quite inefficient to copy every file to a temp file, and zip these in a temp file, so I searched for a php library that would directly zip and stream to the client. I found two solutions, that both did not work correctly or did not have the ability to work on a file handle as input, so I put the ZipStreamer class together myself, and tested it with owncloud and my application scenario. With this solution, the download starts immediately and is fast with very low i/o and cpu usage.

As I'm not familiar with the internals of owncloud and also not a php specialist, I can not tell if there are no major drawbacks or problems I missed, so I wanted to put the idea and the seemingly working solution to your attention. I felt that the main application of the zip functionality is to be able to download a lot of files/file hierarchies with one download. I think compression is secondary in this respect. I also felt that if the archive to be build grows large (as in several hundred mb or more), working with two temporary files per download would be an i/o problem.

As for your concerns, and the pros and cons:

  • X-Sendfile and X-Accel are disabled when streaming zip files in this approach. I do not know about this mechanisms, but they indeed seem to require physical files. I think that creating a large temporary file (that can take minutes, in fact) will slow down at lot more than can be regained, but correct me if I'm mislead.
  • The data is in fact zipped, but no longer compressed. The ZipStreamer class works on chunks of 1MB. Chunk-wise compression (with deflate, the standard zip compression) is standardized and certainly possible, but I found no way to do it by builtin php means (although there are several ways in php to deflate). It would be easy to extend ZipStreamer to compress all files that are smaller than the chunk size, and maybe increase the chunk size so "most" files would be compressed - on the other hand, most big files maybe are movies or jpegs anyway, where compression does not help much, if at all.
    It would also be possible to extend ZipStreamer to do chunk-wise compression if someone found a way to do it with the php functions. I think reimplementing it in php would not perform very well, but it might also be possible.
  • The memory impact should be minimal. As I said, the ZipStreamer works on 1 MB chunks, and each chunk is send to the client immediately. I think ZipArchive will internally also work with chunks, but I don't know which size it uses. But the chunk size can simply be changed, also.

So, as final word: Probably, the implementation is not perfect. I'm happy to help make it better; I'm also not mad if it is completely rejected, but I think it is a direction that's worth a look at.

PS: I'm not sure what is with the whitespaces. Maybe dos line endings? My tortoisegit showed the diff correctly.

@MTGap
Copy link
Contributor

MTGap commented May 21, 2013

Isn't the fact that archives are no longer compressed a huge drawback?

I don't think it's a problem that the files aren't compressed. I'm going to extract them after the download is complete anyways. We aren't storing the files in a zip on the server so there are no savings from compressing it if it takes longer to start the download.

@DeepDiver1975
Copy link
Member

@McNetic thanks a lot for your detailed explanation! Much appreciated! I'll try to find some time to have a closer look at your implementation. THX

lib/files.php Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus points for using === here.

@Niduroki
Copy link
Member

Added some comments in courtesy of #2979.

@karlitschek
Copy link
Contributor

how does it perform for very gib files >8GB ?

@ghost
Copy link

ghost commented Jul 16, 2013

Can one of the admins verify this patch?

@DeepDiver1975
Copy link
Member

@owncloud-bot this is ok to test

@DeepDiver1975
Copy link
Member

@McNetic sorry for the long silence - can I ask you to rebase? THX

@ghost
Copy link

ghost commented Aug 8, 2013

cleaned up get() logic
fixed get() to only send headers if requested (xsendfile could get in the way)
switched zip file creation to ZipStreamer to create zip files directly in memory instead of creating (multiple) temporary files which caused heavy i/o
@witchi
Copy link

witchi commented Nov 29, 2013

I have implemented zip64 extension to ZipStreamer, which allows bigger files. You will need a newer ZIP implementation like 7z on Windows XP to extract the files. See witchi/PHPZipStreamer. It works with Owncloud 5.0.11 and PHP 5.3.17/Suhosin 0.9.33 on my side.

@PVince81
Copy link
Contributor

Does ZipStreamer generate zip files with correct UTF-8 encoding ?
This change somewhat conflicts with #1117

@witchi
Copy link

witchi commented Jan 15, 2014

I have tested it here with my patched OC5, and it works. I have used 中文blah中文blah as folder name and have uploaded 中文blah中文blah.txt as file. Then I have downloaded the complete folder and within my SuSE Linux I see the chinese (or whatever) characters within the zip-filename.

@PVince81
Copy link
Contributor

The problem is mostly with Windows machines.
Can you try opening the generated file on a Windows system ?

See #1117 (comment) for
a list for which that other ticket seem to fix it.

Basically it should set the UTF-8 flag and also set the OS flag to unix
and set permissions.

@witchi
Copy link

witchi commented Jan 15, 2014

On Windows XP I see only "boxes" instead of the chinese characters, but I can open the zip with 7zip and extract the files. I have installed another fork of this repository: https://github.com/witchi/PHPZipStreamer which contains UTF-8 flag and Unix-flag. It also sets some permissions.

@PVince81
Copy link
Contributor

@witchi wow, great!
So this implementation would also fix the encoding issues we were having. 😃

@McNetic can you rebase this PR ?

@witchi
Copy link

witchi commented Jan 15, 2014

I have tested it on Windows 7 just now. The encoding is right within 7zip 9.2, but the Windows zip-support does only display the filename of the zip-file right, but the containing files and folders have wrong filenames. I can open the files, but the names are like õ©¡µûçblahõ©¡µûçblah instead of 中文blah中文blah. So it seems that Windows has problems with the UTF-8 format of zip, whereas 7zip looks good.

@PVince81
Copy link
Contributor

@witchi yes, it seems that Win7 compressed folders still have issues... looks like there is nothing we can do about it.

@DeepDiver1975
Copy link
Member

@witchi @McNetic
Can I ask you to agree on which github repo will be the leading one?
I'd like to setup our composer file in 3rdparty to be capable to easily update in case of changes.

Finally I have to add that unit tests as well as travis-ci integration within that repo would be great 😉

Thanks a lot for your support - much appreciated!

@witchi
Copy link

witchi commented Jan 20, 2014

No problem on my side.

The current code generates always 64bits zip files, the RFC contains an option to switch between 32/64 bits, but I haven't implemented it. I have only large files >4 GB here, so there was no reason to enhance the code.

@McNetic
Copy link
Contributor Author

McNetic commented Jan 22, 2014

I added a new pull request base on current master: #6893.

@McNetic McNetic closed this Jan 22, 2014
@Povezi
Copy link

Povezi commented Mar 14, 2014

I have installed latest owncloud today, and get following error with zipstreamer...

PHP Parse error: syntax error, unexpected '[', expecting ')' in /var/www/owncloud/3rdparty/mcnetic/phpzipstreamer/ZipStreamer.php on line 260,

any idea? maybe some wrong privilege?

@DeepDiver1975
Copy link
Member

@Povezi please open a new issue here - looks like ZipStreamer required php 5.4 and you are obviously running php5.3

@Povezi
Copy link

Povezi commented Mar 14, 2014

yap, I am using PHP 5.3.3-7+squeeze19 with Suhosin-Patch (cli) (built: Feb 18 2014 13:59:15)

@PVince81
Copy link
Contributor

I raised an issue upstream: McNetic/PHPZipStreamer#7

@Povezi Would still be good to have an issue here as a reminder to update from upstream.

@McNetic
Copy link
Contributor Author

McNetic commented Mar 14, 2014

I fixed the issue in ZipStreamer.

@DeepDiver1975
Copy link
Member

@McNetic thanks a lot! I'll update our referenced code these days - THX

@Povezi
Copy link

Povezi commented Mar 15, 2014

@DeepDiver1975 Please can you notify me here when to update owncloud with latest?

DeepDiver1975 added a commit to owncloud-archive/3rdparty that referenced this pull request Mar 17, 2014
@Povezi
Copy link

Povezi commented Mar 19, 2014

Thank you very much! Now new owncloud is working on my php5.3!

@Povezi
Copy link

Povezi commented Mar 26, 2014

I have found out that there is a new problem related to PhpZipStream... if you select multiple files and want them do zip&download, PhpZipStream makes an archive and browser downloads it successfully. But whan I try to uncompress in totalcommander I get an error "please insert disk number 0" and all files are 0 file size... in windows explorer zip can be uncompressed... so I suppose that some flag for multiarchive is set wrongly...

@PVince81
Copy link
Contributor

@Povezi please raise zipstreamer issues in the zipstreamer project: https://github.com/McNetic/PHPZipStreamer/
Thanks.

@Povezi
Copy link

Povezi commented Mar 26, 2014

yap, sorry ;)

@AdamWill
Copy link
Contributor

This looks like it was fixed in upstream almost as soon as it was reported:

McNetic/PHPZipStreamer@0af57cc

but the upstream fix is different from the one OC is carrying? It'd be best to stay synced with upstream, I think.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants