generated zip archives now support utf8 file names #1117
DeepDiver1975 wants to merge 4 commits into master
|
--- Original Message --- Hello, sure, I do not mind dual licensing it under GPL3 and AGPL3. I give you permission to reuse and adopt my code that hacks around utf8 limitations in the standard zip extension for PHP. Petr Skoda On 7. 1. 2013, at 23:46, Thomas Müller [email protected] wrote:
|
|
PLEASE DO NOT MERGE - LICENSING HAS TO BE CLARIFIED! |
|
Hello, I hope the code is going to work fine for you. If you find any problems or discover some new way to improve it please ping me. Ciao. |
|
👍 @skodak licensed this as MIT so this is ready for merging. If we find more reviewers, of course. |
|
Summoning reviewers ..... |
|
The code uses fseek and other functions whose handling of big files, i.e. files larger than 2 GB, is in doubt. It should be tested with files > 2 GB and > 4 GB to verify that it works, at least on 64-bit PHP platforms. |
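To make the two thresholds concrete: a signed 32-bit offset overflows just past 2 GiB, and the classic (non-ZIP64) ZIP size and offset fields are unsigned 32-bit, so they end just below 4 GiB. A minimal Python sketch of those limits, assuming offsets are held in 32-bit integers as on a 32-bit build; the helper name `size_limits` is made up for illustration and is not part of the patch:

```python
import struct

MAX_SIGNED_32 = 2**31 - 1    # largest offset a signed 32-bit integer can hold
MAX_UNSIGNED_32 = 2**32 - 1  # largest value of a classic 4-byte ZIP size/offset field


def size_limits(size_in_bytes):
    """Report which 32-bit limits a file of this size stays within."""
    return {
        "fits_signed_32bit_offset": size_in_bytes <= MAX_SIGNED_32,
        "fits_classic_zip_field": size_in_bytes <= MAX_UNSIGNED_32,
    }


for label, size in [("1 GiB", 2**30), ("3 GiB", 3 * 2**30), ("5 GiB", 5 * 2**30)]:
    print(label, size_limits(size))

# Packing a too-large size into a 4-byte field fails loudly in Python.
# Code that silently wraps around instead is what makes these bugs hard to trace.
try:
    struct.pack("<I", 5 * 2**30)
except struct.error as exc:
    print("does not fit in 4 bytes:", exc)
```

This is why testing with one file between 2 GB and 4 GB and one above 4 GB covers both boundaries.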
|
@dragotin good point, even if I doubt that it makes sense to generate zips of that size 😄 |
|
@DeepDiver1975 sure, it does not make sense to have files that large in general, but people do... Users don't think in terms of file sizes, they just do stuff. The thing is that these overflow errors cause all kinds of funny side-effect errors which are hard to track down, because you usually do not get a warning or error anywhere in the logfiles. So it's better to keep it in mind... |
|
sure - I'll test this code with big files. |
|
On 16.01.2013 10:19, Thomas Müller wrote:
|
|
Just want to add that common ZIP software like 7-zip also shows broken special chars in OC generated ZIP files, not only the Windows internal zip support. (OC 4.5.6) |
|
Still broken. I locally merged master and downloaded a folder as zip. Not only are utf8 symbols replaced with ??, but subfolders get pulled to the root folder of the zip. With the following folder structure: first of all, the 'empty folder' does not show up in the zip at all. @DeepDiver1975 rebase please |
|
Still broken for me as well with the current release:
The issue happens when I try to download several files (zip). For instance: When I download only a single file, it works. Nic |
|
For Gnome Archive Tools, aka File Roller, the solution is to install p7zip, as the UTF-8 zip issue is caused by Info-Zip (patching it a little can also help, which some distros are doing now). For Mac and Windows, I think recent versions of third-party software should all be fine with UTF-8 zip. |
|
What is the status here? |
|
Just observed the issue on OC6 beta3. Please rebase. |
|
Ubuntu is going to fix the utf8 file name issue in its unzip for 12.04-13.04 (13.10 is already okay). It is in the SRU verification phase now. Can anyone help with the verification? |
|
Going to try and rebase this onto master to revive it a bit. |
|
Here you go: mega-rebase onto master. I've tested it and it still doesn't work; reading the previous comments, it seems to never have worked properly... At least we can use this as a base to continue experimenting. |
|
Trying to debug this: the file name workaround only seems to operate on the first file in the zip. There is still hope 😄 |
|
Test failed. |
|
That should explain those numbers: http://unix.stackexchange.com/a/14727 |
|
Ladies and gentlemen, please help testing this: download this ZIP file generated with this fix and let us know whether you were able to
|
|
Summoning the original reporters: @Hausmarke @ricardoar7 @bureautranslations @RealRancor @PKduck @frisi See #1117 (comment) if you'd like to help testing ZIP with UTF-8 characters. |
|
i'm on kde too @PVince81. ark works fine, as does unzip in console. |
|
Test failed. |
|
@PVince81 and the folder with the Chinese characters is the third and not the second folder :P |
|
@georgehrke it was the second one I created until I decided to test umlauts as well 😉 |
|
On WinXP the content is not displayed properly with either 7zip or the integrated zip feature in Explorer. |
|
From my research it looks like Windows expects the file name to be encoded in the system's encoding, not UTF-8. That means we'd need to detect Windows and encode using the current locale... which means that the zip files would differ depending on which OS they were requested from... which might be acceptable if the purpose of the zip file is only to download multiple files that are going to be extracted locally anyway. Needs more research... |
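The effect is easy to reproduce outside ownCloud. A small Python sketch that shows what a reader sees when it decodes UTF-8 name bytes with a legacy code page; CP437 is picked here only as an assumed example (it is the historical ZIP default), and the code page a given Windows tool actually applies depends on the system locale:

```python
name = "öäüß.txt"  # one of the test file names from this PR

utf8_bytes = name.encode("utf-8")

# A reader that ignores (or never sees) a UTF-8 marker decodes the same bytes
# with a legacy code page instead. CP437 is an assumption for illustration.
misread = utf8_bytes.decode("cp437")

print(repr(name), "->", repr(misread))
print(len(name), "characters become", len(misread))
```

Every non-ASCII character takes two or more bytes in UTF-8, so each one turns into several unrelated characters in the misread name.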
|
Apparently, not all zip implementations have UTF8 support, so the standard specifies a "Unicode Path Extra Field" understood by most popular software. See evanmiller/mod_zip#3. The base filename should then default to a "most likely" encoding, possibly with transliteration and accent stripping from the original filenames for better compatibility. Different solutions for PHP are discussed at http://stackoverflow.com/questions/1284535/php-transliteration |
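As a rough illustration of that idea, here is a Python sketch that builds an Info-ZIP Unicode Path Extra Field (header ID 0x7075: a version byte, the CRC-32 of the name stored in the header, then the UTF-8 name) next to an ASCII fallback name. The two helper names are hypothetical, and the fallback is only a crude accent strip, not a real transliteration:

```python
import struct
import unicodedata
import zlib

UNICODE_PATH_HEADER_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def ascii_fallback(name):
    """Strip accents and replace anything still non-ASCII with '_'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch if ord(ch) < 128 else "_" for ch in without_marks)


def unicode_path_extra_field(header_name_bytes, unicode_name):
    """Build the extra field: version 1, CRC-32 of the header name, UTF-8 name."""
    payload = struct.pack("<BI", 1, zlib.crc32(header_name_bytes) & 0xFFFFFFFF)
    payload += unicode_name.encode("utf-8")
    return struct.pack("<HH", UNICODE_PATH_HEADER_ID, len(payload)) + payload


original = "undaçao.doc"
fallback = ascii_fallback(original).encode("ascii")
extra = unicode_path_extra_field(fallback, original)
print(fallback, extra.hex())
```

The CRC lets a reader detect that the header name was changed later without the extra field being updated, in which case the extra field should be ignored.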
|
@futal thanks for the info. To implement this we'd need to rewrite the whole file and insert the relevant fields. The current workarounds were only about replacing bytes in the existing zip files generated by PHP. |
|
According to http://support.microsoft.com/kb/2704299 there is a slight chance that the test file generated above can be extracted properly on Windows 7. Would be good if someone could test this 😄 |
|
I have tested the following files with Windows 7 Pro 64 SP1:
For all these zip files, filenames were good with 7zip 9.22beta and wrong with Windows Explorer. |
|
@futal cool, thanks. |
|
I'll test winrar on win7 this evening |
|
Works fine with WinRAR on Windows 7 Pro |
|
maybe we should check winrar on xp too |
Do not close and reopen for each file entry, but only do the utf-8 fixing once after the final close.
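The workaround code itself is not shown in this thread, so the following is only a guess at its general shape, written in Python rather than PHP. It assumes the fix amounts to setting general purpose bit 11 (the UTF-8 name flag) in every central directory header and matching local file header, in one pass over the finished archive. The function name `set_utf8_flag` is hypothetical, and ZIP64 archives are not handled:

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x0800  # general purpose bit 11: names and comments are UTF-8
EOCD_SIG = b"PK\x05\x06"
CENTRAL_SIG = b"PK\x01\x02"
LOCAL_SIG = b"PK\x03\x04"


def set_utf8_flag(data):
    """Set bit 11 on every entry of a finished archive held in a bytearray."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    entry_count, = struct.unpack_from("<H", data, eocd + 10)
    position, = struct.unpack_from("<I", data, eocd + 16)
    for _ in range(entry_count):
        if data[position:position + 4] != CENTRAL_SIG:
            raise ValueError("central directory header expected")
        # Patch the flags in the central directory header.
        flags, = struct.unpack_from("<H", data, position + 8)
        struct.pack_into("<H", data, position + 8, flags | UTF8_FLAG)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, position + 28)
        # Follow the stored offset and patch the local file header too.
        local, = struct.unpack_from("<I", data, position + 42)
        if data[local:local + 4] != LOCAL_SIG:
            raise ValueError("local file header expected")
        local_flags, = struct.unpack_from("<H", data, local + 6)
        struct.pack_into("<H", data, local + 6, local_flags | UTF8_FLAG)
        position += 46 + name_len + extra_len + comment_len
    return data


# Build a small archive with ASCII names, then patch it once at the end.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "first")
    archive.writestr("b.txt", "second")

patched = set_utf8_flag(bytearray(buffer.getvalue()))
with zipfile.ZipFile(io.BytesIO(bytes(patched))) as archive:
    print([(info.filename, bool(info.flag_bits & UTF8_FLAG)) for info in archive.infolist()])
```

Walking the central directory once after the final close touches every entry, which avoids the earlier symptom of only the first file being fixed.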
|
Test failed. |
|
After discussing with @DeepDiver1975 we decided to use ZipStreamer instead, as it will not only fix the utf-8 file name issues but also significantly improve download performance. See #6893 |
|
@PepeN this issue has been fixed for the upcoming OC7 |



refs #1086, #930, #578
File/folder names:
中文blah中文blah.txt
undaçao.doc
öäüß.txt
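For anyone who wants a known-good reference archive to compare against, here is a small Python sketch that writes these three names with the standard `zipfile` module, which marks non-ASCII names as UTF-8 on its own. It is independent of the PHP code in this PR, and the folder layout of the real test archive is not reproduced:

```python
import io
import zipfile

# The file names listed above, all placed at the archive root.
names = ["中文blah中文blah.txt", "undaçao.doc", "öäüß.txt"]

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in names:
        archive.writestr(name, "test content for " + name)

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    for info in archive.infolist():
        # Bit 11 (0x800) marks the name as UTF-8.
        print(info.filename, "utf8 flag:", bool(info.flag_bits & 0x800))
```

Writing `buffer.getvalue()` to disk gives a file that can be opened in the same tools used to test the ownCloud-generated ZIP.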
Currently known issues/observations:
Next steps:
@owncloud/core-developers
I need reviewers and testers on Windows, Mac and Linux
@skodak thanks a lot for relicensing your code!
Can I ask you to add a comment on this PR where you state again that you agree to relicense? THX