Sharepoint / O365: Session uploaded file size reported by MS Graph does not equal local file size #935
@ificator Background: I started using this client two weeks ago when I got a new machine at work, and I saw this issue on the very first day, when I tried using it on a plain .txt file. Today the problem came back in an interesting way. I have a file called 2018-10-12 Burn Down Rate.xlsx that I created locally and then uploaded with this tool. It was one of six Excel workbooks I created today, but the only one to throw the "file size does not match" error. My local copy of this file is 20,143 bytes. The SharePoint Documents library shows 25.6 KB, with no obvious way of seeing the exact number of bytes. However, when I download the file with my web browser, it comes out to 26,194 bytes! The visible content of the workbook is identical, but the files themselves must differ. A cursory look shows that SharePoint is modifying the files after they are uploaded: I extracted each .xlsx file as a ZIP archive and can see that SharePoint has injected a new file at docProps/custom.xml that does not exist in my original upload. There is also an entire extra folder named customXml containing six additional files and a subfolder. That's... interesting. My current workaround is to delete the local file after I see the error and let the client re-sync the updated copy from SharePoint. I don't know whether that is a sane default, but it is the workflow I have adopted.
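The comparison described above (opening each .xlsx as a ZIP archive and looking for parts that exist only in the downloaded copy) can be scripted. A minimal Python sketch, assuming both copies are available locally; the function name `extra_parts` and the paths in the example comment are illustrative, not part of any tool discussed in this thread:

```python
import zipfile


def extra_parts(original_path, downloaded_path):
    """Compare the part names of two ZIP-based Office files (.xlsx, .docx, ...).

    Returns (added, removed): sorted names found only in the downloaded copy,
    and sorted names found only in the original copy.
    """
    with zipfile.ZipFile(original_path) as original:
        original_names = set(original.namelist())
    with zipfile.ZipFile(downloaded_path) as downloaded:
        downloaded_names = set(downloaded.namelist())
    added = sorted(downloaded_names - original_names)
    removed = sorted(original_names - downloaded_names)
    return added, removed


# Example (paths are placeholders):
# added, removed = extra_parts("local/Burn Down Rate.xlsx", "downloaded/Burn Down Rate.xlsx")
# for name in added:
#     print("+", name)
```

This only compares part names; a part present in both copies but with different content would not be reported and would need a byte-level comparison as well.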
Were these files opened or viewed between the time they were uploaded and the time they were downloaded? Have you checked the file history to see whether the file was modified by another application? Could you please provide information about your original upload request:
No, I do not believe so. The size difference is noted in the JSON response that OneDrive returns immediately after the upload session completes. The file is being modified by OneDrive / MS Graph immediately after upload. Because the size change shows up in the very response that indicates the file was successfully uploaded, there is 'zero' chance that anything other than the OneDrive backend is modifying it.
I will ask the users to capture some requests with additional debugging enabled.
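The check that trips here is essentially a comparison between the local file size and the `size` property of the driveItem that Microsoft Graph returns when the last fragment of an upload session is accepted. A minimal Python sketch of that comparison, assuming the response body has already been parsed into a dict; `size_mismatch` is a hypothetical helper, not the client's actual code:

```python
import os


def size_mismatch(local_path, drive_item):
    """Compare a local file's size with the `size` of the returned driveItem.

    `drive_item` is assumed to be the parsed JSON body returned when an
    upload session completes. Returns None when the sizes agree, otherwise
    a (local_size, remote_size) tuple; remote_size is None if absent.
    """
    local_size = os.path.getsize(local_path)
    remote_size = drive_item.get("size")
    if remote_size == local_size:
        return None
    return (local_size, remote_size)
```

A mismatch here does not by itself say whether the upload was truncated or the file was rewritten afterwards; as discussed in this thread, SharePoint enrichment produces the latter.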
@chackman Log 1: https://pastebin.com/SNCnpM1n
Thanks; I'm waiting for more information from a partner team about this issue.
I'm still waiting for more information from a partner team about this issue. |
After doing a little research: this is normal behavior for SharePoint and has been since at least SharePoint 2010. When a file that can contain metadata is uploaded, SharePoint writes some metadata from the library the file is uploaded to directly into the file. Office files are probably the most typical case, but it is reasonable to assume that this will happen for other rich file types as well. Basic text files have no in-file metadata, so they don't see this behavior. If you sync files to a SharePoint document library using the sync client, the initial sync will bring down the enriched file rather than the original.
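To see which library metadata ended up inside a downloaded Office file, the custom document properties part can be read directly. A minimal Python sketch, assuming the file is an Office Open XML package (.docx, .xlsx, .pptx) and that the metadata lands in `docProps/custom.xml`, the part reported earlier in this thread; `custom_property_names` is a hypothetical helper, and which property names SharePoint actually writes depends on the library:

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the custom file properties part in Office Open XML packages.
CUSTOM_PROPS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def custom_property_names(path):
    """List the names of custom document properties in an OOXML file.

    Returns an empty list if the package has no docProps/custom.xml part.
    """
    with zipfile.ZipFile(path) as package:
        if "docProps/custom.xml" not in package.namelist():
            return []
        root = ET.fromstring(package.read("docProps/custom.xml"))
    # Each <property> element carries the property name in its `name` attribute.
    return [prop.get("name") for prop in root.iter(f"{{{CUSTOM_PROPS_NS}}}property")]
```

Running this on the original and on the downloaded copy of the same workbook would show whether any custom properties were added after upload.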
Whilst it might be considered 'normal behaviour' by Microsoft to modify a file after upload, whether by enrichment or otherwise, it is still a file modification, and it breaks the file validation mechanisms used to confirm that a file was uploaded successfully. This modification constitutes unacceptable behaviour because it happens without the user's knowledge or acceptance. Just because an uploaded file 'can' be enriched does not mean that it should be 'automatically' enriched. Is there any way to disable this 'automatic' SharePoint document enrichment?
This is a feature of SharePoint document libraries that has been around for some time. With respect to validation, have you encountered cases where a file upload appeared to succeed but in fact did not? If an upload didn't work even though it appeared to, please provide more information about the upload that silently failed.
Any file that is successfully uploaded but then modified by Microsoft, with its size changed (due to modification or enrichment) compared to the actual file on my local disk, should in my mind constitute a 'failed upload': what I 'uploaded' is not what is present on OneDrive, because the Microsoft-modified version is what exists. If I were then to use any sort of 'checksum' to validate whether my files are 'in sync', I would have different files, which would then need to either:
I would suggest that this 'feature', despite its 'age', either needs to be clearly documented somewhere, or be switched off by default for SharePoint libraries and enabled through some sort of API flag when wanted.
I agree with @abraunegg. It seems very odd to me that I can't trust the host. I want to upload a file and be sure it stays the same. If I keep a text file with the checksum and, after downloading, use it to verify that I got back exactly the file I uploaded, that check won't be possible. I strongly suggest making it possible to disable this "feature".
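The checksum workflow described above (store a checksum next to the file, then verify the downloaded copy against it) looks roughly like this. A minimal Python sketch, assuming SHA-256 and a `sha256sum`-style sidecar file; the helper names and the `.sha256` suffix are illustrative choices, not part of any client discussed here:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path):
    """Write '<hex digest>  <file name>' to a sidecar '<name>.sha256' file."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(f"{sha256_of(path)}  {path.name}\n")
    return sidecar


def verify_checksum(path, sidecar):
    """Return True if the file's current SHA-256 matches the sidecar entry."""
    expected = Path(sidecar).read_text().split()[0]
    return sha256_of(path) == expected
```

If an uploaded file is enriched on the server, `verify_checksum` on the downloaded copy returns False even though nothing went wrong in transit, which is exactly the failure described in this comment.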
This "enriched" information should be stored in a separate/hidden data stream or blob, so that "User" files are never changed. I took a look at what changes in HTML files and found added "mso:" markup in the file.
Unfortunately, as stated earlier, this is how the document library features work. |
Does this behavior happen on a user's OneDrive for Business?
… exists on a SharePoint site (#1352)
* Fix uploading documents to Shared Business Folders when shared folder exists on a SharePoint site due to Microsoft Sharepoint 'enrichment' of files
See: OneDrive/onedrive-api-docs#935 for further details
@0x3333 Sadly, yes.
@chackman The problem is that it doesn't only affect document files but binary files as well. My LevelDB files get corrupted by this, leading to data loss. Also, I think this is now the right place to raise it, since UserVoice no longer exists.
It seems I can work around this problem for the moment by uploading files with the ".partial" extension and then moving them to the real filename after the transfer. I used the "SP.MoveCopyUtil.MoveFileByPath()" API of SharePoint Online.
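The workaround above can be sketched as a small wrapper: upload under a neutral temporary name, confirm the reported size while the file still has that name, then move it into place. A minimal Python sketch; `upload` and `rename` are hypothetical callables standing in for whatever upload-session code and move API a client actually uses (the comment above used SP.MoveCopyUtil.MoveFileByPath()), and `upload` is assumed to return the parsed driveItem. Whether SharePoint leaves the file alone after the final rename is not something this sketch can guarantee.

```python
def upload_via_temporary_name(local_path, remote_name, expected_size,
                              upload, rename, suffix=".partial"):
    """Upload under '<remote_name><suffix>', verify its size, then rename.

    `upload(local_path, remote_name) -> dict` and `rename(old, new)` are
    hypothetical callables supplied by the caller. Raises ValueError (and
    skips the rename) if the size reported for the temporary copy differs.
    """
    temporary_name = remote_name + suffix
    item = upload(local_path, temporary_name)
    reported = item.get("size")
    if reported != expected_size:
        raise ValueError(
            f"size mismatch for {temporary_name}: "
            f"expected {expected_size}, got {reported}"
        )
    rename(temporary_name, remote_name)
```

Passing `suffix=".zip"` would correspond to the other renaming variant mentioned later in this thread.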
Has anyone actually contacted Microsoft about this? We are using Office365 and have support. I wonder what their response would be. Obviously they won't support third party tools, but if the hashes don't match after uploading normally, it's a clear problem. |
Based on the conversation in this thread, I doubt they'll just happily advertise this feature to you again. I hadn't thought of contacting support before, though...
No dice on MS support. We've had this issue in the past, and it is known and expected behavior, unfortunately. The reason for the hash change is tied to the change in UUID to allow items to be searched and scanned in SharePoint. Guess we'll just have to hope and assume that files are uploaded successfully and completely.
* Add a check for --upload-only use when trying to work around OneDrive/onedrive-api-docs#935
* Update logging and print a warning message that the files are now technically different due to sharepoint bug and using --upload-only
Why is this issue closed? It is a serious bug in either the API or the API documentation.
Hi, I'm using rclone and have the same issue.
Would it work to have an option that automatically applies one of the renaming workarounds described here to prevent SharePoint from modifying the file, such as the .partial extension used by @lindi2 or the .zip extension used by @GeorgesDLH?
Can enrichment be turned off in SharePoint? For us, this started happening around a month ago. We export a Confluence space and sync the export to SharePoint: we first download the files, rsync the new export over them, and then sync again. Because of the enrichment feature, SharePoint thinks that nearly all files have changed, and our engineers have to download 4 GB across thousands of files every morning.
Hmm, is this how integrity is guaranteed to customers? By modifying files? WTF. The following was generated by AI on the support page in MS admin, but we do not have this enabled... To disable enrichment of documents uploaded to SharePoint, follow these steps:
By following these steps, you will effectively disable the enrichment of documents in SharePoint [1].
Category
Expected or Desired Behavior
OneDrive should report the correct file size for an uploaded file.
Observed Behavior
When a file is uploaded via an upload session to an O365 SharePoint site, the file size of the returned object does not match the size of the file that was actually uploaded.
Files that appear to report incorrect sizes are:
Files that do not appear to report incorrect sizes are:
When uploading to a standard OneDrive Business Account, this behavior does not occur.
Steps to Reproduce
For further details, refer to abraunegg/onedrive#205 which provides additional diagnostic information