Sharepoint / O365: Session uploaded file size reported by MS Graph does not equal local file size #935

Closed
1 of 3 tasks
abraunegg opened this issue Oct 26, 2018 · 31 comments

Comments

@abraunegg

Category

  • Question
  • Documentation issue
  • Bug

Expected or Desired Behavior

OneDrive should report the correct file size when a file is uploaded.

Observed Behavior

When a file is uploaded via a session upload to an O365 SharePoint site, the size reported in the returned object does not match the size of the file that was uploaded.

Files that appear to report incorrect sizes are:

  • PDF
  • MS Office Documents
  • HTML files

Files that do not appear to report incorrect sizes are:

  • Text files

When uploading to a standard OneDrive Business Account, this behavior does not occur.

Steps to Reproduce

  • Upload a file via API using a session upload
  • Review session response for file upload completion and note the 'size' value
  • Compare the 'size' value to the actual file size

For further details, refer to abraunegg/onedrive#205 which provides additional diagnostic information
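The comparison in the steps above can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the helper name `size_mismatch` is hypothetical, and the response bodies are hand-written stand-ins that contain only the 'size' value mentioned in the steps.

```python
import json
import os
import tempfile


def size_mismatch(local_path, response_json):
    """Return (local_size, reported_size) if they differ, else None.

    `response_json` is assumed to be the JSON body returned when the
    session upload completes, containing a top-level 'size' value.
    """
    local_size = os.path.getsize(local_path)
    reported_size = json.loads(response_json)["size"]
    if local_size != reported_size:
        return (local_size, reported_size)
    return None


# Demonstration with a temporary 100-byte file and hand-written responses.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * 100)
    demo_path = handle.name

matching = size_mismatch(demo_path, '{"size": 100}')
differing = size_mismatch(demo_path, '{"size": 131}')
os.remove(demo_path)
```

Here `matching` is `None`, while `differing` carries both sizes so that a caller could log them.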

@abraunegg
Author

@ificator
Any update on this issue?

@abraunegg
Author

@ificator
Updated from an affected user:

Background: I just started using this client two weeks ago when I got a new machine at work. I saw this issue the very first day I tried using it on a plain .txt file. Today the problem came back in an interesting way.

I have a file called 2018-10-12 Burn Down Rate.xlsx that I initialized locally and then uploaded using this tool. It was one of 6 Excel sheets that I initialized today, but the only one to throw the "file size does not match" error.

My local copy of this file is 20,143 bytes. The Sharepoint Documents repository shows 25.6 KB with no obvious way of seeing the exact number of bytes. However, when I use my web browser to download the file, it comes out to 26,194 bytes! The visible content within this Excel sheet is identical, but the files themselves must be different.

A cursory look shows that Sharepoint is modifying the files after they are uploaded. I extracted each of the .XLSX files as ZIP archives and I can see that Sharepoint has injected a new file at the path docProps/custom.xml that does not exist in my original upload. There is an entire extra folder named customXml filled with 6 additional files and a subfolder as well.

So Sharepoint is modifying these files after they are uploaded. That's.... interesting. My current workaround is to delete the local file after I see the error and allow it to re-sync the updated copy from Sharepoint. I don't know if that is a sane default but it is the workflow I have adopted.
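The ZIP inspection described in the report above could be reproduced with a sketch like the one below. It assumes both copies of the file are available as bytes. The helper names are hypothetical, and the archives are tiny stand-ins built in memory: `docProps/custom.xml` and the `customXml` folder come from the report, while `customXml/item1.xml` and the other member names are placeholders.

```python
import io
import zipfile


def added_members(original_bytes, downloaded_bytes):
    """List archive members present in the downloaded copy only."""
    with zipfile.ZipFile(io.BytesIO(original_bytes)) as original:
        original_names = set(original.namelist())
    with zipfile.ZipFile(io.BytesIO(downloaded_bytes)) as downloaded:
        downloaded_names = set(downloaded.namelist())
    return sorted(downloaded_names - original_names)


def make_archive(names):
    """Build an in-memory ZIP with placeholder content (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "placeholder")
    return buffer.getvalue()


local_copy = make_archive(["xl/workbook.xml", "docProps/core.xml"])
server_copy = make_archive(
    [
        "xl/workbook.xml",
        "docProps/core.xml",
        "docProps/custom.xml",  # path named in the report
        "customXml/item1.xml",  # placeholder name inside customXml/
    ]
)
extra = added_members(local_copy, server_copy)
```

With real files, the two byte strings would simply be read from the local copy and from the copy downloaded through the browser.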

@abraunegg abraunegg changed the title session uploaded file size reported does not equal local file size Sharepoint / O365: Session uploaded file size reported by MS Graph does not equal local file size Nov 1, 2018
@abraunegg
Author

@ificator
Any update on this issue?

@abraunegg
Author

@ificator, @chackman
Any update on this issue? Files should not be modified after they are uploaded with data. Can this be looked at with some urgency?

@chackman
Contributor

chackman commented Nov 5, 2018

Were these files opened or viewed between the times that they were uploaded and then downloaded? Have you checked file history to see if the file has been modified by another application?

Could you please provide information about your original upload request:

  • Date (in UTC, please)
  • request-id
  • SPRequestGuid (for requests made to OneDrive for Business)

@abraunegg
Author

@chackman

Were these files opened or viewed between the times that they were uploaded and then downloaded? Have you checked file history to see if the file has been modified by another application?

No, I do not believe so. The JSON response from OneDrive arrives immediately after the session upload completes, and this is where the size difference is noted. The file is being modified by OneDrive / MS Graph immediately after upload. As the change in size appears in the very response that indicates the file was successfully uploaded, there is 'zero' chance that anything other than the OneDrive backend is modifying it.

Could you please provide information about your original upload request:

  • Date (in UTC, please)
  • request-id
  • SPRequestGuid (for requests made to OneDrive for Business)

Will ask the users to generate some requests using additional debugging to capture this.

@abraunegg
Author

@chackman
Tracker case against 'onedrive' client application for Linux: abraunegg/onedrive#205

@abraunegg
Author

@chackman
Contributor

Thanks; I'm waiting for more information from a partner team about this issue.

@abraunegg
Author

@chackman
Any update here? Microsoft should not be modifying users' files when they are uploaded to the OneDrive service. This is impacting a significant number of users who use various Linux clients to sync data to OneDrive.

@chackman
Contributor

I'm still waiting for more information from a partner team about this issue.

@abraunegg
Author

@chackman
Can this please be escalated somehow? User files should not be modified when there is zero information that this would be occurring. Doing so breaks user trust in the service to keep data safe, secure and in its original format, as the service might be used for archival purposes.

@JeremyKelley
Contributor

After doing a little research this is normal behavior for SharePoint and has been since at least SharePoint 2010. When uploading a file that can contain metadata SharePoint will associate some metadata from the library the file is uploaded to directly in the file. Office files are probably the most typical files where this will happen but it's not unreasonable to assume that this will happen for other rich file types as well.

Since basic text files don't have in-file metadata they don't see this behavior.

If you're syncing files to a SharePoint document library using the sync client the initial sync will have the enriched file sync'ed rather than the original.

@abraunegg
Author

After doing a little research this is normal behavior for SharePoint and has been since at least SharePoint 2010. When uploading a file that can contain metadata SharePoint will associate some metadata from the library the file is uploaded to directly in the file. Office files are probably the most typical files where this will happen but it's not unreasonable to assume that this will happen for other rich file types as well.

Whilst it might be considered 'normal behaviour' to Microsoft to modify a file after upload - either by enrichment or other - it is still a file modification that breaks file validation mechanisms to indicate that the file was uploaded successfully. This modification would constitute unacceptable behaviour because it is happening without user knowledge or acceptance.

Just because an uploaded file 'can' be enriched, does not mean that it should be 'automatically' enriched.

Is there any way to disable this 'automatic' SharePoint document enrichment?

@chackman
Contributor

This is a feature in SharePoint document libraries that has been around for some time. With respect to validation, have you encountered cases where a file upload appears to have succeeded, but in fact, it did not succeed?

If upload didn't work even though it appeared to work, then please provide more information about the upload that silently failed.

@abraunegg
Author

This is a feature in SharePoint document libraries that has been around for some time. With respect to validation, have you encountered cases where a file upload appears to have succeeded, but in fact, it did not succeed?

Any file that is successfully uploaded but then modified by Microsoft, so that its size differs from that of the actual file on my local disk (due to modification or enrichment), should in my mind constitute a 'failed upload', as what I 'uploaded' is not what is present on OneDrive - the Microsoft modified version is what exists. If I was then to use any sort of 'checksum' to validate if my files are 'in sync', I would have different files which I would then need to either:

  • reupload (which again would get modified)
  • replace with MS modified / enriched version .... why would I want a MS modified version of my file?

I would suggest that this 'feature' - despite its 'age' - either needs to be clearly documented somewhere, or somehow switched off by default for SharePoint libraries and enabled by some sort of API flag if wanted.

@derrix060

Agree with @abraunegg.

It looks very odd to me that I can't trust the host. I want to upload a file and be sure it stays the same. If I also keep a text file with the checksum, and on download use the content of that checksum file to check that I downloaded exactly the file I uploaded, that won't be possible.

I strongly suggest being able to disable this "feature".
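The checksum round trip described in the comment above might look like the following sketch. SHA-256 is an assumption (the comment does not name an algorithm), the helper names are hypothetical, and the "enriched" bytes are a hand-made stand-in for a server-modified copy rather than real SharePoint output.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(downloaded, recorded_checksum):
    """Check a downloaded copy against the checksum recorded before upload."""
    return sha256_hex(downloaded) == recorded_checksum


original = b"<html><body>hello</body></html>"
recorded = sha256_hex(original)  # what the sidecar checksum file would hold

# Stand-in for a copy that gained extra markup on the server.
enriched = original.replace(b"<body>", b"<body><!-- mso: placeholder -->")

same = is_unchanged(original, recorded)
changed = is_unchanged(enriched, recorded)
```

Any in-file modification, however small, changes the digest, which is why a sidecar checksum cannot validate a copy that was modified after upload.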

@peter-sabath

This "enriched" information should be stored in a separate/hidden data stream or blob, so "User" files are never changed.

I took a look at what changes in HTML files and found added "mso:" markup in the file.
Every developer goes crazy when they find such markup in handwritten file(s).

@chackman
Contributor

Unfortunately, as stated earlier, this is how the document library features work.
We will look into documentation feedback; however, UserVoice is the appropriate place to suggest feature enhancements and improvements.

@0x3333

0x3333 commented Apr 4, 2019

Does this behavior happen on a user's OneDrive for Business?

@ytrezq

ytrezq commented Mar 30, 2021

Does this behavior happen on a user's OneDrive for Business?

@0x3333 sadly yes.

@ytrezq

ytrezq commented Mar 30, 2021

Unfortunately, as stated earlier, this is how the document library features work.
We will look into documentation feedback; however, UserVoice is the appropriate place to suggest feature enhancements and improvements.

@chackman the problem is that it doesn’t only affect document files but binary files as well. My LevelDB files get corrupted by this, leading to data loss.

Also, I think this is now the right place since UserVoice no longer exists.

@lindi2

lindi2 commented Apr 6, 2021

It seems that, at the moment, I can work around this problem by uploading files with the ".partial" extension and then moving them to the real filename after the transfer. I used the "SP.MoveCopyUtil.MoveFileByPath()" API of SharePoint Online.

@duel007

duel007 commented Jun 24, 2021

Has anyone actually contacted Microsoft about this? We are using Office365 and have support. I wonder what their response would be. Obviously they won't support third party tools, but if the hashes don't match after uploading normally, it's a clear problem.

@Cnly

Cnly commented Jun 25, 2021

Has anyone actually contacted Microsoft about this? We are using Office365 and have support. I wonder what their response would be. Obviously they won't support third party tools, but if the hashes don't match after uploading normally, it's a clear problem.

Based on the conversation in this thread, I doubt they'll just happily advertise this feature to you again. I haven't thought of contacting the support before though...

@duel007

duel007 commented Jun 25, 2021

No dice on MS support. We've had this issue in the past, and it is known and expected behavior unfortunately. The reason for the hash change is tied to the change in UUID to allow for items to be searched and scanned in SharePoint.

Guess we'll just have to hope and assume that files are uploaded successfully and completely.

AndrShikov pushed a commit to AndrShikov/onedrive that referenced this issue Feb 27, 2023
* Add a check for --upload-only use when trying to work around OneDrive/onedrive-api-docs#935
* Update logging and print a warning message that the files are now technically different due to sharepoint bug and using --upload-only
AndrShikov pushed a commit to AndrShikov/onedrive that referenced this issue Feb 27, 2023
… exists on a SharePoint site (#1352)

* Fix uploading documents to Shared Business Folders when shared folder exists on a SharePoint site due to Microsoft Sharepoint 'enrichment' of files

See: OneDrive/onedrive-api-docs#935 for further details
@landall

landall commented Jul 4, 2023

Why is this issue closed? It is a serious bug about the API or the API doc.

@GeorgesDLH

Hi, using rclone, with the same issue.
To upload docx files, I now upload a zip file and then rename it to docx. In that case the file is not changed.
So I suppose that the enrichment is done inside the upload process: once the file is successfully on SharePoint and you rename it afterwards, it won't get enriched.

@mc2contributor

Would it work to have an option to automatically use one of the renaming workarounds described to prevent SharePoint from modifying the file, such as the .partial extension used by @lindi2 or the .zip extension used by @GeorgesDLH ?
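A client-side option along these lines could be structured roughly as in the sketch below. Everything here is an assumption for illustration: `upload_via_temporary_name`, `upload` and `rename` are hypothetical names, the two callables stand in for whatever upload and move/rename calls a given client already has, and the in-memory `store` only demonstrates the ordering. Whether SharePoint leaves the renamed file untouched rests on the reports in this thread and is not verified here.

```python
def upload_via_temporary_name(final_name, data, upload, rename, suffix=".partial"):
    """Upload under a temporary name, then rename to the real one.

    `upload(name, data)` and `rename(old_name, new_name)` are supplied
    by the caller; this function only fixes the order of operations.
    """
    temporary_name = final_name + suffix
    upload(temporary_name, data)
    rename(temporary_name, final_name)
    return temporary_name


# In-memory stand-ins for a remote document library (demo only).
store = {}


def fake_upload(name, data):
    store[name] = data


def fake_rename(old_name, new_name):
    store[new_name] = store.pop(old_name)


used = upload_via_temporary_name("report.docx", b"bytes", fake_upload, fake_rename)
```

This variant appends the suffix, as in the ".partial" approach. The ".zip" approach described above would presumably swap the extension instead, which only changes how `temporary_name` is derived.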

@dploeger

Can enrichment be turned off on SharePoint? In our case this only started happening around a month back. We export a Confluence space and sync this upload to SharePoint. For this we first download the files, rsync over the export and then sync it again. Because of the enrichment feature, SharePoint thinks that nearly all files have changed, and our engineers have to download 4GBs and thousands of files every morning.

@tomaskovacik

tomaskovacik commented Feb 25, 2025

hmm this is how integrity is guaranteed to customers? by modifying files? wtf

this is generated by AI on support page in MS admin, but we do not have this enabled....
Support article
AI-generated content

To disable enrichment of documents uploaded to SharePoint, follow these steps:

Go to the Microsoft 365 admin center.
Select Settings > Org settings.
On the Org settings page, select Pay-as-you-go services.
On the Prebuilt document processing panel, clear the Let people create and apply models to process files checkbox and select Save.

By following these steps, you will effectively disable the enrichment of documents in SharePoint [1].
