Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drive: intermittent crashes with: stream error: stream ID ; INTERNAL_ERROR #3631

Open
ncw opened this issue Oct 16, 2019 · 25 comments
Open

drive: intermittent crashes with: stream error: stream ID ; INTERNAL_ERROR #3631

ncw opened this issue Oct 16, 2019 · 25 comments

Comments

@ncw
Copy link
Member

ncw commented Oct 16, 2019

In the forum issues were reported with v1.49.4.

The issues looked like this

2019/09/30 13:36:34 INFO  : Google drive root 'crypt': Failed to get StartPageToken: Get https://www.googleapis.com/drive/v3/changes/startPageToken?alt=json&prettyPrint=false&supportsAllDrives=true: stream error: stream ID 1923; INTERNAL_ERROR
2019/09/30 13:42:04 INFO  : Google drive root 'crypt': Failed to get StartPageToken: Get https://www.googleapis.com/drive/v3/changes/startPageToken?alt=json&prettyPrint=false&supportsAllDrives=true: stream error: stream ID 1929; INTERNAL_ERROR
2019/10/02 19:36:29 ERROR : IO error: open file failed: Get https://www.googleapis.com/drive/v3/files/XXX?alt=media: stream error: stream ID 1217; INTERNAL_ERROR

This is an internal error from the go standard library HTTP/2 code.

v1.49.4 was accidentally compiled with go1.13 which enabled http/2 by default for google drive for reasons not clear from the changelog.

Rolling the compiler back to go1.12 with no other code changes fixes the problem which is what was released as v1.49.5

An attempt at a workaround making these errors retriable was put in f9f9d50 but this did not seem effective.

Problem statement

  • go1.12 runs reliably with google drive over HTTP/1
  • go1.13 runs unreliably with google drive over HTTP/2
@ncw ncw added this to the v1.50 milestone Oct 16, 2019
ncw added a commit that referenced this issue Oct 16, 2019
Before this change when rclone was compiled with go1.13 it used HTTP/2
to contact drive by default.

This causes lockups and INTERNAL_ERRORs from the HTTP/2 code.

This is a workaround disabling the HTTP/2 code on an option.

It can be re-enabled with `--drive-disable-http2=false`

See #3631
@ivandeex
Copy link
Member

It can be a good idea to add drive test involving simple upload/download scenario with --drive-disable-htp2=false (or internal test with similar setting), ignore it in config.yaml and collect long-term stats.

@ncw
Copy link
Member Author

ncw commented Oct 17, 2019

It can be a good idea to add drive test involving simple upload/download scenario with --drive-disable-htp2=false (or internal test with similar setting), ignore it in config.yaml and collect long-term stats.

I think setting up a reproduction test is a good idea... I tried with a simple mount which did directory listings, but I think it needs more than that. I haven't had time to set up something more complex yet.

@ivandeex
Copy link
Member

Just FYI
A number of CVEs was filed against HTTP2 recently. Large providers like Google are no doubt refreshing their http/2 stack. https://blog.cloudflare.com/on-the-recent-http-2-dos-attacks/

@thestigma
Copy link
Contributor

Interesting. So this whole thing could simply be a temporary issue outside of rclone then.

I will be re-enabling HTTP/2 for my normal use and seeing how it goes - if it has stabilized ect.
If I see more stalling I will make sure to make a note here about it.
(will be testing from version 1.50)

@ncw
Copy link
Member Author

ncw commented Oct 26, 2019

Interesting. So this whole thing could simply be a temporary issue outside of rclone then.

It could be, yes.

There is probably a bug in the HTTP/2 code in the go standard library which is being triggered by a bug in the google drive HTTP/2 code - that is my guess!

I will be re-enabling HTTP/2 for my normal use and seeing how it goes - if it has stabilized ect.
If I see more stalling I will make sure to make a note here about it.
(will be testing from version 1.50)

Great - thanks.

@ncw ncw modified the milestones: v1.50, v1.51 Oct 26, 2019
@thestigma
Copy link
Contributor

While using --drive-disable-htp2=false

I got this today
ReadFileHandle.Read error: low level retry 1/10: stream error: stream ID 1345; INTERNAL_ERROR

it did not result in a full stall however. it managed to recover it seems.

I will set
-drive-disable-htp2=true
and continue monitoring

@ncw
Copy link
Member Author

ncw commented Nov 2, 2019

@thestigma I just uploaded v1.50.1 which was compiled with go1.13.4. This has an HTTP/2 fix which could plausibly be a fix for this issue.

@thestigma
Copy link
Contributor

Cool!
I will instead continue testing on
-drive-disable-htp2=false
with version 1.50.1

I assume that flag is still "true" by default for the time being - until we can more clearly verify it works again?

@ncw
Copy link
Member Author

ncw commented Nov 5, 2019

Cool!
I will instead continue testing on
-drive-disable-htp2=false
with version 1.50.1

Thank you.

I assume that flag is still "true" by default for the time being - until we can more clearly verify it works again?

I won't change the default until we are sure it is working properly.

@thestigma
Copy link
Contributor

Just want to leave a note here - that It's been a very long time now since I've seen this same error.
I am still running with
-drive-disable-htp2=false
It has been about as long as since my last post - although I haven't had a ton of load on my system in this period to be fair.

@ncw
Copy link
Member Author

ncw commented Dec 4, 2019

Just want to leave a note here - that It's been a very long time now since I've seen this same error.
I am still running with
-drive-disable-htp2=false
It has been about as long as since my last post - although I haven't had a ton of load on my system in this period to be fair.

Thank you for the report :-) If we want to change the default then we should probably get a bit more testing by other users.

@ncw ncw modified the milestones: v1.51, v1.52 Feb 1, 2020
@Animosity022
Copy link
Collaborator

I'm going to turn back on http2 and see how things work.

@ncw
Copy link
Member Author

ncw commented May 4, 2020

I'm going to turn back on http2 and see how things work.

Great idea - thanks.

@Animosity022
Copy link
Collaborator

Hmm, I got a read error already with almost no load going on but it did seem to retry I think.

2020/05/04 13:52:31 ERROR : XXXX: ReadFileHandle.Read error: low level retry 1/10: stream error: stream ID 57; INTERNAL_ERROR

@ncw
Copy link
Member Author

ncw commented May 5, 2020

Hmm, I got a read error already with almost no load going on but it did seem to retry I think.

:-( Which rclone were you trying? It would be worth trying the latest beta if you didn't already.

@ncw ncw modified the milestones: v1.52, v1.53 May 29, 2020
breadcat added a commit to breadcat/Dockerfiles that referenced this issue Jul 24, 2020
Experiencing CURL issues with Kodi streaming as Traefik advertises it's
HTTP/2 compatible, even if the proxied client isn't. This still has
issues with instability but resolves more. See
rclone/rclone#3631
@darthShadow
Copy link
Member

Just an update on this, I have not noticed the same kind of errors for quite a few months now and I have been using the latest builds for that duration. However, I have noticed speeds being slower when multiple files are being uploaded via the mount with HTTP/2 enabled.

Some numbers for the same:

  • 1 file uploading via the mount (with HTTP/2 Enabled): ~ 80 MB/s
  • 4 files uploading via the mount (with HTTP/2 Enabled): ~ 200 MB/s (Approx. 50 MB/s per file)
  • 4 files uploading via the mount (with HTTP/2 Disabled): ~ 280 MB/s (Approx. 70 MB/s per file)
  • 4 individual uploads to the remote (bypassing the mount, via a direct copy) (with HTTP/2 Disabled): ~ 320 MB/s (Approx. 80 MB/s per file)

@ncw
Copy link
Member Author

ncw commented Aug 25, 2020

Good news that you didn't see the issue any more. Which VFS mode are you using with the mount?

@darthShadow
Copy link
Member

This is with the newly revamped full mode.

@ncw
Copy link
Member Author

ncw commented Sep 1, 2020

Do you think we should be re-enabling this? Your speed tests don't look terribly encouraging. I wonder if the http2 backend can be persuaded to open more connections as I suspect it just opens one which is being swamped by the upload.

How many connections did rclone have open - can you check with netstat -tuanp?

@darthShadow
Copy link
Member

I am not sure whether this is an issue with the Google servers or some OS tuning. Preferably someone else can also chime in with their test results (with multiple transfers) to see how it's working for them.

I would prefer not to enable it by default unless other folks report in with encouraging results.

netstat -tuanp

Will post results within a day or so once I have some downtime to restart the mount and enable HTTP/2 again.

@darthShadow
Copy link
Member

So, tried again on a different server with rclone copy directly to the remote with 4 transfers:

Without HTTP/2: 4 Connections (on netstat) & Total Speed: 320 MB/s
With HTTP/2: 1 Connection (on netstat) & Total Speed: 160 MB/s

This is a drastic difference, so we should probably no enable it by default for at least this release and continue evaluating it in the future to see if it ever improves.

@ncw
Copy link
Member Author

ncw commented Sep 2, 2020

@darthShadow thanks for testing. That is a very big difference!

I found a go bug which looks relevant here: golang/go#37373

I added a comment there.

@darthShadow
Copy link
Member

Excellent, thanks. Will keep an eye on that issue.

@ncw ncw modified the milestones: v1.53, v1.54 Sep 5, 2020
@ncw ncw modified the milestones: v1.54, v1.55 Feb 3, 2021
@ncw ncw modified the milestones: v1.55, v1.56 Apr 3, 2021
@ncw ncw modified the milestones: v1.56, v1.57 Jul 20, 2021
@ncw ncw modified the milestones: v1.57, Known Problem Nov 1, 2021
@solracsf
Copy link
Contributor

Can this help here? golang/go#37373 (comment)

@ncw
Copy link
Member Author

ncw commented Feb 4, 2022

Can this help here? golang/go#37373 (comment)

Disabling keepalives will kill the performance of the short transactions, like listing directories which I think would make it unusable.

If we could disable keepalives for the large transactions only that would work. I can't think of an easy way of doing that but it might be possible...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants