ssh: fix connection creation "leaking" connections #5816

ConcurrentCrab · 2024-06-30T23:20:58Z

Commit 44b8801 introduced a bug where connections were being created on-demand but were not being stored in the connections slice.

When trying to end connections at the end of a transfer, there were no handles to them. As a result, all but the primary connection of the session were hung with no "quit" message. LFS clients (and servers) were simply hung until idle timeout kicked in.

Found this while implementing pure SSH protocol for Gitea (go-gitea/gitea#17554).

This seems like important functionality, so I'd recommend adding some sort of pure SSH multiplexed tests in the suite as well.

Commit 44b8801 introduced a bug where connections were being created on-demand but were not being stored in the connections slice. When trying to end connections at the end of a transfer, there were no handles to them. As a result, all but the primary connection of the session were hung with no "quit" message. LFS clients were simply hung until idle timeout kicked in.

chrisd8088 · 2024-07-03T06:04:17Z

Hey, thanks for diagnosing the problem and contributing a fix! This looks obviously correct to me.

Once our CI is back in good shape I'll run the test suite.

I think your suggestion of adding some tests to validate this behaviour is a very good one. I don't want to impose on your time, but since you're working in this area, if you have any ideas or notes on how you simulated this problem, those would be very helpful. Ideally we'd take an existing test like the batch transfers with ssh endpoint (git-lfs-transfer) test and run another version of it which failed without your change, and succeeded with it.

ConcurrentCrab · 2024-07-03T20:33:56Z

@chrisd8088 the test case you linked looks like it was close to catching this. The trick is, multiplexing only kicks in when there are >1 LFS objects to upload (since the flow for a single file has to be a linear batch -> upload -> verify). I found this problem while trying uploads of multiple large files, but it's reliably reproducible even with just 2 tiny test files.

Just adding multiple files to the test commit would be enough as a test case. Unfortunately both known implementations of git-lfs-transfer treat a timeout EOF as a graceful exit, so it won't exactly fail (not sure how to do that), but a running time of 2 minutes should indicate that something is awry :P

I tried adding a copy of that test, only with multiple LFS files, but that seems to fail in a new, entirely unexpected way:

#    [...] trace git-lfs: pure SSH connection successful
#    [...] trace git-lfs: failed to spawn pure SSH connection: Unable to negotiate version with remote side (unable to read capabilities): EOF
#    Failed to connect to remote SSH server: expected "/run/user/[...]/sock-[...]/lfs.sock" to exist
#    [...] trace git-lfs: shutting down pure SSH connection

Since it mentioned a socket, I suspected this is another multiplexing issue (as it uses a control socket to coordinate between the workers). And sure enough, adding a: git config lfs.ssh.automultiplexing false before the push lets the test past the push, though it later seems to fail doing the git lfs fsck (? looks like it at least).

I tried looking more into it but I don't think I'm nearly familiar enough with your test harness to be able to figure out why the the git-lfs process isn't able to access that control socket.

chrisd8088 · 2024-07-04T15:17:52Z

I tried looking more into it but I don't think I'm nearly familiar enough with your test harness to be able to figure out why the the git-lfs process isn't able to access that control socket.

OK, thanks for all the info! I'll try taking a look at this over the next while to see if I can reproduce that and also craft a test to demonstrate the fix in this PR.

chrisd8088 · 2024-08-18T03:15:25Z

Just FYI, I haven't forgotten about this PR; I've been busy with $DAYJOB and haven't managed to finish getting the CI suite back into good shape, although that work is nearly ready. Once I've got that done I'll look at this PR next.

Thank you for you patience and your contribution!

ConcurrentCrab · 2024-08-18T04:50:50Z

It's cool, I knew you wouldn't forget :)

6543 · 2024-09-12T17:34:18Z

Any update?

chrisd8088 · 2024-09-12T18:36:41Z

Any update?

As I wrote above, I've been working on getting our CI suite back to 100% success, and I have tracked down all the bugs and fixed them, but not yet had time to create a series of PRs for those changes. I hope to have that done in the next couple of weeks. Once those are merged, we can run this PR and several others against those repaired tests, so we'll have valid CI results for Windows and macOS (these are both broken right now in CI, for different reasons).

I appreciate everyone's patience while we get our CI jobs into good shape again.

chrisd8088 · 2024-09-21T22:35:34Z

Just a quick update on the state of our CI suite: PR #5866 fixed the macOS jobs, PR #5868 should fix the Windows jobs, and PR #5872 will upgrade us to Go 1.23, which should fix the Build with specific Go jobs. With any luck, we'll be back to 100% success after those are merged, and which point we can start pulling in other contributions again.

Thanks very much for your patience!

chrisd8088

Thanks again for the bug report and patch! This is definitely a valuable fix.

FWIW, I spent some time this weekend refreshing my memory of the SSH connection code, and turned up a number of issues aside from this one. I've written up the most significant bug in #5880; it tracks with what you described in #5816 (comment), I think.

The upshot, though, is that it's not immediately possible (so far as I could tell) to write a nice test which would demonstrate the problem this PR addresses.

Since the problem is clear, though, and the fix looks entirely correct, I'll just merge this PR in as it stands and we'll try to craft a more thorough multiple-object batch SSH transfer test in subsequent PRs.

6543 · 2024-10-07T16:19:35Z

thanks for the fix! & thanks for the maintenance work!

ConcurrentCrab requested a review from a team as a code owner June 30, 2024 23:20

This was referenced Jun 30, 2024

Add pure SSH LFS support go-gitea/gitea#31516

Merged

Support LFS purely over SSH protocol go-gitea/gitea#17554

Closed

aymanbagabas approved these changes Jul 9, 2024

View reviewed changes

6543 approved these changes Sep 5, 2024

View reviewed changes

Merge branch 'main' into ssh-leak-fix

27396cc

chrisd8088 mentioned this pull request Oct 7, 2024

Crash during pure SSH object transfer with multiple objects #5880

Closed

chrisd8088 approved these changes Oct 7, 2024

View reviewed changes

chrisd8088 merged commit 2daf828 into git-lfs:main Oct 7, 2024

chrisd8088 mentioned this pull request Mar 20, 2025

git-lfs-transfer does not exit, blocking git push from completing #6017

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ssh: fix connection creation "leaking" connections #5816

ssh: fix connection creation "leaking" connections #5816

Uh oh!

ConcurrentCrab commented Jun 30, 2024 •

edited

Loading

Uh oh!

chrisd8088 commented Jul 3, 2024

Uh oh!

ConcurrentCrab commented Jul 3, 2024 •

edited

Loading

Uh oh!

chrisd8088 commented Jul 4, 2024

Uh oh!

chrisd8088 commented Aug 18, 2024

Uh oh!

ConcurrentCrab commented Aug 18, 2024

Uh oh!

6543 commented Sep 12, 2024

Uh oh!

chrisd8088 commented Sep 12, 2024 •

edited

Loading

Uh oh!

chrisd8088 commented Sep 21, 2024

Uh oh!

chrisd8088 left a comment •

edited

Loading

Uh oh!

6543 commented Oct 7, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ssh: fix connection creation "leaking" connections #5816

ssh: fix connection creation "leaking" connections #5816

Uh oh!

Conversation

ConcurrentCrab commented Jun 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrisd8088 commented Jul 3, 2024

Uh oh!

ConcurrentCrab commented Jul 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrisd8088 commented Jul 4, 2024

Uh oh!

chrisd8088 commented Aug 18, 2024

Uh oh!

ConcurrentCrab commented Aug 18, 2024

Uh oh!

6543 commented Sep 12, 2024

Uh oh!

chrisd8088 commented Sep 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrisd8088 commented Sep 21, 2024

Uh oh!

chrisd8088 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

6543 commented Oct 7, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ConcurrentCrab commented Jun 30, 2024 •

edited

Loading

ConcurrentCrab commented Jul 3, 2024 •

edited

Loading

chrisd8088 commented Sep 12, 2024 •

edited

Loading

chrisd8088 left a comment •

edited

Loading