Skip to content

rebuild-index: fix race condition#39594

Merged
alalazo merged 2 commits intospack:developfrom
haampie:fix/race-condition
Aug 24, 2023
Merged

rebuild-index: fix race condition#39594
alalazo merged 2 commits intospack:developfrom
haampie:fix/race-condition

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Aug 23, 2023

See for example https://gitlab.spack.io/spack/spack/-/jobs/8029468

It looks like a race condition is hit somewhere down the line when constructing
spec objects, so just make it sequential again. Notice that awscli will still fetch
in parallel anyways.

@spackbot-app spackbot-app bot added binary-packages core PR affects Spack core functionality labels Aug 23, 2023
@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 23, 2023

Could you take this PR over @scottwittenburg? Looks like you're using a thread pool where you can't due to race conditions during package loading.

Also notice that parallel fetch that requires a Stage is process safe (fs lock) but not thread safe (fs lock bookkeeping). On top of that urllib is not thread safe either.

@haampie haampie added this to the v0.20.2 milestone Aug 23, 2023
@scottwittenburg
Copy link
Copy Markdown
Contributor

Yeah, I'll dig into this, but I'm a little unclear on a couple things.

How do you know there is a race condition during package loading? Is this message in the job you linked somehow indicative of that?

==> Warning: cannot reconstruct virtual dependencies on package renderproto: cannot load package 'renderproto' from the 'builtin' repository: directive "version" cannot be used within a "when" context since it does not support a "when=" argument

And a side question: is that actually resulting in a broken binary index?

Looks like you're using a thread pool where you can't due to race conditions during package loading.

I'm trying to understand this, but I'm not sure I get it yet. Is package loading triggered by creating the db? Can that just be done outside the threadpool?

Also notice that parallel fetch that requires a Stage is process safe (fs lock) but not thread safe (fs lock bookkeeping).

Is that comment somehow specific to this PR? I don't recall that fetching spec.json files to rebuild the binary index uses any Stage, and a quick check seems to confirm.

On top of that urllib is not thread safe either.

Yeah, google seems to confirm that's true, I wasn't aware of that.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 24, 2023

In this case it's probably SpecfileV1 / SpecfileV2 calling reconstruct_virtuals_on_edges, which calls Spec.virtual, which ultimately causes the package cache to be created, and that mutates globals (e.g. the with when: ... context manager is an obvious example).

Maybe the specific issue can be fixed by creating the package cache ahead of time, but I would recommend getting rid of ThreadPool, since Spack is almost nowhere thread safe (and neither is Python's standard library...).

And a side question: is that actually resulting in a broken binary index?

Maybe? I think it's taking old Spec formats and writing them in a new format, but it can't set new virtual attributes on edges.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 24, 2023

Use of ThreadPool was also pointed out in review btw: #32796 (review)

Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting this in. If we have performance drops we can act on them in following PRs

@alalazo alalazo merged commit 8914d26 into spack:develop Aug 24, 2023
@haampie haampie deleted the fix/race-condition branch August 24, 2023 06:31
dyokelson pushed a commit to dyokelson/spack that referenced this pull request Aug 24, 2023
mpokorny pushed a commit to mpokorny/spack that referenced this pull request Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

binary-packages core PR affects Spack core functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants