Skip to content

Skip rebuilding installed packages#16724

Closed
tldahlgren wants to merge 4 commits intospack:developfrom
tldahlgren:features/skip-deps-when-installed
Closed

Skip rebuilding installed packages#16724
tldahlgren wants to merge 4 commits intospack:developfrom
tldahlgren:features/skip-deps-when-installed

Conversation

@tldahlgren
Copy link
Copy Markdown
Contributor

@tldahlgren tldahlgren commented May 19, 2020

(UPDATE: Was #16654 but renamed to fix typo in the branch name.)

This PR addresses an issue raised in slack where a package that is already installed but has uninstalled dependencies was having its dependencies re-installed even when those dependencies were not needed to use the package.

Todd recommended using pre- (versus post-) order traversals to prune installed dependencies when initializing the build queue so that has been included.

There is a third commit to replace try-finally blocks with attribute assignments that attempt to do the equivalent of monkeypatch but does not always work as expected. For example, at one point tests using canfail with initial installs with succeed set to False then run after setting succeed to True would fail with complaints that succeed was [still] False. Using the testing framework's monkeypatch seemed to solve this problem.

@tldahlgren tldahlgren self-assigned this May 19, 2020
@tldahlgren tldahlgren changed the title Snapshot of skipping installed packages and pre-order traversal Skip rebuilding installed packages May 19, 2020
@tldahlgren tldahlgren closed this May 21, 2020
@tldahlgren tldahlgren force-pushed the features/skip-deps-when-installed branch from 9fe161f to 20b3e41 Compare May 21, 2020 01:11
@tldahlgren tldahlgren reopened this May 28, 2020
@tldahlgren tldahlgren removed the WIP label May 28, 2020
@tldahlgren tldahlgren requested review from alalazo, scheibelp and tgamblin and removed request for alalazo May 28, 2020 17:21
@tldahlgren
Copy link
Copy Markdown
Contributor Author

@tgamblin @scheibelp RFR. The three commits "should" be sufficiently distinct to retain separately.

@scheibelp scheibelp self-assigned this May 28, 2020
@scheibelp
Copy link
Copy Markdown
Member

This PR addresses an issue raised in slack where a package that is already installed but has uninstalled dependencies was being re-installed even when those dependencies were not needed to use the package.

To make sure, say we have two packages X and Y where X depends on Y; X is installed, Y is not. Are you saying that in this case, that Spack would reinstall X? Or are you saying that Spack would attempt to install Y as part of installing X?

@tldahlgren
Copy link
Copy Markdown
Contributor Author

tldahlgren commented May 28, 2020

This PR addresses an issue raised in slack where a package that is already installed but has uninstalled dependencies was being re-installed even when those dependencies were not needed to use the package.

To make sure, say we have two packages X and Y where X depends on Y; X is installed, Y is not. Are you saying that in this case, that Spack would reinstall X? Or are you saying that Spack would attempt to install Y as part of installing X?

Given X has Y as a dependency with X installed and Y not installed, spack install X would result in no further action than to report X is installed. In other words, we would no longer attempt to install Y in this case.

Thanks for pointing out the wording issue. I have updated the wording in the description.

@scheibelp
Copy link
Copy Markdown
Member

To clarify: I think you are describing the behavior of Spack with this PR applied. In #16724 (comment) I was more curious what the behavior of Spack was before this PR: would X be reinstalled? Or was the problem that Y was being installed even though it "wasn't needed"?

@tldahlgren
Copy link
Copy Markdown
Contributor Author

To clarify: I think you are describing the behavior of Spack with this PR applied. In #16724 (comment) I was more curious what the behavior of Spack was before this PR: would X be reinstalled? Or was the problem that Y was being installed even though it "wasn't needed"?

Ah. IIRC -- I heard second hand -- the problem was Y was being installed when not needed.

@scheibelp
Copy link
Copy Markdown
Member

OK: I'll note that generally speaking we won't always want to skip the installation of Y, even if X is installed. This depends on whether

  • Y is a build dependency
  • X is an external

If Y is not a build dependency, then we probably still want to install it; the exception is if X is an external, in which case we assume that all dependencies are taken care of. Is that accounted for here?

@tldahlgren
Copy link
Copy Markdown
Contributor Author

tldahlgren commented May 28, 2020

OK: I'll note that generally speaking we won't always want to skip the installation of Y, even if X is installed. This depends on whether

* Y is a build dependency

* X is an external

If Y is not a build dependency, then we probably still want to install it; the exception is if X is an external, in which case we assume that all dependencies are taken care of. Is that accounted for here?

We were already skipping externals, though I would have to review to determine if the build was skipping the dependencies of externals. (That is an excellent point, thanks.)

Perhaps I misunderstood the change. @tgamblin What are your thoughts on this?

Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to my concern at #16724 (comment) i have the following questions/requests. Let me know if any seem off-track.

# If Package.install is called after this point, it will fail
pkg.succeed = False
pkg.do_install()
monkeypatch.setattr(spack.package.Package, 'remove_prefix',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to use monkeypatch for remove_prefix - that is more in keeping with how methods are temporarily adjusted for other tests. I'm curious though which changes in this PR prevent setting pkg.succeed.

Copy link
Copy Markdown
Contributor Author

@tldahlgren tldahlgren Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a number of differences with the tests here and develop so removed the commit that led to these test changes and am re-running the tests.

self._update_installed(spec.package)
tty.debug('Assuming dependencies of {0} are also installed'
.format(spec.package.name))
for dep in spec.dependencies():
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about a predicate function with side effects. Also if we have packages X, Y, Z such that

  • X depends on Z
  • Y depends on Z
  • X is installed
  • Y and Z are not installed

Then I think this would not be the right decision to mark Z as installed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the counter example. I will have to give this more thought.

# Traverse dependencies from the bottom-up so any that are flagged
# as installed can be readily removed
for dep in self.pkg.spec.traverse(root=False,
predicate=not_installed):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that adding this predicate would be sufficient for avoiding the installation of dependencies when a dependent is installed (with the caveat mentioned elsewhere that generally we may still have to install non-build deps). What is the purpose of additionally marking the dependencies as installed (i.e. in the above not_installed function)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the core issue spawning this work was the undesirable affect of having dependencies of installed packages being installed.

So flagging dependencies of installed packages as being installed should help ensure there is no attempt to install the dependencies.

# uninstalled dependencies. This is necessary at this point
# since bootstrap compilers may not be listed as dependents
# of packages.
task.flag_installed(self.installed)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the priority is the number of uninstalled dependents, then I'm concerned there may be queue ordering issues if we have to readjust the priority at this point: ideally after any task is completed, the priority of all related tasks should be updated at that time.

In other words, the priority of tasks in the queue should be updated when we enqueue/complete tasks, not when we retrieve them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow.

Queue ordering isn't solely dependent on the priority. BuildTasks are ordered by the tuple (# uninstalled dependencies, sequence), where sequence is a counter used to preserve the order in which the task is added (or re-added) to the queue.

Copy link
Copy Markdown
Contributor Author

@tldahlgren tldahlgren Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I believe the code is doing what you think it should in terms of build task priorities.

That is, the priority of the build task is the number of uninstalled dependencies at the time the task is added or re-queued plus a sequence number. The build task of a dependent package is re-queued when it is determined that one of its dependencies has been "installed". The sequence number is used to ensure build tasks with the same number of uninstalled dependencies are sequenced by the order in which they were added to the queue. Sequencing the build tasks helps support parallel builds by allowing a process to cycle through those that have no uninstalled dependencies are being built by other processes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to read through this code again to refresh my understanding and see if I still have this concern.

If the task is being requeued when a dependency is installed, I think that would address my concern (and if that is already happening then perhaps I missed it - could you point me to where this happens?). I wasn't sure that was happening though because it didn't look like any requeueing was happening when an installation was finished.

# Determine state of installation artifacts and adjust accordingly.
self._prepare_for_install(task, keep_prefix, keep_stage,
restage)
# Possession of a read lock is required/assumed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that a read lock was just acquired, IMO this comment is unnecessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants