
Add multiprocessing to buildcache command#9440

Closed
obreitwi wants to merge 1 commit into spack:develop from electronicvisions:feature/buildcache_multiproc

Conversation

@obreitwi
Member

@obreitwi obreitwi commented Oct 4, 2018

  • Add -w/--without-dependencies to suppress dependency resolution.

  • Add -j/--jobs to install/create several packages from/to the buildcache in parallel (defaults to 1).

  • Applied some flake8 fixes

  • Added (and compressed) tests

Reasoning:

The current buildcache implementation lacks the ability to install packages/create tarballs in bulk:

  • Dependencies are always checked for and included
  • The package index is updated after EVERY installed package instead of once after everything is installed

In order to reduce the build time of our spack-based Singularity container, we want to pre-compute all hashes to be installed from the buildcache in advance (i.e., we do not want spack to perform additional checks), and we want this installation to be able to use more than one core (decompressing is CPU-intensive and therefore not completely IO-bound).

The same goes for creating tarballs that are put into the buildcache after container creation.

Hence this commit introduces these two possibilities.

By default the behavior remains the same, i.e. spack fails after encountering its first error. When using several cores, we gather all errors that occurred and print them afterwards.

In theory, parallelization could also be achieved with tools such as xargs by spawning multiple python processes, but I think it would be nicer to have this possibility available out of the box in spack (in the same way that you can install packages with multiple cores).

Note: We already use the proposed modifications without problems. I only now got around to porting the modifications to the current develop branch, and I added some tests while I was at it.

Member

@gartung gartung left a comment

As long as default behavior is identical to the behavior pre-change I am fine with this change.

@healther
Contributor

healther commented Oct 5, 2018

@alalazo @scheibelp could you take a look at this?

* Add -w/--without-dependencies to suppress dependency resolution.

* Add -j/--jobs to install/create several packages from/to the buildcache
  in parallel (defaults to 1).

* Applied some flake8 fixes

* Added (and compressed) tests

Reasoning:

The current buildcache implementation lacks the ability to install
packages/create tarballs in bulk:

* Dependencies are *always* checked for and included
* The package index is updated after EVERY installed package instead of
  once after everything is installed

In order to reduce the build time of our spack-based Singularity
container, we want to pre-compute all hashes to be installed from the
buildcache in advance (i.e., we do not want spack to perform additional
checks), and we want this installation to be able to use more than one
core (decompressing is CPU-intensive and therefore not completely
IO-bound).

The same goes for creating tarballs that are put into the buildcache
after container creation.

Hence this commit introduces these two possibilities.

By default the behavior remains the same, i.e. spack fails after
encountering its first error. When using several cores, we gather all
errors that occurred and print them afterwards.

The only (minor) difference is that, since it is not possible to raise
exceptions in multiprocessing.Pool workers, we have to convert all
encountered SpackErrors to strings and print them manually.

In theory, parallelization could also be achieved with tools such as
xargs by spawning multiple python processes, but I think it would be
nicer to have this possibility available out of the box in spack (in
the same way that you can install packages with multiple cores).

Change-Id: I97b1c14bf7563ac9d68b904b0c38eeb08b8d210e
@obreitwi obreitwi force-pushed the feature/buildcache_multiproc branch from f351742 to eaa8ba4 on October 5, 2018 15:09
@obreitwi
Member Author

obreitwi commented Oct 5, 2018

Interestingly, spack flake8 $(git diff --name-only HEAD~) did not show the style violations Travis complained about…

Member

@scheibelp scheibelp left a comment

Hi @obreitwi, I have a few notes/requests:

  • Most importantly, I think this will conflict with #8718, which we are prioritizing, so I recommend waiting on that before syncing
  • I'm not sure why you are overriding Process to disable daemon functionality - could you explain why you're doing that?
  • Comments include a few suggestions for improving robustness

Before:
    spec.dag_hash(7) +
    ' to install "%s"' %
    spec.format())
After:
    spec.dag_hash(7) + ' to install "%s"' % spec.format())
Member

As long as flake8 wasn't complaining before, I prefer that style changes be minimized (minimally, it would be useful to separate out style changes into a separate commit)

Member Author

I made these changes in response to Travis complaints, but I will separate them out into another commit if they reappear in the rewrite.


class NoDaemonProcess(Process):
    """Daemon processes are not allowed to create child processes.
    Yet sometimes that feature is needed."""
Member

It would be best to mention when it is needed - ideally whatever reason this is being used for in Spack (for example if there are many, just list the most relevant ones for its use in Spack).

Member Author

See my reply below. Will update the comment string in the rework.

def _get_daemon(self):
    return False

def _set_daemon(self, value):
Member

I'm leery of preventing processes from becoming daemons by overriding this and treating it as a no-op. Are you trying to prevent multiprocessing.Pool from doing this automatically? I didn't get that sense from skimming https://docs.python.org/2/library/multiprocessing.html

If somewhere in the Spack code is setting processes as daemons, I think it would be good to address that directly rather than intercepting and dropping the daemon requests at this level

Member Author

It is the other way around: the daemon flag just means that processes can't have children and are cleaned up along with their parent process (see below). We need the processes to have children for unpacking packages, so the daemon flag needs to be set to False.

# currently, specs cause an infinite recursion bug when pickled
# -> as multiprocessing uses pickle internally, we need to transform
# specs prior to distributing the work via worker pool
specs = [s.to_dict() for s in specs]
Member

I think you'll want to set all_deps=True for to_dict so that build deps are included in this

# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##############################################################################
import argparse
import copy
Member

@obreitwi @scottwittenburg this and #8718 will likely conflict, I haven't yet determined how extensively. We are prioritizing #8718 because it is helping us do more extensive testing of building packages with Spack.

Member Author

The current PR already cannot be trivially rebased on top of develop, so I'll wait until #8718 is merged before continuing here.

In principle, though, I don't see any reason why this PR could not be implemented on top of #8718 after quickly reading through the proposed changes.

args = ['create', '-d', mirror_path, "-j", jobs, str(spec)]
if not signed:
    args.insert(-1, "-u")
elif not key_created:
Member

It appears key_created may not be defined at this point (reading to the beginning of the function, there appear to be code paths where it isn't set)

Member Author

I did not touch the gpg key handling with my PR in any way; this could be a side effect of rebasing. For the rework I'll again rebase our working solution onto the current develop branch.


pool = NoDaemonPool(args.jobs)
retvals = pool.map(f_create, specs, chunksize=1)
# TODO: every f_create-call updates the index -> unnecessary
Member

Is this still a TODO? The last two lines are phrased as though the TODO is handled.

Member Author

Ah, yes, it is handled; I will remove the TODO.

@obreitwi
Member Author

Most importantly, I think this will conflict with #8718, which we are prioritizing, so I recommend waiting on that before syncing

Agreed! As far as I can tell, I would also need to do some work when rebasing onto the current develop branch.

I'm not sure why you are overriding Process to disable daemon functionality - could you explain why you're doing that?

Quoting the documentation:

When a process exits, it attempts to terminate all of its daemonic child processes.

Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.

As described here, multiprocessing.Pool uses "daemonized" processes so that they get cleaned up when the parent process dies. When performing buildcache operations, however, spack spawns additional processes (to (un)zip packages, etc.). Hence we need to disable the daemon flag for the workers to do their job. In the rework (after #8718) I'll make sure that the worker processes are properly join()ed. In practice, however, this has not been an issue for us.

Comments include a few suggestions for improving robustness

Will answer them in place.

@obreitwi
Member Author

Organisational question: Should I rebase this PR or open a new one for the new rebase/rework?

@scheibelp
Member

Organisational question: Should I rebase this PR or open a new one for the new rebase/rework?

Both approaches are OK with me. If you think the changes will be significant, I think it would be reasonable to close this and then refer to it in a new PR - we've done that with many Spack PRs and I think it can be a useful way to refocus the conversation.

@gartung
Member

gartung commented Mar 1, 2019

@obreitwi sorry for the slew of commits. I was trying to resolve conflicts using the web interface which led to all sorts of issues that needed to be fixed.

@obreitwi
Member Author

obreitwi commented Mar 1, 2019

@obreitwi sorry for the slew of commits. I was trying to resolve conflicts using the web interface which led to all sorts of issues that needed to be fixed.

No problem. I was meaning to ask what the current state regarding these changes is. It seems #8718 (or rather: #10274) is merged. Does that mean I can start with the rework, or are you already in the process of rewriting these changes, @gartung?

@gartung
Member

gartung commented Mar 1, 2019

I just merged #10756
You can rebase on that or I can give it a try.

@gartung
Member

gartung commented Mar 1, 2019

@obreitwi can you please reset the head of your branch to
eaa8ba4
and force push

@obreitwi obreitwi force-pushed the feature/buildcache_multiproc branch from 25aa30b to eaa8ba4 on March 1, 2019 14:29
@obreitwi
Member Author

obreitwi commented Mar 1, 2019

@obreitwi can you please reset the head of your branch to
eaa8ba4
and force push

Done.

@obreitwi
Member Author

obreitwi commented Mar 1, 2019

I just merged #10756
You can rebase on that or I can give it a try.

I don't mind either way. We recently discovered (after almost six months of using the feature) that installing a lot of packages with many processes consumes a lot of memory, and I suspect that either there is a memory leak somewhere or memory usage can be improved drastically.

I would prefer to conclude my investigation prior to doing the rework. But since this only occurs when installing ~400 packages at the same time, I don't know if it is a show-stopper.

@alalazo
Member

alalazo commented Aug 19, 2020

I would prefer concluding my investigation prior to doing the rework. But since this only occurs when installing ~400 packages at the same time I don't know if it is a show-stopper.

Can you update on the state of this PR?

@obreitwi
Member Author

Can you update on the state of this PR?

It is effectively dead: even with multiprocessing, processing all files in Python to apply RPATH-related transformations was much too slow for us. As we do not need to take care of RPATHs in our setup, we have since switched to a simple chain of plain old Unix tools to implement caching.

@obreitwi obreitwi closed this Aug 19, 2020
@alalazo
Member

alalazo commented Aug 19, 2020

@obreitwi Thanks for getting back so quickly!


5 participants