
persist the binary cache index in the misc cache and expose remote specs to spack spec#21538

Closed
cosmicexplorer wants to merge 9 commits intospack:developfrom
cosmicexplorer:hash-cache

Conversation

Contributor

@cosmicexplorer cosmicexplorer commented Feb 8, 2021

Description

Closes #19085; this PR intends to solve the same problem.

Changes

  1. Add the test which uses spack buildcache create, from issue #19085 ("buildcache list: add listed specs to database as not installed").

The PR has been split into several independent commits, with all of the work below in the final commit:

  1. Make spack buildcache list use the directory ~/.spack/caches/indices (using config:misc_cache as the root) to persist the BinaryCacheIndex.
  2. Create spec_index.py:
    1. Move InstallRecord from database.py.
    2. Create IndexQuery class which performs the logic of Database._query().
    3. Create a singleton SpecIndex to expose only local, only remote, or both local and remote specs for querying.
  3. Make spack spec, spack find, and spack install concretize specs by referring to the appropriate SpecIndex instance.
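Step 1 above can be pictured with a small persistence sketch (the file layout, JSON format, and helper names here are illustrative, not the PR's actual code):

```python
# Toy sketch of persisting a per-mirror binary-cache index under a
# misc-cache root, as step 1 describes. Names are hypothetical.
import json
import os
import tempfile

def index_cache_dir(misc_cache_root):
    # The PR uses an `indices` directory under config:misc_cache as the root.
    path = os.path.join(misc_cache_root, "indices")
    os.makedirs(path, exist_ok=True)
    return path

def save_index(misc_cache_root, mirror_name, index):
    path = os.path.join(index_cache_dir(misc_cache_root), mirror_name + ".json")
    with open(path, "w") as f:
        json.dump(index, f)

def load_index(misc_cache_root, mirror_name):
    path = os.path.join(index_cache_dir(misc_cache_root), mirror_name + ".json")
    with open(path) as f:
        return json.load(f)

root = tempfile.mkdtemp()
save_index(root, "mirror0", {"abcdefg": "xz@5.2.5"})
print(load_index(root, "mirror0"))  # the index survives across invocations
```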

Result

The following works:

> spack buildcache list -l  # note hash for desired package is abcdefg 
> spack install /abcdefg    # if /abcdefg is from a remote buildcache
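A toy model of the merged lookup behind these commands (the class and method names echo the PR's spack.spec_index, but this is an illustrative sketch, not Spack's API):

```python
# A merged local+remote index maps full DAG hashes to specs, and a hash
# prefix from the command line must match exactly one entry. Names and
# data are stand-ins, not Spack's actual implementation.

class NoSuchHashError(Exception):
    pass

class AmbiguousHashError(Exception):
    pass

class MergedSpecIndex:
    def __init__(self, local, remote):
        # Remote buildcache entries fill in hashes the local database lacks.
        self.entries = dict(remote)
        self.entries.update(local)

    def lookup_ensuring_single_match(self, prefix):
        matches = [(h, s) for h, s in self.entries.items() if h.startswith(prefix)]
        if not matches:
            raise NoSuchHashError(prefix)
        if len(matches) > 1:
            raise AmbiguousHashError(prefix)
        return matches[0][1]

index = MergedSpecIndex(
    local={"abcdefg1234": "installed-pkg@1.0"},
    remote={"zzzzzzz5678": "buildcache-pkg@2.0"},
)
print(index.lookup_ensuring_single_match("zzz"))  # found via the remote buildcache
```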

@cosmicexplorer cosmicexplorer force-pushed the hash-cache branch 3 times, most recently from 5d3d83b to 139c1dc Compare February 11, 2021 23:14
@cosmicexplorer cosmicexplorer force-pushed the hash-cache branch 5 times, most recently from 2567de6 to 3a24d33 Compare February 17, 2021 10:35
@cosmicexplorer cosmicexplorer force-pushed the hash-cache branch 5 times, most recently from 3a8d033 to 793e196 Compare February 25, 2021 05:30
@cosmicexplorer cosmicexplorer changed the title persist the binary cache index in the misc cache and update local db with remote entries persist the binary cache index in the misc cache and expose remote specs to spack spec Feb 25, 2021
@cosmicexplorer
Contributor Author

@tgamblin This should be ready for review. The PR has been split into 4 independent commits, which makes it very easy for me to break out parts of it into pre-work PRs. I believe it should be clear upon inspection that the final commit is the only one that changes behavior, but it is very easy for me to split this up if needed.

Note (as in the TODO section) that spack buildcache list and spack install --cache-only are not currently hooked up so that they work in the test. I am going to fix this before merging (there is a commented-out section of the new test case which uses them).

@cosmicexplorer
Contributor Author

Ok, the test should finally be passing now! 🎉

@cosmicexplorer cosmicexplorer force-pushed the hash-cache branch 3 times, most recently from 5021164 to 186d53b Compare February 25, 2021 19:41
@cosmicexplorer
Contributor Author

cosmicexplorer commented Feb 25, 2021

Re @tgamblin @tldahlgren -- the workflow that this enables is also where it currently might cause a performance regression: in particular, when a hash prefix is used on the command line, spack will now unconditionally search all mirrors for that hash when parsing the spec. This is noted in two places:

  1. spack/lib/spack/spack/spec.py, lines 4729 to 4735 in 186d53b:

         import spack.spec_index
         key = spack.spec_index.HashPrefix(dag_hash)
         # NB: This will reach out to all mirrors to update spack's BinaryCacheIndex,
         # whenever a hash is specified. It will also iterate over all of the entries in
         # the remote dbs attempting to match the hash prefix.
         spec_index = spack.spec_index.merged_local_remote_spec_index
         return spec_index.lookup_ensuring_single_match(key).concretized_spec.spec
  2. The memoized prefix lookup itself:

         @llnl.util.lang.memoized
         def spec_index_lookup(self, hash_prefix):
             # type: (HashPrefix) -> List[IndexEntry]
             assert isinstance(hash_prefix, HashPrefix), hash_prefix
             # TODO: generate some sort of trie mapping when `self.update()` is called, to
             # avoid iterating over every entry here.
             return [
                 entry
                 for chash, entry in self._index_entries().items()
                 if chash in hash_prefix
             ]

Let me know if this is worth further investigation here. The perf regression would be in the case of spack install asdf/xyz, where xyz is the hash prefix. We added the ability for xyz to be in a remote db, but now we unconditionally search the remote db.

I think it would probably be good to ensure that --no-cache and --cache-only won't search both local and remote databases unconditionally. I'll try to implement that so users have a way to avoid taking the performance hit.
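The trie the TODO above alludes to could look roughly like this (a hypothetical sketch, not code from this branch): build it once per index update, so a prefix query walks the prefix instead of scanning every entry.

```python
# Sketch of a character trie over full hashes, rebuilt whenever the index
# is updated, so hash-prefix queries stop iterating over every entry.
# Illustrative structure only.

class HashTrie:
    def __init__(self):
        self.root = {}

    def insert(self, chash, entry):
        node = self.root
        for ch in chash:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(entry)

    def prefix_lookup(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return []
        # Collect every entry stored at or below this node.
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            out.extend(n.get("$", []))
            stack.extend(v for k, v in n.items() if k != "$")
        return out

trie = HashTrie()
trie.insert("abcdef", "entry-1")
trie.insert("abzzzz", "entry-2")
print(trie.prefix_lookup("abc"))  # only entry-1
```

Lookups cost O(len(prefix)) plus the size of the matched subtree, instead of O(number of index entries) per query.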

@cosmicexplorer cosmicexplorer force-pushed the hash-cache branch 2 times, most recently from 7b9253d to ab0f227 Compare February 26, 2021 07:03
@cosmicexplorer
Contributor Author

cosmicexplorer commented Mar 11, 2021

I still need to check why that happens.

I am pretty sure it was because of an unrelated change I made to BoolValuedVariant which I have since reverted. It was also causing spack spec mpileaks%gcc^[email protected] to fail.

@cosmicexplorer cosmicexplorer force-pushed the hash-cache branch 7 times, most recently from fe1afde to 3c401eb Compare March 17, 2021 06:22
- move misc_cache_location() out of the misc cache singleton
- ensure binary indices are instantiated from within the misc cache
- add remote buildcache entries to local db and update them every time the buildcache is fetched
- factor out _unverified_context() for python 2.6
- move mirror_directory_def into conftest.py

SpecIndex can now be instantiated and run successfully

- now to integrate with `spack spec`

make spack spec work

polish @named_tuple()

tmp

Revert "polish @named_tuple()"

This reverts commit 034e64e066c59094842ac28df5b9fec4e164a0d7.

all named_tuple

tmp

introduce Timestamp, TimeRange, IndexEntry, and IndexQuery

tmp

make more things use the new spec_index

ensure buildcache list reads from the spec index

make spack find search remote indices

add back all the changes that are in other PRs

`spack find -m -L --allarch xz` works

all of the changes i've made today. the test fails to find compilers

remove the two .pyi files, and most of the lang.py edits

make mypy pass, and remove circular import workarounds

add @gen_str_repr decorator

make `find` and `buildcache list` work again

make `find` and `buildcache list` work

make test pass!!!

make it possible to run spack on 2.6 again

make mypy and flake8 pass

undo incorrect changes

undo lang.py changes

undo changes to add types everywhere

fix buildcache list

clean up the spec by hash lookup

mypy and vermin checks

undo concretizer.lp change

undo typing-only changes

fix

tmp

fix variant parsing?

fix failure in BoolValuedVariant

unnecessary changes

tmp cmp

make tests pass (?)

add additional assertions

add callable annotations

fix types

novm argparse

Revert "novm argparse"

This reverts commit d9a80de472043521a92013d89b950038ddfd801f.

ah

make `spack.util.py2` work like `typing`

undo tons of typing changes

tmp

undo typing tmp

tmp typing revert

revert modifications to lang.py

add types and fix tests and revert lang.py changes

test fixes

add __repr__() to everything in spec_index.py

make test_buildcache_create() pass

further test fixes

add types and fix tests and revert lang.py changes

ok

ok2
idk

idk2

make it work
Member

@tgamblin tgamblin left a comment


This is the first half of the review -- mostly about what probably shouldn't be in this PR... moving on to the parts that should :)

concretize = kwargs.get('concretize', False)
normalize = kwargs.get('normalize', False)
tests = kwargs.get('tests', False)
index_location = kwargs.get('index_location', None)

This shouldn't be exposed to cmd/__init__.py -- if the spec parser needs it, the spec parser should ask for it from the spec index.

        if not self.archive_file:
            raise NoArchiveFileError("Cannot call archive() before fetching.")

        import spack.util.web as web_util

was this needed to avoid a circular import?

        # Check and record the values that are not allowed
        not_allowed_values = [
            x for x in value
            # TODO: avoid the multiple special-cases for '*' with a wrapper class.

this TODO doesn't have to do with this PR


from spack.error import SpackError
from spack.database import InstallStatuses
from spack.error import SpackError

no reason to do this in this PR. Remove.


specs = spack.cmd.parse_specs(args.specs)
merged = spack.spec_index.IndexLocation.LOCAL_AND_REMOTE()
specs = spack.cmd.parse_specs(args.specs, index_location=merged)

this PR should not affect every call to parse_specs - remove this and make the spec parser know about the index. the index can learn about what potential places there are to get hashes by looking at spack's mirror configuration.

for sha256, package_to_patch in other.index.items():
p2p = self.index.setdefault(sha256, {})
p2p.update(package_to_patch)
self._update_index(other.index)

these changes seem very unrelated to the PR.

index_location = spack.spec_index.IndexLocation.LOCAL()
if isinstance(spec_like, six.string_types):
spec_list = SpecParser(self).parse(spec_like)
spec_list = SpecParser(self, index_location=index_location).parse(spec_like)

I do not think that we should change the whole spec constructor just for this -- I think the index location should be hidden from users of Spec.

query_spec = [query_spec]
recorded_spec = rec.spec
if for_all_architectures:
recorded_spec = recorded_spec.without_architecture_or_compiler()

--allarch is not a concern that we should be letting into database.py. It's really a UI concern, not a core database concern. If the user wants to see only some architectures, they should filter results outside the DB or we should make a better query object. We shouldn't have a specific for_all_architectures argument here.
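One way to honor this suggestion is to keep the database query generic and do the --allarch trimming in the UI layer; a sketch with stand-in records (not Spack's real data model):

```python
# Query the DB normally, then strip architecture/compiler in the UI layer,
# the way an --allarch listing would, instead of threading a
# for_all_architectures flag through database.py. Fields are stand-ins.

def strip_arch_and_compiler(record):
    trimmed = dict(record)
    trimmed.pop("arch", None)
    trimmed.pop("compiler", None)
    return trimmed

records = [
    {"name": "xz", "version": "5.2.5", "arch": "linux-x86_64", "compiler": "gcc@9"},
    {"name": "xz", "version": "5.2.5", "arch": "darwin-arm64", "compiler": "clang@12"},
]

# Deduplicate after trimming: arch-specific duplicates collapse to one row.
seen = {tuple(sorted(r.items())) for r in map(strip_arch_and_compiler, records)}
print(len(seen))  # 1
```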

    if f.__doc__ is None:
        f.__doc__ = ""
    f.__doc__ += _query_docstring
    return f

this is cool but do we need it in this PR?

spec._add_dependency(child, dtypes)

def _read_from_file(self, filename):
def fill_from_file(self, filename):

Why is the name changed here? I guess you needed to make the json one public, and it makes sense to make this one public too. That's fine but I'd call them read_from_file and read_from_json.


@tgamblin tgamblin left a comment


ok here are some more comments.

import spack.spec


class ConcretizedSpec(object):

Not clear to me why we need this class. Is the existing Spec API bad for this? I don't see anything on here that we can't do with Spec.



class IndexEntry(object):
    concretized_spec = None  # type: ConcretizedSpec

I think this should just be Spec


class IndexEntry(object):
    concretized_spec = None  # type: ConcretizedSpec
    record = None  # type: spack.database.InstallRecord

This is odd to me b/c the record should already have a spec.


@add_metaclass(ABCMeta)
class PartialHash(object):
    FULL_HASH_STRING_LENGTH = 32  # type: ClassVar[int]

would rather not have this defined in multiple places.

pass


class ConcreteHash(CompleteHash):

All the hash types here are losing me -- do we need them? I think hashes can be much simpler than this...


@add_metaclass(ABCMeta)
class SpecIndexable(object):
    """Interface to look up and traverse known specs from multiple sources."""

we don't need to make interface types in Spack -- we tend to stay away from them.

pass


class SpecIndex(object):

right now this is IMO overly complex -- it needs to be refactored to just take a list of DB's, and to look up hashes in those. Where the hashes come from should be someone else's concern.

the local DB is a DB, the remote DB is a DB, etc. -- they're all the same. I don't think we need separate methods or anything for each kind of DB. I was expecting that this would just look up hashes and map them to specs, from several DBs.
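That simpler shape might look like the following sketch, where SpecIndex only knows about a list of hash-to-spec databases and treats them all alike (names and data are illustrative):

```python
# Sketch of the refactor suggested in this review: SpecIndex takes a list
# of databases -- local and remote treated identically -- and resolves
# hash prefixes against each one through the same code path.

class SpecIndex:
    def __init__(self, databases):
        self.databases = databases  # each database: dict of full hash -> spec

    def lookup(self, prefix):
        matches = {}
        for db in self.databases:
            for chash, spec in db.items():
                if chash.startswith(prefix):
                    matches[chash] = spec
        return matches

local_db = {"aaa111": "zlib@1.2.11"}
remote_db = {"bbb222": "xz@5.2.5"}
index = SpecIndex([local_db, remote_db])
print(index.lookup("bbb"))  # found in the remote DB via the same path as local
```

Where the hashes come from (mirror configuration, local store) stays outside this class, which is the separation of concerns the comment asks for.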

return '{0}(indices={1!r})'.format(type(self).__name__, self.indices)


class IndexLocation(object):

this can get removed -- we shouldn't need to care about the index location.

"""Share a merged BinaryCacheIndex for remote buildcaches, in the user's homedir."""
cache_root = os.path.join(misc_cache_location(), 'indices')
cache_root = spack.util.path.canonicalize_path(cache_root)
return cache_root

this is good.

    raise InvalidIndexHash(expect_hash, locally_computed_hash,
                           'remote index at {0}'.format(mirror_url),
                           'this indicates an error in index transmission')
    # tty.error(msg_tmpl.format(locally_computed_hash, expect_hash))

this is commented out

@cosmicexplorer
Contributor Author

Going to break this out into multiple other PRs, both despite and because of the immense effort required to get it passing CI.

@cosmicexplorer
Contributor Author

Reopening this so I don't forget about it.

@cosmicexplorer cosmicexplorer marked this pull request as draft February 20, 2022 04:21
@alalazo
Member

alalazo commented Nov 26, 2024

Closing, since there has been no activity in a while. Feel free to reopen if you want to continue work on this.

@alalazo alalazo closed this Nov 26, 2024
