Invalid hashes when running multiple Poetry installs simultaneously #5142

@hopper-signifyd

Description

^(please note that projects are in sibling directories named project_a and project_b)

Issue

When running multiple poetry installs simultaneously with a shared Poetry cache directory, the operation commonly fails with the infamous "invalid hashes" error. This is common in CI and monorepo environments. Here's a sample:

RuntimeError

  Invalid hashes (sha256:c7a7026632f45188f4a4548cc308c5c0683d9b8259da5cbfe0301f7527843eb4) for pandas (1.0.5) using archive pandas-1.0.5-cp36-cp36m-manylinux1_x86_64.whl. Expected one of
[omitted the other hashes for the sake of brevity]
sha256:faa42a78d1350b02a7d2f0dbe3c80791cf785663d6997891549d0f86dc49125e.

  at ~/.local/share/pypoetry/venv/lib/python3.9/site-packages/poetry/installation/executor.py:627 in _download_link
      623│                     )
      624│                 )
      625│
      626│             if archive_hashes.isdisjoint(hashes):
    → 627│                 raise RuntimeError(
      628│                     "Invalid hashes ({}) for {} using archive {}. Expected one of {}.".format(
      629│                         ", ".join(sorted(archive_hashes)),
      630│                         package,
      631│                         archive_path.name,

After receiving this error, if I run find . -name pandas-1.0.5-cp36-cp36m-manylinux1_x86_64.whl, and then run a checksum on the file, I usually get a SHA256 from the "expected" list in the error message. If I don't get an expected hash, it seems to be related to another poetry install process that's still running and downloading that artifact.
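For reference, the manual checksum step can be reproduced in Python. This is only a minimal sketch: `sha256_of` is a hypothetical helper name, not part of Poetry, and it formats the digest with the same `sha256:` prefix that appears in the error message.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the file's SHA256 digest as 'sha256:<hex>' (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"
```

Calling `sha256_of(Path("pandas-1.0.5-cp36-cp36m-manylinux1_x86_64.whl"))` on the cached file gives a value that can be compared directly against the "expected" list.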

My current working theory is something like this:

  • Poetry install process A checks the cache for an arbitrary package (say pandas, since that's what's in the example error above). The process gets a cache miss and starts downloading pandas.
  • Poetry install process B tries to install pandas. It checks the cache and finds A's pandas. However, this download is incomplete, so when process B does the hash check, it's wrong.
  • Process B fails
  • Process A finishes the download, checks the cache and checksum and succeeds.
  • I manually check the SHA256 of the file and see that it's correct because Process A has finished and I, as a human, am inherently slower than a computer.
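The theory above can be illustrated without Poetry at all. The following is a simplified, single-process sketch, not a reproduction of Poetry's code: the file name `example.whl` and the 1024-byte payload are made up, and the "processes" are just two steps in one script. It shows that hashing a file that is still being written gives a digest that doesn't match, while hashing the same file after the write finishes does.

```python
import hashlib
import tempfile
from pathlib import Path

payload = b"x" * 1024  # stand-in for the wheel's bytes (made-up data)
expected = hashlib.sha256(payload).hexdigest()

with tempfile.TemporaryDirectory() as cache_dir:
    cached = Path(cache_dir) / "example.whl"  # hypothetical cache entry

    with cached.open("wb") as fh:
        # "Process A" has written only half of the download so far.
        fh.write(payload[:512])
        fh.flush()

        # "Process B" sees the file in the cache and hashes it as-is.
        seen_by_b = hashlib.sha256(cached.read_bytes()).hexdigest()

        # "Process A" finishes the download.
        fh.write(payload[512:])

    # A later manual check hashes the complete file.
    seen_after = hashlib.sha256(cached.read_bytes()).hexdigest()

print("B's check matches:     ", seen_by_b == expected)
print("Later check matches:   ", seen_after == expected)
```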

Is there a way we can fix this so that multiple Poetry projects with a common cache directory can safely run simultaneously on the same machine? My initial proposed solution is to simply update the download process to download artifacts directly to the system's temp directory and only copy them into the cache once the download is complete. That way, all processes either get a cache miss, or a cache hit with a correct checksum.
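Here is a rough sketch of what that could look like. It is an illustration under assumptions, not Poetry's actual download code: `store_atomically` is a hypothetical function, and `chunks` stands in for whatever yields the downloaded bytes. It also varies the proposal slightly: instead of the system temp directory followed by a copy, it writes to a temporary file next to the destination and then uses `os.replace`, because a rename within one filesystem is atomic while a copy from another location is not.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def store_atomically(chunks: Iterable[bytes], destination: Path) -> Path:
    """Write chunks to a temp file, then rename it into place (hypothetical helper)."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
        # The destination name only ever refers to a complete file.
        os.replace(tmp_name, destination)
    except BaseException:
        # Clean up the partial file if the download or rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return destination
```

With this approach, a second process looking up the destination path sees either nothing or the finished archive. One caveat is that the cache lookup would need to ignore the `.part` files, which is an assumption about how the lookup matches file names.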

Thoughts on this?

Labels: kind/bug (Something isn't working as expected)
