[RFC] Improve the CI docker image builder workflow #37855

@seemethere

🚀 Feature

Motivation

Building Docker images for CI today is a painful, manual workflow that involves:

  1. Editing the CircleCI configuration
  2. Reverting said configuration
  3. Copying the workflow ID from step 1
  4. Editing all related files so the image tag points at the new workflow ID
  5. Adding the workflow ID to the ECR garbage collector so it doesn't clean the image up
  6. Getting everything merged

Pitch

To make this easier, I propose we take a two-tiered approach:

  1. Calculate a hash of the Docker image definition and all of its dependencies using a hashing tool (like sha256sum); the hash should change if any of those files change
  2. Check whether that hash already exists as a tag in our Docker repositories for said image using docker manifest inspect ${IMAGE}:${HASH}
  3. If the tag exists, do nothing (noop); if it doesn't exist, build the image
  4. Save the hash to an env file that gets passed on to dependent jobs
  5. Have all jobs that use this image depend on the job that builds the image

Example script to calculate the hash:

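#!/usr/bin/env bash

# Print a single sha256 digest covering every file under HASH_DIR.
set -euo pipefail

HASH_DIR=${HASH_DIR:-docker}

# Hash each file, sort the "digest  path" lines so the result does not
# depend on traversal order or locale, then hash the sorted list.
find "${HASH_DIR}" -type f -exec sha256sum {} + \
    | LC_ALL=C sort \
    | sha256sum \
    | cut -d' ' -f1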
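
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under a few assumptions: the function names image_exists and build_if_missing are hypothetical, the image name, tag, and build context are passed in by the caller, and the image is pushed right after building so that a later run can find the tag (the proposal itself only says "build"). Note that any failure of docker manifest inspect, including an authentication or network error, is treated as "tag missing" here, and that older Docker CLI versions may need experimental CLI features enabled for that subcommand.

```shell
#!/usr/bin/env bash
# Sketch of steps 2 and 3: build (and push) only when the tag is missing.
# Function names and arguments are hypothetical, not part of the proposal.

# Return 0 if ${image}:${tag} is already present in the registry.
image_exists() {
    local image="$1" tag="$2"
    docker manifest inspect "${image}:${tag}" >/dev/null 2>&1
}

# Build and push ${image}:${tag} from ${context} unless it already exists.
build_if_missing() {
    local image="$1" tag="$2" context="$3"
    if image_exists "${image}" "${tag}"; then
        echo "noop: ${image}:${tag} already exists"
    else
        echo "building ${image}:${tag}"
        docker build -t "${image}:${tag}" "${context}"
        docker push "${image}:${tag}"
    fi
}
```

A builder job might call build_if_missing "${IMAGE}" "${HASH}" docker, where HASH is the output of the hash script above and docker is the same directory that was hashed.
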

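For step 4, one possible shape of the env file hand-off is sketched below. The variable name DOCKER_TAG, the helper names, and the file path are all hypothetical; how the file actually travels from the builder job to dependent jobs depends on the CI system and is not shown.

```shell
#!/usr/bin/env bash
# Sketch of step 4: pass the computed tag to dependent jobs via an env file.
# DOCKER_TAG and both helper names are hypothetical.

# Builder job: record the tag in a file that can be sourced later.
write_tag_env() {
    local tag="$1" env_file="$2"
    printf 'DOCKER_TAG=%s\n' "${tag}" > "${env_file}"
}

# Dependent job: load the tag and abort if the file did not define it.
load_tag_env() {
    local env_file="$1"
    # shellcheck disable=SC1090
    source "${env_file}"
    : "${DOCKER_TAG:?DOCKER_TAG not set in ${env_file}}"
    export DOCKER_TAG
}
```

A dependent job would then refer to the image as ${IMAGE}:${DOCKER_TAG} instead of a hard-coded workflow ID, which is what removes the manual tag-editing steps listed in the motivation.
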
Additional notes

  • I feel like we should move off of ECR if at all possible; with Docker Hub we would not have to rely on ECR garbage collection
  • I also propose that we eventually consider moving the images currently built in pytorch/builder into this repo as well, so that they can reap the benefits of this new Docker builder workflow

cc @ezyang @seemethere @malfet

Labels: module: ci (related to continuous integration), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
