Equivalent resources defined across pipelines and teams should only correspond to a single version history #2386

@vito

What challenge are you facing?

Today, pipeline resource versions are collected into a versioned_resources table. This table predates the "life" epic (#629). It has the following schema:

atc=# \d versioned_resources
                               Table "public.versioned_resources"
   Column    |  Type   | Collation | Nullable |                     Default                     
-------------+---------+-----------+----------+-------------------------------------------------
 id          | integer |           | not null | nextval('versioned_resources_id_seq'::regclass)
 version     | text    |           | not null | 
 metadata    | text    |           | not null | 
 type        | text    |           | not null | 
 enabled     | boolean |           | not null | true
 resource_id | integer |           |          | 
 check_order | integer |           | not null | 0

It points to resource_id, making this per-pipeline-resource. This means that multiple pipelines with the same resource configs will be redundantly collecting the same version/metadata information.

A Modest Proposal

To be honest, this isn't a huge deal right now, aside from wasted database space and redundant checking. However, if we make the relationship between a pipeline's resources and the abstract version history a bit tighter, there are actually a few benefits:

  • We can reduce the amount of checking required across pipelines for equivalent resources.
  • We can reduce the amount of data recorded for equivalent resources.
  • When a user changes their pipeline resource's configuration, the history will be "re-set" to the new config (ref. #145, "Support for purging version history of a resource"), and should always be correct.
  • There may be some as-yet-unknown improvements we can make to the database model by having a cleaner representation.
  • As part of the "Resources v2" RFC (rfcs#1), we're going to start collecting all versions, not just starting from pipeline configuration time. There'll be a lot more data to record, so sharing it between duplicated resources will make things a lot more efficient.
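To make "equivalent resources" concrete, here's a rough sketch of keying a shared version history by resource config. It assumes a config's identity is its type plus its source, canonicalized so key order doesn't matter; `resource_config_key` and `VersionStore` are hypothetical names for illustration, not existing code.

```python
import hashlib
import json


def resource_config_key(resource_type: str, source: dict) -> str:
    """Hypothetical identity for a resource config: type + canonicalized source."""
    canonical = json.dumps(
        {"type": resource_type, "source": source},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionStore:
    """In-memory stand-in for a version history shared across pipelines."""

    def __init__(self):
        self.histories = {}  # config key -> ordered list of versions

    def save_versions(self, resource_type, source, versions):
        key = resource_config_key(resource_type, source)
        history = self.histories.setdefault(key, [])
        for version in versions:
            # Record each version once, no matter which pipeline found it.
            if version not in history:
                history.append(version)
        return history
```

Two pipelines that declare the same type and source land on the same key, so whichever one checks first fills in the history for both.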

Implementation Notes

Enabling/Disabling versions

Enabling/disabling versions should remain scoped to pipeline resources, obviously. This can be done via a join table (pipeline_resource_config_versions or some such).
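As a rough sketch of what that join table could look like, here's a version using SQLite through Python's sqlite3 module so it can be run standalone. Only the pipeline_resource_config_versions name comes from the suggestion above; the resource_config_versions table, all column names, and the "enabled unless a row says otherwise" default are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: versions are shared per resource config, while the
# enabled flag lives in a join table scoped to a pipeline's resource.
SCHEMA = """
CREATE TABLE resource_config_versions (
    id INTEGER PRIMARY KEY,
    resource_config_id INTEGER NOT NULL,
    version TEXT NOT NULL
);
CREATE TABLE pipeline_resource_config_versions (
    resource_id INTEGER NOT NULL,
    resource_config_version_id INTEGER NOT NULL
        REFERENCES resource_config_versions (id),
    enabled BOOLEAN NOT NULL DEFAULT 1,
    PRIMARY KEY (resource_id, resource_config_version_id)
);
"""


def set_enabled(db, resource_id, version_id, enabled):
    """Enable or disable one shared version for one pipeline resource."""
    db.execute(
        "INSERT OR REPLACE INTO pipeline_resource_config_versions "
        "(resource_id, resource_config_version_id, enabled) VALUES (?, ?, ?)",
        (resource_id, version_id, int(enabled)),
    )


def enabled_versions(db, resource_id, resource_config_id):
    """Versions usable by this pipeline resource; no join row means enabled."""
    rows = db.execute(
        """
        SELECT v.version
        FROM resource_config_versions v
        LEFT JOIN pipeline_resource_config_versions pv
          ON pv.resource_config_version_id = v.id AND pv.resource_id = ?
        WHERE v.resource_config_id = ? AND COALESCE(pv.enabled, 1) = 1
        ORDER BY v.id
        """,
        (resource_id, resource_config_id),
    )
    return [row[0] for row in rows]
```

Disabling a version for one pipeline resource leaves it untouched for every other pipeline sharing the same config.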

Distinct check intervals

Now that we only check once per resource config, there's a little gotcha. Different pipelines can have varying check_every settings.

Here's one idea: record last_checked on the resource config, and have each pipeline's radar component just check whether the time elapsed since last_checked is >= their interval. So, we'll check at the fastest defined frequency. Pipelines with longer check_every settings will have versions show up more quickly than expected, but that really shouldn't matter.
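The last_checked idea can be sketched roughly like this. It's a minimal illustration, not the radar implementation: `ResourceConfig`, `should_check`, and `radar_tick` are hypothetical names, and a real version would need to read and update last_checked atomically in the database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class ResourceConfig:
    # Shared by every pipeline resource with an equivalent config.
    last_checked: Optional[datetime] = None


def should_check(config: ResourceConfig, check_every: timedelta, now: datetime) -> bool:
    """True when this pipeline's interval has elapsed since the shared last check."""
    if config.last_checked is None:
        return True
    return now - config.last_checked >= check_every


def radar_tick(
    config: ResourceConfig,
    check_every: timedelta,
    now: datetime,
    run_check: Callable[[], None],
) -> bool:
    """One pass of a pipeline's radar: check only if the shared config is stale for us."""
    if not should_check(config, check_every, now):
        return False
    run_check()
    config.last_checked = now
    return True
```

With one pipeline on a 1 minute interval and another on 5 minutes, the faster one keeps last_checked fresh, so the slower one rarely (if ever) runs a check itself but still sees new versions.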

Pausing pipeline resources

Currently, users pause pipeline resources with the intended effect that no new versions are collected and used for later builds. This is really awkward when other pipelines result in checking the config anyway.

We could still support today's behavior by "faking it" and having pausing a resource really just 'pin' it to whatever the version was at the time. But actually, that sounds a lot like #1288. Maybe we should just implement that instead, and remove the resource pausing functionality?
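Here's a rough sketch of the "faking it" approach, where pausing just pins the pipeline resource to the newest version known at that moment while the shared history keeps growing. All names are hypothetical, and the exact pinning semantics would be whatever #1288 ends up defining.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PipelineResource:
    name: str
    pinned_version: Optional[str] = None


def pause(resource: PipelineResource, shared_history: List[str]) -> None:
    """'Pause' by pinning to the latest known version (no-op pin if history is empty)."""
    resource.pinned_version = shared_history[-1] if shared_history else None


def unpause(resource: PipelineResource) -> None:
    """Drop the pin so the resource follows the shared history again."""
    resource.pinned_version = None


def version_for_next_build(
    resource: PipelineResource, shared_history: List[str]
) -> Optional[str]:
    """The pinned version if there is one, otherwise the newest shared version."""
    if resource.pinned_version is not None:
        return resource.pinned_version
    return shared_history[-1] if shared_history else None
```

Other pipelines can keep checking the same config, and the paused resource simply ignores anything newer than its pin until it is unpaused.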


Labels: accepted, efficiency, enhancement, release/documented, size/large
