
feat(atc): add data manipulation functions to load_var#5936

Closed
evanchaoli wants to merge 1 commit into concourse:master from evanchaoli:load_var_add

Conversation

@evanchaoli
Contributor

What does this PR accomplish?

The load_var step lets a pipeline load variables at run-time. But a new problem I have seen is that users sometimes need to perform simple manipulations on loaded vars, and currently they have to add a task to do so, which makes the pipeline massive.

This PR adds some data manipulation functions to load_var. See my test pipeline:

var_sources:
- name: vs
  type: dummy
  config:
    vars:
      vs_var: hello

jobs:
- name: myjob
  plan:
  - task: generate-data
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: 
          repository: busybox
      outputs:
      - name: out
      run:
        path: sh
        args:
        - -exc
        - |
         echo '{"foo": "bar", "arr": ["1", "2", "3"]}' > out/data.json
         echo '<release-note>some feature released</release-note>' > out/release.txt

  - load_var: data
    file: out/data.json
    adds:
    - name: statement
      expr: foo + " is bar" # basic string concat, where foo is a field in loaded var "data"
    - name: arrlen
      expr: len(arr)  # get length of an array, where arr is a field in loaded var "data"
    - name: from_vs
      expr: |
        "((vs:vs_var))" + " world" # vars from var_source can also be used

  - load_var: release
    file: out/release.txt
    adds:
    - name: note
      expr: rematch( "((.:release))", "<release-note>(.*)</release-note>" ) # regexp match, this help extract sub-string, e.g. extract release-note from PR description

  - task: show-vars
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: 
          repository: busybox
      outputs:
      - name: out
      run:
        path: sh
        args:
        - -exc
        - |
         echo __((.:data.foo))__
         echo __((.:data-statement))__
         echo The array has ((.:data-arrlen)) items__
         echo __((.:data-from_vs))__
         echo '__((.:release-note))__'

And see a screenshot of a build of the pipeline:

[screenshot: load_var_add]

Changes proposed by this PR:

Add an adds param to the load_var step. Each add contains a name and an expression; the expression is evaluated to a string and injected as a new var <load_var name>-<add name>.
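For illustration, here is a minimal stdlib sketch of how a rematch-style expression function could be implemented in Go; the helper name and signature are hypothetical and do not necessarily reflect this PR's actual eval package:

```go
package main

import (
	"fmt"
	"regexp"
)

// rematch returns the first capture group of pattern found in s,
// or the whole match if the pattern has no groups. Hypothetical
// sketch of the rematch() expression function used in the example
// pipeline above.
func rematch(s, pattern string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", fmt.Errorf("no match for %q", pattern)
	}
	if len(m) > 1 {
		return m[1], nil
	}
	return m[0], nil
}

func main() {
	note, _ := rematch("<release-note>some feature released</release-note>",
		"<release-note>(.*)</release-note>")
	fmt.Println(note) // some feature released
}
```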

Notes to reviewer:

  1. This is an initial version for review. If people buy into this idea, more data manipulation functions can be supported.
  2. The eval package added in this PR can also be used for the "gate" step (concourse/rfcs#66).

Contributor Checklist

Reviewer Checklist

  • Code reviewed
  • Tests reviewed
  • Documentation reviewed
  • Release notes reviewed
  • PR acceptance performed
  • New config flags added? Ensure that they are added to the BOSH and Helm packaging; otherwise, they are ignored for the integration tests (for example, if they are Garden configs that are not displayed in the --help text).

@vito
Member

vito commented Jul 31, 2020

On the one hand, I think this is really useful, but on the other, I really don't think it should be built in to Concourse.

My instinct is to prevent Concourse from embedding another language within YAML:

  • It introduces a whole new depth to the question of pipeline portability - the gval package may evolve at a faster rate than Concourse build plans, once v10 starts to crystallize, so you would have to be more mindful of which Concourse version your pipeline requires.
  • Introducing another language, albeit a small one, increases Concourse's already somewhat steep learning curve - I love jq, but I still have to look up the manual every other time I use it.
  • It increases the complexity of interpreting and validating a Concourse pipeline - should someone want to implement a better Concourse someday that supports existing pipelines, will they have to use this gval package, write it in Go, or write an equivalent implementation in their language?

Given these concerns, I think it would be better off if this were implemented as a Prototype. Then it can be versioned with the pipeline itself, it won't be "required learning" for Concourse, and users can use whatever language they like - there could be a jq prototype, a Lua prototype, or whatever.

I don't know exactly how this would look, but it should be easy to support given that Prototypes can consume arbitrary values and return arbitrary objects. Perhaps the run step could take a set_var: field or something to set the returned object as a var:

plan:
- load_var: data
  file: out/data.json
- run: eval
  type: gval
  set_var: extended_data # set returned object fields as `extended_data` local var
  params:
    input: ((.:data))
    fields:
      statement: foo + " is bar"
      arrlen: len(arr)
      from_vs: |
        ((vs:vs_var)) + " world"
- task: show-vars
  config:
    platform: linux
    image_resource:
      type: registry-image
      source: 
        repository: busybox
    outputs:
    - name: out
    run:
      path: sh
      args:
      - -exc
      - |
       echo __((.:data.foo))__
       echo __((.:extended_data.statement))__
       echo The array has ((.:extended_data.arrlen)) items__
       echo __((.:extended_data.from_vs))__

I fully recognize that this involves more overhead (running a container), but at least it does not require users to write a bunch of one-off tasks, which I think is the more important goal.

One benefit from this form is that it allows for using complex values from multiple vars, e.g.:

- run: eval
  type: gval
  set_var: extended_data # set returned object fields as `extended_data` local var
  params:
    input:
      from_local: ((.:data))
      from_vs: ((vs:vs_var))
    fields:
      statement: from_local.foo + " is bar"
      arrlen: len(from_local.arr)
      from_vs: from_vs + " world"

@evanchaoli
Contributor Author

@vito I have no intention of embedding some other language into Concourse; instead, I want to introduce Concourse's own evaluation language (maybe the language itself deserves an RFC), which could start with some basic arithmetic, string operations, regexp matching, etc., where gval is just a lib we can leverage to implement the language. This Concourse-native language could be used in multiple places: load_var, gating (concourse/rfcs#66), maybe a dynamic across step, and so on ...

I couldn't pick a word to accurately describe my feeling about the run step you suggested; maybe "hesitant". I think it just makes the pipeline more complicated and harder to read, because embedding a task into run implies multiple levels of embedding, introduces the concept of variable scopes, and so forth. But my initial motivation was to simplify pipelines, making the pipeline YAML shorter.

From my own experience of building pipelines, I am always struggling with some tiny things: should I add a task to do some simple data manipulation? write the task directly in the pipeline YAML or put it in a file? which language to use: bash, python, ruby, or golang? which task image to use? These are all tiny problems, but they really burden users.

So it would be wonderful if Concourse could provide as many supportive functions as possible, so that users could focus on their own logic, like how to run tests, how to package a product, etc.

@vito
Member

vito commented Aug 5, 2020

> @vito I have no intention of embedding some other language into Concourse; instead, I want to introduce Concourse's own evaluation language (maybe the language itself deserves an RFC), which could start with some basic arithmetic, string operations, regexp matching, etc., where gval is just a lib we can leverage to implement the language. This Concourse-native language could be used in multiple places: load_var, gating (concourse/rfcs#66), maybe a dynamic across step, and so on ...

My hesitation and the drawbacks I listed don't stem from embedding some other language, they stem from embedding a language at all. It may not be the gval package, it may be our own, but the same challenges exist.

> I couldn't pick a word to accurately describe my feeling about the run step you suggested; maybe "hesitant". I think it just makes the pipeline more complicated and harder to read, because embedding a task into run implies multiple levels of embedding, introduces the concept of variable scopes, and so forth. But my initial motivation was to simplify pipelines, making the pipeline YAML shorter.

Sorry, that's a syntax error - I didn't mean to embed it; I accidentally indented the task more than needed when I copy-pasted it from your example. This is supposed to be just a run step followed by a task step. I've updated the previous comment; it should look something like this (using another run step with a make-believe debug prototype instead of a task):

- run: eval
  type: some-type
  set_var: some-var
  params: {fields: {some-field: hello}}
- run: echo
  type: debug
  params: {message: ((.:some-var.some-field))}

The variable scoping would work the same way as load_var, only the vars would be coming from a prototype's response instead of a file.

> From my own experience of building pipelines, I am always struggling with some tiny things: should I add a task to do some simple data manipulation? write the task directly in the pipeline YAML or put it in a file? which language to use: bash, python, ruby, or golang? which task image to use? These are all tiny problems, but they really burden users.

I want the answer to these questions to always be "use a prototype", and for there to be common prototypes implemented and shared within the community so that users can just use one from the catalog, as they already do with resource types. If a prototype is not available to do the job, I want it to be easy to write one and share it - only then will you have to decide which language to implement it in. From then on, everyone can just use it.

I think we get to this future world by doubling down on the idea as much as possible and seeing where it fits; putting more and more logic into Concourse itself increases the complexity of Concourse and can stunt the growth of this ecosystem.

@evanchaoli
Contributor Author

evanchaoli commented Aug 6, 2020

@vito I think you have been talking about prototypes for a long time, and I just went over the prototype RFC again. Please correct me if I am wrong. From my understanding, a prototype is a generalization of a resource type, which allows users to customize a "resource type" with more or fewer actions than check/get/put. Basically, a prototype still runs as a container, with inputs and outputs via files. So I'm confused: how can a prototype inject vars into a pipeline's variables?

Let's still use the original sample:

plan:
- load_var: data
  file: out/data.json
- run: eval
  type: gval
  set_var: extended_data # set returned object fields as `extended_data` local var
  params:
    input: ((.:data)) # Here, how data can be passed into `set_var`? As var interpolation supports only string replacement.
    fields:
      statement: foo + " is bar"
      arrlen: len(arr)
      from_vs: |
        ((vs:vs_var)) + " world"

- load_var: new_data # Is a separate load_var needed to load the output of the prototype into the pipeline's variables?
  file: eval/extended_data

- task: show-vars
  config:
    outputs:
    - name: out
    run:
      path: sh
      args:
      - -exc
      - |
       echo __((.:data.foo))__

@aoldershaw
Contributor

@evanchaoli I'm sure @vito can do a better job answering your questions, but I'll give it a go:

Basically, prototype still runs as containers, inputs and outputs via files. So I'm confused how a prototype can inject vars into pipelines variables?

Requests to prototypes can emit two different things: bits (outputs written to disk), and streams of objects (arbitrary JSON written to stdout). So, I think the set_var field would take the emitted object stream and set that as a variable of your choosing. The gval prototype would likely emit a single object in the stream that looks something like:

{
  "statement": "bar is bar",
  "arrlen": 3,
  ...
}

> params:
>   input: ((.:data)) # Here, how data can be passed into set_var? As var interpolation supports only string replacement.

We'll likely need to rework var interpolation to support this. The reason it's string-only right now (I believe) is that tasks can only take in params in the form of environment variables, whereas prototypes can take in arbitrary JSON objects.

> - load_var: new_data # Is a separate load_var needed to load the output of the prototype into the pipeline's variables?
>   file: eval/extended_data

Nope! The gval prototype likely wouldn't emit any bits (so wouldn't have any outputs) - but as I mentioned before, it can still emit objects over stdout.

@vito
Member

vito commented Aug 6, 2020

@aoldershaw you got it! Here are some further clarifications:

> input: ((.:data)) # Here, how data can be passed into `set_var`? As var interpolation supports only string replacement.

One quick clarification: vars already support interpolation of full values. It is only the task step that forces the values under params: to be strings - get steps and put steps, for example, do not. The run step would similarly support arbitrary configuration, which is one of the big motivators for its introduction, as task steps are currently painful to re-use and share because they're so limited in how they can be parameterized.

So, this example would effectively take the value in the data var, whether it's a string or complex structure, and set it directly in the JSON payload under the input field of the object sent to the prototype.

> - load_var: new_data # Is a separate load_var needed to load the output of the prototype into the pipeline's variables?
>   file: eval/extended_data

As @aoldershaw pointed out, this extra load_var step wouldn't be necessary; by running a prototype, we will have a JSON response to do whatever we want with - in this case, we can have the run step itself be capable of setting it as build-local vars, just as load_var does.

So the tl;dr is that Prototypes give us a generic way to not only pass files around, but rich JSON request-responses. For prototypes which implement resources, these JSON objects would represent versions of the resource, but in other contexts we can interpret it however we want - in this case, as arbitrary data to set in local vars.

@evanchaoli
Contributor Author

evanchaoli commented Aug 7, 2020

> One quick clarification: vars already support interpolation of full values. It is only the task step that forces the values under params: to be strings - get steps and put steps, for example, do not. The run step would similarly support arbitrary configuration, which is one of the big motivators for its introduction, as task steps are currently painful to re-use and share because they're so limited in how they can be parameterized.

@vito That sounds great. Has that been implemented, or is it just planned? I cannot find any doc about it. Also, looking at the code on the master branch, I cannot see it - for example, the get step's param interpolation: https://github.com/concourse/concourse/blob/master/atc/exec/get_step.go#L127 - I don't see how it supports full values.

> So the tl;dr is that Prototypes give us a generic way to not only pass files around, but rich JSON request-responses. For prototypes which implement resources, these JSON objects would represent versions of the resource, but in other contexts we can interpret it however we want - in this case, as arbitrary data to set in local vars.

Also, when will prototypes be released?

@vito
Member

vito commented Aug 10, 2020

> @vito That sounds great. Has that been implemented, or is it just planned? I cannot find any doc about it. Also, looking at the code on the master branch, I cannot see it - for example, the get step's param interpolation: https://github.com/concourse/concourse/blob/master/atc/exec/get_step.go#L127 - I don't see how it supports full values.

It's implemented by the vars/ package. Note how Get returns interface{}, not string:

type Variables interface {
	Get(VariableDefinition) (interface{}, bool, error)
	List() ([]VariableDefinition, error)
}
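For illustration, a minimal hypothetical implementation in the same shape (staticVars is made up, and VariableDefinition is simplified here to a plain string name) showing how a var can carry a full structured value rather than just a string:

```go
package main

import "fmt"

// staticVars is a hypothetical Variables-style store. Because Get
// returns interface{} rather than string, a var can hold a full
// structured value (maps, arrays), not just a string. The real
// interface lives in Concourse's vars/ package.
type staticVars map[string]interface{}

// Get looks up a var by name; the second return reports whether it
// was found, mirroring the (interface{}, bool, error) shape above.
func (v staticVars) Get(name string) (interface{}, bool, error) {
	val, ok := v[name]
	return val, ok, nil
}

func main() {
	vars := staticVars{
		"data": map[string]interface{}{
			"foo": "bar",
			"arr": []string{"1", "2", "3"},
		},
	}
	val, found, _ := vars.Get("data")
	fmt.Println(found, val.(map[string]interface{})["foo"]) // true bar
}
```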

> Also, when will prototypes be released?

It's planned for Q4 2020 at the moment. The RFC is done, so theoretically someone could start implementing it, but we're planning to do this gradually as it also will involve re-implementing our core resource types and we'd like to have a tight feedback loop there.

@evanchaoli evanchaoli closed this Oct 28, 2020


3 participants