Skip to content

Comments

Add --from-template flag on CLI Project Initialization to setup template based project#1728

Merged
schnie merged 14 commits intomainfrom
add_template_flag
Nov 18, 2024
Merged

Add --from-template flag on CLI Project Initialization to setup template based project#1728
schnie merged 14 commits intomainfrom
add_template_flag

Conversation

@pritt20
Copy link
Contributor

@pritt20 pritt20 commented Oct 8, 2024

Description

This PR includes changes to add --from-template flag to astro dev init command. This option will allow users to specify runtime templates name like etl, dbt-on-astro, generative-ai etc and CLI will setup an astro project based on provided template by user.

These changes are extension to the enhancements being made in onboarding-workflow where once user has completed the process of creating the deployment and now wants to make changes to deployed dags or deployment, we want them to explore astro-cli option.

These changes will make it easy for users to modify/update dags as the template based project would setup all the required files which were auto-deployed during deployment create process as part of onboarding-workflow.

🎟 Issue(s)

Related https://github.com/astronomer/astro/issues/24343

🧪 Functional Testing

Tested the change on local machine:

Template: generative-ai

astro dev init --from-template=generative-ai
Initializing Astro project
Pulling Airflow development files from Astro Runtime 12.2.0
Initialized empty Astro project in /Users/pritt/Astronomer/plg/astro-cli/airflow-ai
cat requirements.txt                                                                   
# Astro Runtime includes the following pre-installed providers packages: https://docs.astronomer.io/astro/runtime-image-architecture#provider-packages
duckdb==0.10.3
tabulate==0.9.0
sentence-transformers==3.0.1
cd dags && ls                                                                             
example_vector_embeddings.py

Template: etl

astro dev init --from-template=etl          
Initializing Astro project
Pulling Airflow development files from Astro Runtime 12.2.0
Initialized empty Astro project in /Users/pritt/Astronomer/plg/astro-cli/airflow-etl

cat requirements.txt                                                                     
# Astro Runtime includes the following pre-installed providers packages: https://docs.astronomer.io/astro/runtime-image-architecture#provider-packages
duckdb==0.10.3
tabulate==0.9.0% 
cd dags && ls                                                                            
example_etl_galaxies.py

Default Project:

astro dev init
Initializing Astro project
Pulling Airflow development files from Astro Runtime 12.2.0
Initialized empty Astro project in /Users/pritt/Astronomer/plg/astro-cli/test
cat requirements.txt                                                               
# Astro Runtime includes the following pre-installed providers packages: https://www.astronomer.io/docs/astro/runtime-image-architecture#provider-packages
cd dags && ls                                                                         
exampledag.py

Interactive UX test:

Template: dbt-on-astro

astro dev init --from-template
 #     TEMPLATE             
 1     dbt-on-astro         
 2     learning-airflow     
 3     generative-ai        
 4     etl                  

> 1
Initializing Astro project
Pulling Airflow development files from Astro Runtime 12.2.0
Initialized empty Astro project in /Users/pritt/Astronomer/plg/astro-cli/airflow-dbt
cat requirements.txt
astronomer-cosmos==1.6.0
cd dags && ls                               
dbt_cosmos_dag.py

Template: learning-airflow

astro dev init --from-template
 #     TEMPLATE             
 1     dbt-on-astro         
 2     learning-airflow     
 3     generative-ai        
 4     etl                  

> 2
Initializing Astro project
Pulling Airflow development files from Astro Runtime 12.2.0
Initialized empty Astro project in /Users/pritt/Astronomer/plg/astro-cli/airflow-la
cd dags && ls
example_astronauts.py

📸 Screenshots

Add screenshots to illustrate the validity of these changes.

📋 Checklist

  • Rebased from the main (or release if patching) branch (before testing)
  • Ran make test before taking out of draft
  • Ran make lint before taking out of draft
  • Added/updated applicable tests
  • Tested against Astro-API (if necessary).
  • Tested against Houston-API and Astronomer (if necessary).
  • Communicated to/tagged owners of respective clients potentially impacted by these changes.
  • Updated any related documentation

@pritt20 pritt20 marked this pull request as ready for review October 8, 2024 18:04
Copy link
Contributor

@lzdanski lzdanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some grammar/spelling nits.

Copy link
Contributor

@neel-astro neel-astro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, but great work for a first time 🚀

if err := initDirs(path, dirs); err != nil {
return errors.Wrap(err, "failed to create project directories")
}
if template != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would the custom runtime version work with this new template logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Template based astro projects will always have latest runtime version as we are fetching these files from astronomer/templates repo. I have included a note in description that airflow-version flag will not be respected when using --template flag.

@@ -0,0 +1,200 @@
package runtimetemplateclient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this warrants a top-level "client" for the simple task of cloning a repo.

I'd suggest a different structure to this:

  • Add a Clone function to pkg/git/git.go to use git to clone a given repo to a given path. We should not need to be doing all this low-level tar/gzip work, and git is already expected to be installed
  • Add a new file airflow/templates.go with:
    • An interface (e.g. gitClient) with the signature for Clone
    • A struct that implements the interface by simply calling the pkg function
    • A top-level variable that is instantiated with the struct
    • A function (e.g. initFromTemplate) that uses the variable to clone the templates repo to a temp dir and moves the specific template subdirectory to a given path
  • In Init call the above function with the template name
  • In the Init tests replace the top-level variable with a mocked struct that doesn't do a real clone but instead inserts the required dirs/files for the rest to work

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something @pritt20 and I discussed. As far as I know, we don't currently list git as a prerequisite. In this particular instance, we're trying to get the user up and running with the least amount of dependencies and prerequisites as possible. Even though most of our user base will probably have git installed, it just felt too risky to rely on for such a critical onboarding step where we're doing everything we can to optimize user experience and remove friction.

Are there other places in the CLI code where git is currently a hard dependency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is true, we do currently use git in a few places but not as a hard dependency. If we go with the tar/gzip approach can we follow a similar structure to what I listed above, so we do not need a new top-level client? Where a top-level variable can be replaced by a unit test to simulate the file download. We do this in a few other places, e.g. for deploying an image which a unit test cannot really do.

I'm also hoping we can extract the contents straight into the target dir instead of into a temp dir and then copying the dirs and files across?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @jeremybeard and @schnie.

I have updated the code to reflect above changes.

I tried to dump the extracted files to target directory but the tarball that is extracted contains all templates directories and use path like target-directory/astronomer-templates-7a8aa17/<templates-directories>. We want the selected template directory files to be in target-directory without any sub directories. Therefore I have kept the temp directory logic intact to move files around and copy only required files to target directory 😅

cmd/airflow.go Outdated
errPytestArgs = errors.New("you can only pass one pytest file or directory")
buildSecrets = []string{}
errNoCompose = errors.New("cannot use '--compose-file' without '--compose' flag")
templateVariations = []string{"etl", "dbt-on-astro", "learning-airflow", "generative-ai"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should embed the current list of template names in the CLI. If we do this then whenever we add/remove a template we will immediately need a new CLI release, and even then users will not always upgrade and so the validation behavior there will not be correct. Instead can we add another flag --list-templates that queries https://updates.astronomer.io/astronomer-templates and prints the list of available templates?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could makes sense - the less steps involved with adding a new template, the better. We could use that same list for validation right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jeremybeard! that makes sense.

Would it be okay if we make template options interactively available to users that they can choose from? I tried to implement this and named this flag as --select-template ( we can change this if we find better name). --select-template is boolean flag and if passed, user will be provided with all available template options that user can then select from and CLI will create selected template based project. Please find below example:

astro dev init --select-template
 #     TEMPLATE             
 1     dbt-on-astro         
 2     learning-airflow     
 3     generative-ai        
 4     etl                  

> 2

Initializing Astro project
Pulling Airflow development files from Astro Runtime 12.2.0
/Users/pritt/Astronomer/plg/astro-cli/airflow-test-2 
You are not in an empty directory. Are you sure you want to initialize a project? (y/n) y
Reinitialized existing Astro project in /Users/pritt/Astronomer/plg/astro-cli/airflow-test-2

}
for _, f := range files {
e := os.Remove(f)
e := os.RemoveAll(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way we can clean this up in one call? Seems very brittle to list these out individually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe create and run the tests in a temp dir, and delete the temp dir itself at the end of each test as part of TearDownSubTest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all tests to make use of temp directory (which will then be removed after tests are executed) instead of cmd directory

if err := initDirs(path, dirs); err != nil {
return errors.Wrap(err, "failed to create project directories")
}
if template != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This placement is mixed in with the existing non-template init flow, which is a bit confusing. Can we add the template != "" check at the top of the function and have it return early so we don't do unnecessary things like construct the files map?

"path/filepath"

"github.com/astronomer/astro-cli/pkg/fileutil"
"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use "errors" please :), pkg/errors is not actively maintained any longer

Comment on lines 50 to 57
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("received non-200 status code: %d", resp.StatusCode)
}

body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("failed to read response body: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try to read the response and add it to the error when we get back non 200 response code, just so that we are passing enough information to help the customer help us debug issues

cmd/airflow.go Outdated
}
cmd.Flags().StringVarP(&projectName, "name", "n", "", "Name of Astro project")
cmd.Flags().StringVarP(&airflowVersion, "airflow-version", "a", "", "Version of Airflow you want to create an Astro project with. If not specified, latest is assumed. You can change this version in your Dockerfile at any time.")
cmd.Flags().StringVarP(&fromTemplate, "from-template", "t", "", "Template name that you want to use to create the local astro project. Possible values can be etl, dbt-on-astro, generative-ai and learning-airflow. Please note template based astro project use latest runtime version and airflow-version flag will be ignored when creating a project with template flag")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe either we don't specify the option in the description if that is subject to change, or we call the airflow.FetchTemplateList as part of the newAirflowInitCmd itself so that the description could be dynamic based on response, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neel-astro ! I've updated the description of flag to remove the template names

}
for _, f := range files {
e := os.Remove(f)
e := os.RemoveAll(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe create and run the tests in a temp dir, and delete the temp dir itself at the end of each test as part of TearDownSubTest

cmd/airflow.go Outdated
cmd.Flags().StringVarP(&projectName, "name", "n", "", "Name of Astro project")
cmd.Flags().StringVarP(&airflowVersion, "airflow-version", "a", "", "Version of Airflow you want to create an Astro project with. If not specified, latest is assumed. You can change this version in your Dockerfile at any time.")
cmd.Flags().StringVarP(&fromTemplate, "from-template", "t", "", "Template name that you want to use to create the local astro project. Possible values can be etl, dbt-on-astro, generative-ai and learning-airflow. Please note template based astro project use latest runtime version and airflow-version flag will be ignored when creating a project with template flag")
cmd.Flags().BoolVarP(&selectTemplate, "select-template", "s", false, "Provides a list of templates to select from and create the local astro project based on selected template")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quite like this UX, the user is presented with the valid choices and picks the one they want. Can we just make this the only way to create a project from a template? It could even be what --from-template does and then we don't need --select-template. We shouldn't need a non-interactive variant because this isn't going to be called by scripts/CI/etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jeremybeard ,

The initial Idea was to provide users with simple set of commands that they can execute to set up template based astro project. Eg: astro dev init --from-template etl

If we keep only --select-template flag functionality then we will have to guide users in our onboarding workflow to select the template and set up astro project accordingly.

CC: @schnie

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea the interactive mode is definitely awesome but I would still like to have the one-shot option available so we can compose it with a few other commands when we display them during onboarding. It'd be easier to copy/paste/execute the little block of commands without the interactive piece running. So in a way, we are kind of scripting with it here, its just not something a user would hook up in a CI pipeline or something.

I really wish we could just use the same flag and run interactive mode when the option is not provided but it seems like that's not really possible :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pritt20 we could use the same flag if we used the = format right?

So astro dev init --from-template would be interactive, and
astro dev init --from-template=etl would work for the one-shot version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this format can be implemented to use same flag for interactive and for one-shot version. Its just that we don't really use this format anywhere in CLI for existing commands😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a CLI UX perspective, I might be a bit concerned about changing the behavior of a single flag in the CLI, where a user could only pass a flag value with =.
From what I understand based on comments if passing a template value in the command isn't going to be a major form of usage (because the user has to copy/paste the value from somewhere), then should we just skip that and always present an interactive mode? . astro dev init does not have a CI-CD use case, so we don't necessarily need to support a non-interactive mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our next steps for the onboarding flow are to show the user a small block of commands to copy and execute in their terminal. I think we want a non-interactive way to do this to reduce the keystrokes (minor, I know) and steps involved to get setup. It'd be great to just paste in the terminal and go with no more interactions.

Given that, we have a choice to either piggyback on the same flag (--from-template), or add a separate, but similar command. The latter feels a little awkward and a little confusing. For that reason, I lean towards just using the = format for this command alone right now, but keep our existing pattern for other flags. We can treat this as a special case for onboarding purposes.

I imagine most users exploring the CLI on their own would run the command interactively. For our onboarding use-case we want to specify the exact template to use, so we'd provide them a command to run that matches the template they selected when going through the onboarding flow, like astro dev init --from-template=generative-ai.

Maybe we just document the bare, interactive --from-template mode and reserve the = format (--from-template=generative-ai just for the case where we give them the command to copy/paste.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we just document the bare, interactive --from-template mode and reserve the = format (--from-template=generative-ai just for the case where we give them the command to copy/paste.

That sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our next steps for the onboarding flow are to show the user a small block of commands to copy and execute in their terminal.

Cool, I was missing this context, I am on board with that approach 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone! I've updated the PR to reflect above discussed changes.

tarReader := tar.NewReader(gr)
var baseDir string

limitTarReader := io.LimitReader(tarReader, maxExtractSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I initially tried without LimitReader and ran into vulnerability warning by go linter: G110: Potential DoS vulnerability via decompression bomb. Therefore modified the code to use LimitReader

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in control of the tar files so that doesn't apply in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to use tarReader and have flagged it with nolint comment.

}
defer out.Close()

if _, err = io.Copy(out, in); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've hoping we don't need to do this copy, but if we do I'd expect this should just be a rename not a literal copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not require this copy function anymore as we are now extracting the template files directly to user directory, hence cleaned up this code.

"github.com/stretchr/testify/assert"
)

func TestFetchTemplateList_Success(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the test suite like we have in other unit tests, even though there might not be much common between these tests, that would put it in the framework that we would enable others to use it easily if need be down the line.

Also, all of these could be sub-tests for a test function in that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that make sense! I've updated the tests to use test suit to keep these in line with existing tests.

})

s.Run("success", func() {
config.WorkingPath = s.tempDir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be part of setup, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neel-astro ! I've updated the tests now.


func (s *AirflowSuite) cleanUpInitFiles() {
s.T().Helper()
files := []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for fixing this 🙇‍♂️

@pritt20 pritt20 changed the title Add --template flag on CLI Project Initialization to setup template based project Add --from-template flag on CLI Project Initialization to setup template based project Oct 24, 2024
Copy link
Contributor

@neel-astro neel-astro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nitpicks to squash, but otherwise LGTM

Copy link
Contributor

@jeremybeard jeremybeard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for working through all the feedback. I left a nit comment but approving ahead of that

}

func SelectTemplate(templateList []string) (string, error) {
TemplatesTab := printutil.Table{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: templatesTab

@pritt20 pritt20 requested a review from lzdanski November 18, 2024 13:05
@schnie schnie merged commit ba53ff9 into main Nov 18, 2024
@schnie schnie deleted the add_template_flag branch November 18, 2024 13:54
@schnie
Copy link
Member

schnie commented Nov 18, 2024

Blocking feedback request was handled and approved async from Github. Administrative merge for that reason.

@lzdanski
Copy link
Contributor

lzdanski commented Nov 21, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants