
Conversation

@epapineau epapineau commented Apr 12, 2022

closes: #22730

Enables the dbt Cloud provider to make API calls to single-tenant dbt Cloud instances via a tenant argument on DbtCloudRunJobOperator.

Screenshots

dbt Cloud Single Tenant Instance Job + DAG

Line 31 of `dbt_cloud_provider_example.py`: `default_args={"dbt_cloud_conn_id": "dbt_cloud", "tenant": "staging.singletenant"}`

Screen Shot 2022-05-06 at 4 24 14 PM
Screen Shot 2022-05-06 at 4 24 19 PM

dbt Cloud Multi Tenant Instance Job + DAG

Line 31 of `dbt_cloud_provider_example.py`: `default_args={"dbt_cloud_conn_id": "dbt_cloud"}`
Screen Shot 2022-05-06 at 4 30 39 PM
Screen Shot 2022-05-06 at 4 31 16 PM



boring-cyborg bot commented Apr 12, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@sungchun12

Keeping my eyeballs on this PR

@epapineau epapineau marked this pull request as ready for review April 26, 2022 21:24
@epapineau epapineau force-pushed the dbt_cloud_single_tenant branch from 2771d62 to 80b19c2 Compare April 26, 2022 21:28
@josh-fell josh-fell self-requested a review April 27, 2022 14:22
@epapineau
Contributor Author

Hello~ Following up to see if this needs anything else for review? 🥲

@sungchun12

@epapineau can you post a screenshot of this working on single tenant based on your code changes?

@epapineau
Contributor Author

@sungchun12 Added to the PR. Please let me know if you'd like any others :)

@josh-fell
Contributor

@epapineau How would users typically connect to these single-tenant instances? Does one API key have access to multiple single-tenant instances, or should one API key only have access to an individual single-tenant instance?

The reason I ask is, if it's the latter, it might be better to have users provide a URL or tenant name in the Airflow connection rather than having to pass it in every operator/sensor instance.

@epapineau
Contributor Author

@josh-fell one API key for one single tenant instance. A large customer may have multiple single tenant instances. Considering that, do you think it would be best to have the single tenant option on the connection, but leave it override-able at the operator/sensor, like with account_id?

@epapineau
Contributor Author

@josh-fell Added tenant as an option to the Airflow connection. Let me know what you think.

@josh-fell
Contributor

> ... one API key for one single tenant instance.

So it sounds like users would need to configure an Airflow connection per API key they use? Or is it possible that a single API key could access a single tenant instance + a multi-tenant instance therefore the need for an override when using a single Airflow connection?

@epapineau
Contributor Author

@josh-fell It is not possible that a single API key could access a single tenant instance + a multi-tenant instance. Each instance will require its own API key.

@josh-fell
Contributor

> @josh-fell It is not possible that a single API key could access a single tenant instance + a multi-tenant instance. Each instance will require its own API key.

So if users will always need to use a separate connection for each API key they want to use because said key can't cross-pollinate with other instances, it seems like the move is to have tenant optionally configurable in the connection rather than overriding at the task level. If it's not populated in the connection, then it's assumed to be a multi-tenant instance which always has a "cloud" tenant name. Am I in the ballpark with that? Or what is a good use case to have the override at the task level?

@epapineau
Contributor Author

@josh-fell Yep, totally follow. I removed the arg from the Operator. Do you have an opinion whether the tenant should be stored in the connection as the schema or as an extra?

@josh-fell
Contributor

> @josh-fell Yep, totally follow. I removed the arg from the Operator. Do you have an opinion whether the tenant should be stored in the connection as the schema or as an extra?

Schema is great I think.

@epapineau
Contributor Author

@josh-fell okay changes pushed 🎉

@josh-fell josh-fell left a comment

Can you remove the tenant parameter from the hook and sensor as well as update the unit test?

Comment on lines 132 to 137
```python
def test_init_hook_single_tenant(self):
    hook = DbtCloudHook(tenant="single.tenant")
    assert hook.dbt_cloud_conn_id == "dbt_cloud_default"
    assert hook.base_url == "https://single.tenant.getdbt.com/api/v2/accounts/"
    assert hook.auth_type == TokenAuth
    assert hook.method == "POST"
```
Contributor

Can you update this test accordingly too?

Contributor Author

Definitely. I tried my hand at parameterizing this for an additional test connection. Please lmk what you think.

@epapineau epapineau requested a review from josh-fell May 26, 2022 04:48
@josh-fell josh-fell left a comment

I think we're almost there! Just need to address the unit test failures.

@epapineau
Contributor Author

@josh-fell I was able to resolve the unit test failures locally by updating the value of dbt_cloud_conn_id to dbt_cloud (the name of my connection locally). Should I push those changes?

@josh-fell
Contributor

> @josh-fell I was able to resolve the unit test failures locally by updating the value of dbt_cloud_conn_id to dbt_cloud (the name of my connection locally). Should I push those changes?

I don't think that will work for CI unfortunately. The test will be looking for a connection that's accessible by the CI environment. I can see a couple options to help resolve the failure:

  1. Add a Connection and db.merge_conn() in the setup_class() of TestDbtCloudJobRunSensor like what's in TestDbtCloudHook. This would ensure a connection is always available for the test. Most straightforward to resolve the failure.
  2. Remove the self.base_url logic from the constructor of the DbtCloudHook and implement a cached_property instead. This would make the hook constructor a little cleaner IMO, but would probably warrant a little bit of regression testing. The mock patch would skip over all of this logic and there would be no change to the test itself.
```python
@cached_property
def base_url(self) -> str:
    tenant = self.connection.schema if self.connection.schema else "cloud"
    return f"https://{tenant}.getdbt.com/api/v2/accounts/"
```

I don't have a strong opinion either way.

@epapineau epapineau requested a review from josh-fell June 4, 2022 00:40
@epapineau
Contributor Author

@josh-fell cool I went with option one & it's passing now 🥳

@josh-fell
Contributor

Looks fantastic!! Would you mind rebasing to the latest main when you get a chance please? We can go ahead and merge afterwards.



Successfully merging this pull request may close these issues.

dbt Cloud Provider only works for Multi-tenant instances
