Skip to content

Conversation

@dinigo
Copy link
Contributor

@dinigo dinigo commented Aug 12, 2021

closes: #13750
related: #17315

SqlSensor relies on underlying hooks that are automatically injected (DI) depending on the connection type provided to the SqlSensor. However, from time to time (but mainly in BigqueryHook) it is needed to pre-configure the "to-be-injected" hook because some default params do not fit the current config.

For example, if using SqlSensor with Bigquery it by default configures the SQL dialect to "legacy SQL", and there is no way to change this through configuration. Instead the SqlSensor has to be extended and some functions overwritten (as mentioned in the #13750).

To be able to customise the underlying hook there are 2 approaches to fix this:

  • Bring your own hook: instantiate a hook and provide it to the sensor already configured so it just checks that it inherits from DbApiHook but doesn't try to instantiate on its own.
  • Pass over the params you want SqlSensor to use to instantiate the hook.

Either one has it's drawbacks:

  • Cannot be used if DAG is defined with yaml/json/config-file, since to instantiate a Hook requires "intelligence", relying solely in config seems more maintainable in the future.
  • The "tell us what's wrong, we got you covered" does not follow very well the "open but closed" principle.

Knowing that, and that @uranusjr approves this second approach, and also given the known limitations, this PR makes the SqlSensor pass over the hook_kwargs to the connection hook locator that returns an instantiated hook.

@dinigo
Copy link
Contributor Author

dinigo commented Aug 12, 2021

mypy pre-commit hook is having some issue when trying to pull airflow image. I don't know why because breeze works fine. I just disabled it for this commit to be able to pre-provide something readable.

docker: read tcp 192.168.65.3:59194->192.168.65.1:3128: read: connection reset by peer.
See 'docker run --help'.

Any help fixing the pre-commit will be much appreciated.

@dinigo
Copy link
Contributor Author

dinigo commented Aug 13, 2021

Looks like some tests are broken. Could anyone give a little context on the tests?

@potiuk
Copy link
Member

potiuk commented Aug 13, 2021

Please rebase to latest main

@dinigo dinigo force-pushed the pass-arguments-to-dbapihook branch from 526b4ea to f87a45c Compare August 17, 2021 16:04
@dinigo
Copy link
Contributor Author

dinigo commented Aug 17, 2021

Done rebasing


def get_hook(self):
"""Return hook based on conn_type."""
def get_hook(self, hook_kwargs: Dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be **hook_kwargs

+ f"Supported connection types: {list(allowed_conn_type)}"
)
return conn.get_hook()
return conn.get_hook(**self.hook_kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… or this should be conn.get_hook(self.hook_kwargs)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed my mind about who should receive the destructured dict. I suppose you are right. because then we don't risk passing an argument to get_hook that is later popped out of the args, or unintendedly defined and taken out of the get_hook kwargs. I'll switch it the other way around. Just like discussed in the issue

Comment on lines +245 to +259
@mock.patch('airflow.sensors.sql.BaseHook')
def test_sql_sensor_bigquery_hook_kwargs(self, mock_hook):
op = SqlSensor(
task_id='sql_sensor_check',
conn_id='postgres_default',
sql="SELECT 1",
hook_kwargs={
'use_legacy_sql': False,
'location': 'test_location',
},
)

mock_hook.get_connection('google_cloud_default').conn_type = "google_cloud_platform"
assert op._get_hook().use_legacy_sql
assert op._get_hook().location == 'test_location'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this test should ensure the kwargs are correctly passed onto the hook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean mocking also the BaseHook and checking that call arguments match the ** hook_kwargs? you are right

@kazanzhy
Copy link
Contributor

Is this PR in progress?
I've also thought to make the same one to use BigQuery StandartSQL in SqlSensor.

success=None,
failure=None,
fail_on_empty=False,
hook_kwargs=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hook_kwargs=None,
hook_kwargs={},

And then the line below could be reduced to

	self.hook_kwargs = hook_kwargs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you take a look #18394 please?

@github-actions
Copy link

github-actions bot commented Nov 6, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 6, 2021
@kazanzhy
Copy link
Contributor

kazanzhy commented Nov 6, 2021

Hi @SamWheating.
Could you give me advice on what I have to do to move this PR?

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 7, 2021
@eladkal
Copy link
Contributor

eladkal commented Nov 26, 2021

Suppressed by #18431

@eladkal eladkal closed this Nov 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Standard SQL in BigQuery Sensor

6 participants