-
Notifications
You must be signed in to change notification settings - Fork 16.3k
add readonly endpoints for dagruns #9153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # This endpoint allows specifying ~ as the dag_id to retrieve DAG Runs for all DAGs. | |
| if dag_id == '~': | |
| dag_run = query.all() | |
| return dagrun_collection_schema.dump(DAGRunCollection( | |
| dag_runs=dag_run, | |
| total_entries=len(dag_run)) | |
| ) | |
| query = query.filter(DagRun.dag_id == dag_id) | |
| if dag_id != '~': | |
| query = query.filter(DagRun.dag_id == dag_id) |
We still want to have filters and paginations. ~ is a wildcard. It allows any value, so we should give up filtering in this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if start_date_gte and start_date_lte: | |
| query = query.filter(DagRun.start_date <= start_date_lte, | |
| DagRun.start_date >= start_date_gte) | |
| elif start_date_gte and not start_date_lte: | |
| query = query.filter(DagRun.start_date >= start_date_gte) | |
| elif start_date_lte and not start_date_gte: | |
| query = query.filter(DagRun.start_date <= start_date_lte) | |
| if start_date_gte: | |
| query = query.filter(DagRun.start_date >= start_date_gte) | |
| if start_date_lte: | |
| query = query.filter(DagRun.start_date <= start_date_lte) |
Is this causing any problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have a string here? The specification define object here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| self.assertEqual(response.json.get('total_entries'), 2) | |
| self.assertEqual(response.json['total_entries'], 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use a fixed date in tests. This allows us to use text strings in tests. Now it is very difficult for me to check if the data format is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about webargs library? Would it be helpful to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that this can solve it for us without any validation code from our side:
app.add_api("openapi.yaml", strict_validation=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be using webargs. Connexion no longer validate date-format because of license of one of its library.
spec-first/connexion#476
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you did research. We can change it in a separate change. Now it works, so it's great. In next changes, we will be able to make a refactor and remove repetitive code fragments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should. be defined as a custom fields. This allows you to eliminate some hacks from the code.
class DagStateField(fields.String):
def __init__(self, **metadata):
super().__init__(**metadata)
self.validators = (
[validate.OneOf(State.dag_states)] + list(self.validators)
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| session.add_all(dagruns) | |
| session.add_all(dagruns) | |
| dagruns[0].dag_id.= "TEST_DAG_ID_2" |
Can you check here if two different DAGs can be fetched?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if this is ok a809933
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~ means that you can fetch any DAG ID. Your assertion should check if you have two DAGs e.g. DAG A and DAG B, you can fetch them. For now, you only have one DAG, which does not allow you to check the behavior of this view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check this c41c771
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think to assert the start/execution dates here? Identifiers poorly describe these objectshere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if this is ok 5545d36
mik-laj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are near the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ["TEST_START_EXEC_DAY_10"], | |
| ["TEST_START_EXEC_DAY_10", "TEST_START_EXEC_DAY_11"], |
We have an inclusive filter, so it should probably be included as well. However, this may change in the future.
Related issue:
#9237
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a problem with <= operator in sqlalchemy. It returns only the less than values.
I have also tried separating the operator and using the or_ operator but it's still not inclusive
I am wondering if there is another way I could do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a day off today, so I have limited options to check it out, but it can help you.
https://github.com/apache/airflow/blob/master/TESTING.rst#tracking-sql-statements
If something can be done at the SQL level, then SQL can do it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the link, it helped.
What was happening was that on the view, dates that have this format '2020-06-11T18:00:00+00:00' are changed to '2020-06-11T18:00:00 00:00'. That is, the + are replaced with empty space. So when it is parsed, only the date part were parsed. I then replaced +00:00 with Z and the problem was solved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix 7a72935
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the command I used to investigate:
pytest --trace-sql=num,sql,parameters --capture=no tests/api_connexion/endpoints/test_dag_run_endpoint.py -k test_date_filters_gte_and_lte_1_api_v1_dags_TEST_DAG_ID_dagRuns_start_date_lte_2020_06_11T18_00_00_00_00
I set pdb debuger at this point in dag_run_endpoint:
if start_date_lte: import pdb; pdb.set_trace() query = query.filter(DagRun.start_date <= timezone.parse(start_date_lte))
Check image below:
|
I would like to review this change once more and look at the filter by dates, but today I see that there is a merge conflict. if you have moments could you do a rebase? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mik-laj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One test is sad. We need to fetch sorted items from the database.
Co-authored-by: Kamil Breguła <[email protected]>
Co-authored-by: Kamil Breguła <[email protected]>

Closes: #8129
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.