Skip to content

Conversation

@ephraimbuddy
Copy link
Contributor

@ephraimbuddy ephraimbuddy commented Jun 5, 2020


Closes: #8129
Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:webserver Webserver related Issues labels Jun 5, 2020
@ephraimbuddy ephraimbuddy marked this pull request as draft June 5, 2020 09:54
Comment on lines 82 to 90
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This endpoint allows specifying ~ as the dag_id to retrieve DAG Runs for all DAGs.
if dag_id == '~':
dag_run = query.all()
return dagrun_collection_schema.dump(DAGRunCollection(
dag_runs=dag_run,
total_entries=len(dag_run))
)
query = query.filter(DagRun.dag_id == dag_id)
if dag_id != '~':
query = query.filter(DagRun.dag_id == dag_id)

We still want to have filters and paginations. ~ is a wildcard. It allows any value, so we should give up filtering in this field.

Comment on lines 93 to 101
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if start_date_gte and start_date_lte:
query = query.filter(DagRun.start_date <= start_date_lte,
DagRun.start_date >= start_date_gte)
elif start_date_gte and not start_date_lte:
query = query.filter(DagRun.start_date >= start_date_gte)
elif start_date_lte and not start_date_gte:
query = query.filter(DagRun.start_date <= start_date_lte)
if start_date_gte:
query = query.filter(DagRun.start_date >= start_date_gte)
if start_date_lte:
query = query.filter(DagRun.start_date <= start_date_lte)

Is this causing any problems?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a string here? The specification define object here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.assertEqual(response.json.get('total_entries'), 2)
self.assertEqual(response.json['total_entries'], 2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use a fixed date in tests. This allows us to use text strings in tests. Now it is very difficult for me to check if the data format is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about webargs library? Would it be helpful to you?

Copy link
Contributor Author

@ephraimbuddy ephraimbuddy Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that this can solve it for us without any validation code from our side:

app.add_api("openapi.yaml", strict_validation=True)

https://connexion.readthedocs.io/en/latest/request.html#parameter-validation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be using webargs. Connexion no longer validate date-format because of license of one of its library.
spec-first/connexion#476

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you did research. We can change it in a separate change. Now it works, so it's great. In next changes, we will be able to make a refactor and remove repetitive code fragments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should. be defined as a custom fields. This allows you to eliminate some hacks from the code.

class DagStateField(fields.String):
    
    def __init__(self, **metadata):
        super().__init__(**metadata)
        self.validators = (
            [validate.OneOf(State.dag_states)] + list(self.validators)
        )

Copy link
Member

@mik-laj mik-laj Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
session.add_all(dagruns)
session.add_all(dagruns)
dagruns[0].dag_id.= "TEST_DAG_ID_2"

Can you check here if two different DAGs can be fetched?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this is ok a809933

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~ means that you can fetch any DAG ID. Your assertion should check if you have two DAGs e.g. DAG A and DAG B, you can fetch them. For now, you only have one DAG, which does not allow you to check the behavior of this view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check this c41c771

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😻

Copy link
Member

@mik-laj mik-laj Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think to assert the start/execution dates here? Identifiers poorly describe these objectshere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this is ok 5545d36

Copy link
Member

@mik-laj mik-laj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are near the end.

@ephraimbuddy ephraimbuddy changed the title [WIP] add readonly endpoints for dagruns add readonly endpoints for dagruns Jun 11, 2020
@ephraimbuddy ephraimbuddy marked this pull request as ready for review June 11, 2020 15:47
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
["TEST_START_EXEC_DAY_10"],
["TEST_START_EXEC_DAY_10", "TEST_START_EXEC_DAY_11"],

We have an inclusive filter, so it should probably be included as well. However, this may change in the future.

Related issue:
#9237

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a problem with <= operator in sqlalchemy. It returns only the less than values.
I have also tried separating the operator and using the or_ operator but it's still not inclusive
I am wondering if there is another way I could do it?

Copy link
Member

@mik-laj mik-laj Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a day off today, so I have limited options to check it out, but it can help you.
https://github.com/apache/airflow/blob/master/TESTING.rst#tracking-sql-statements
If something can be done at the SQL level, then SQL can do it too.

Copy link
Contributor Author

@ephraimbuddy ephraimbuddy Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link, it helped.
What was happening was that on the view, dates that have this format '2020-06-11T18:00:00+00:00' are changed to '2020-06-11T18:00:00 00:00'. That is, the + are replaced with empty space. So when it is parsed, only the date part were parsed. I then replaced +00:00 with Z and the problem was solved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix 7a72935

Copy link
Contributor Author

@ephraimbuddy ephraimbuddy Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the command I used to investigate:
pytest --trace-sql=num,sql,parameters --capture=no tests/api_connexion/endpoints/test_dag_run_endpoint.py -k test_date_filters_gte_and_lte_1_api_v1_dags_TEST_DAG_ID_dagRuns_start_date_lte_2020_06_11T18_00_00_00_00

I set pdb debuger at this point in dag_run_endpoint:

if start_date_lte: import pdb; pdb.set_trace() query = query.filter(DagRun.start_date <= timezone.parse(start_date_lte))

Check image below:

start_date_lte

@mik-laj
Copy link
Member

mik-laj commented Jun 14, 2020

I would like to review this change once more and look at the filter by dates, but today I see that there is a merge conflict. if you have moments could you do a rebase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Member

@mik-laj mik-laj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test is sad. We need to fetch sorted items from the database.

@mik-laj mik-laj merged commit b6f4837 into apache:master Jun 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API area:webserver Webserver related Issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API Endpoints - Read-only - DAG Runs

3 participants