Skip to content

Conversation

@mik-laj
Copy link
Member

@mik-laj mik-laj commented Jun 30, 2020


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:API Airflow's REST/HTTP API label Jun 30, 2020
Copy link
Contributor

@jhtimmins jhtimmins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mik-laj, apologies that I only noticed the WIP flag after I added some comments. I'll leave them in case they're helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should live inside API endpoint code. Fetching a task instance is pretty generic functionality. If we don't already support code that does this, we should probably add it to TaskInstance.py or similar files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry for being unclear. I was suggesting get_task_instance be moved to a different file outside the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)Can be a follow up PR (and likely candidate would be to TaskIstance.get_by_run_id())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter seems somewhat ambiguous. I think the name could better express the purpose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know if I understand correctly. Can you say more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm actually nevermind. I think it makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function accepts 16 parameters. If we really need to pass in this many configuration details, is there a data structure that would better encapsulate system state than so many standalone params?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Get list of a task instances
Get list of task instances

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the preceding two lines are duplicates

Copy link
Member Author

@mik-laj mik-laj Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will remove it

@mik-laj mik-laj linked an issue Jul 1, 2020 that may be closed by this pull request
@OmairK OmairK force-pushed the api-taskinstance branch 2 times, most recently from 927896f to 201b31c Compare August 22, 2020 21:04
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we have an n+1 problem. I think we can fetch it much more efficiently if we use the more advanced features of SQLAlchemy.
https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mik-laj we are not using relationships in the model so I was planning to use explicit joins to overcome this n + 1 problem the only downside of that it is returns a tuple of object (<task instance>,<sla miss>) so I was planning to overwrite get_attribute. Let me know if this sounds like a good idea. I will be doing something similar here

class TaskInstanceReferenceSchema(Schema):
    """Schema for the task instance reference schema"""
    task_id = fields.Str()
    dag_run_id = fields.Method('get_run_id')
    dag_id = fields.Str()
    execution_date = fields.DateTime()

    @staticmethod
    def get_run_id(obj: TaskInstance):
        with create_session() as session:
            return session.query(DagRun).filter(DagRun.dag_id==obj.dag_id, DagRun.execution_date==obj.execution_date).one_or_none().run_id

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you need it, we can start using relationships.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to look at it tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, in the meantime I will prepare this solution

Comment on lines 187 to 212
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mik-laj I have dealt with the n +1 problem, but I am unsure if this query is the best way to go forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mik-laj is there a better approach that I should work on or is this fine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kaxil
Copy link
Member

kaxil commented Oct 5, 2020

@OmairK @mik-laj Any updates here?

@OmairK
Copy link
Contributor

OmairK commented Oct 5, 2020

@OmairK @mik-laj Any updates here?

Waiting for @mik-laj's review on the approach I followed.

@mik-laj
Copy link
Member Author

mik-laj commented Oct 6, 2020

I have a problem with this solution. I added one test that tests this endpoint with task instance and SLAMiss on this branch. Can you take a look at it?

Comment on lines 150 to 164
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mik-laj Passing multiple columns to distinct works perfectly with postgres but raises error in case of sqlite how should we handle this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the count query doesn't need join, so we can simplify everything and improve SQL compatibility.

@mik-laj
Copy link
Member Author

mik-laj commented Oct 9, 2020

Everything works. I am merging all commits into one commit on my branch and doing rebase now

@mik-laj mik-laj changed the title [WIP] Add read-only endpoints for task instances Add endpoints for task instances Oct 9, 2020
@github-actions
Copy link

github-actions bot commented Oct 9, 2020

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

github-actions bot commented Oct 9, 2020

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

github-actions bot commented Oct 9, 2020

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

.join(DR, and_(TI.dag_id == DR.dag_id, TI.execution_date == DR.execution_date))
.filter(DR.run_id == dag_run_id)
.filter(TI.task_id == task_id)
.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.join(
.outerjoin(

if we want to do an outerjoin instead of isouter=True

('can_read', 'DagRun'),
("can_read", "Dag"),
("can_read", "DagRun"),
('can_read', 'Task'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change single to double quotes here too for consistency

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. :-)

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left small comments

@mik-laj mik-laj marked this pull request as ready for review October 13, 2020 08:15
@mik-laj
Copy link
Member Author

mik-laj commented Oct 13, 2020

@turbaszek @kaxil @jhtimmins Can I ask for a review?

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 minor suggestion left

@kaxil kaxil merged commit 5772d4d into apache:master Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API Endpoints - Read-only - Task Instance

6 participants