Add endpoints for task instances #9597

mik-laj · 2020-06-30T20:59:32Z

Make sure to mark the boxes below before creating PR: [x]

Description above provides context of the change
Unit tests coverage for changes (not needed for documentation changes)
Target Github ISSUE in description if exists
Commits follow "How to write a good git commit message"
Relevant documentation is updated including usage instructions.
I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

jhtimmins

Hey @mik-laj, apologies that I only noticed the WIP flag after I added some comments. I'll leave them in case they're helpful.

jhtimmins · 2020-07-01T05:41:16Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

I'm not sure this should live inside API endpoint code. Fetching a task instance is pretty generic functionality. If we don't already support code that does this, we should probably add it to TaskInstance.py or similar files.

Inside Airflow, we use execution_date as the primary identifier. However, we voted to use dag_run_id in the API:
https://lists.apache.org/thread.html/rd4be3829627dcef8b40314c62c041f460992786f3bfcc634d25a6664%40%3Cdev.airflow.apache.org%3E

Ah sorry for being unclear. I was suggesting get_task_instance be moved to a different file outside the API.

This can be a follow-up PR (a likely candidate would be TaskInstance.get_by_run_id()).

jhtimmins · 2020-07-01T05:44:08Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

This filter seems somewhat ambiguous. I think the name could better express the purpose.

I do not know if I understand correctly. Can you say more?

Mmmm, actually never mind. I think it makes sense.

jhtimmins · 2020-07-01T05:44:57Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

This function accepts 16 parameters. If we really need to pass in this many configuration details, is there a data structure that would better encapsulate system state than so many standalone params?

We use connexion, which fills these parameters based on the API specification.
https://connexion.readthedocs.io/en/latest/request.html#automatic-parameter-handling
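To illustrate why the handler ends up with so many standalone parameters: the linked docs describe connexion passing each parameter declared in the API specification to the view function as a named argument. The sketch below only mimics that idea in plain Python and is not connexion's actual implementation; `call_handler` and this cut-down `get_task_instances` signature are hypothetical names used for illustration.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def call_handler(handler: Callable[..., Any], request_params: Dict[str, Any]) -> Any:
    """Call ``handler`` with only the request parameters its signature declares."""
    accepted = inspect.signature(handler).parameters
    kwargs = {name: value for name, value in request_params.items() if name in accepted}
    return handler(**kwargs)


def get_task_instances(
    dag_id: str,
    dag_run_id: str,
    limit: int = 100,
    offset: Optional[int] = None,
    state: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical endpoint: echoes the filters it received."""
    return {
        "dag_id": dag_id,
        "dag_run_id": dag_run_id,
        "limit": limit,
        "offset": offset,
        "state": state,
    }


# Each query/path parameter becomes one keyword argument; unknown keys are dropped.
response = call_handler(
    get_task_instances,
    {"dag_id": "example_dag", "dag_run_id": "manual_1", "limit": 10, "unknown": "ignored"},
)
```

With this style of dispatch, every filter in the specification maps to one function argument, which is why the parameter count grows with the number of filters.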

jhtimmins · 2020-07-01T05:47:21Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

Suggested change

- Get list of a task instances
+ Get list of task instances

jhtimmins · 2020-07-01T05:50:16Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

Looks like the preceding two lines are duplicates

Good point. I will remove it

mik-laj · 2020-08-25T01:35:13Z

airflow/api_connexion/schemas/task_instance_schema.py

Here we have an n+1 problem. I think we can fetch it much more efficiently if we use the more advanced features of SQLAlchemy.
https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html

Hey @mik-laj, we are not using relationships in the model, so I was planning to use explicit joins to overcome this n+1 problem. The only downside is that it returns a tuple of objects (<task instance>, <sla miss>), so I was planning to override get_attribute. Let me know if this sounds like a good idea. I will be doing something similar here:

class TaskInstanceReferenceSchema(Schema):
    """Schema for the task instance reference schema"""
    task_id = fields.Str()
    dag_run_id = fields.Method('get_run_id')
    dag_id = fields.Str()
    execution_date = fields.DateTime()
    @staticmethod
    def get_run_id(obj: TaskInstance):
        with create_session() as session:
            return (
                session.query(DagRun)
                .filter(DagRun.dag_id == obj.dag_id, DagRun.execution_date == obj.execution_date)
                .one_or_none()
                .run_id
            )

If you need it, we can start using relationships.

I'll try to look at it tomorrow.

Cool, in the meantime I will prepare this solution
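The n+1 pattern discussed in this thread can be sketched with the standard-library `sqlite3` module. This is only an illustration under assumed, simplified tables (`task_instance` and `dag_run` with just the columns needed here), not the Airflow models or the query that was merged. It compares looking up `run_id` once per task instance against a single explicit join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, run_id TEXT);
    CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT);
    INSERT INTO dag_run VALUES
        ('example_dag', '2020-01-01', 'run_1'),
        ('example_dag', '2020-01-02', 'run_2');
    INSERT INTO task_instance VALUES
        ('example_dag', 'extract', '2020-01-01'),
        ('example_dag', 'load', '2020-01-01'),
        ('example_dag', 'extract', '2020-01-02');
    """
)

executed = []  # every SQL string sent through run(), used to count round trips


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# n+1: one query for the task instances, then one extra query per row.
task_instances = run(
    "SELECT dag_id, task_id, execution_date FROM task_instance "
    "ORDER BY execution_date, task_id"
)
n_plus_one = [
    (
        task_id,
        run(
            "SELECT run_id FROM dag_run WHERE dag_id = ? AND execution_date = ?",
            (dag_id, execution_date),
        )[0][0],
    )
    for dag_id, task_id, execution_date in task_instances
]
n_plus_one_queries = len(executed)

# Explicit join: the same pairs in a single query.
executed.clear()
joined = run(
    """
    SELECT ti.task_id, dr.run_id
    FROM task_instance AS ti
    JOIN dag_run AS dr
      ON ti.dag_id = dr.dag_id AND ti.execution_date = dr.execution_date
    ORDER BY ti.execution_date, ti.task_id
    """
)
join_queries = len(executed)
```

Both approaches return the same (task_id, run_id) pairs; the difference is that the first issues one query per task instance on top of the initial one, while the join issues a single query regardless of the number of rows.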

OmairK · 2020-08-29T00:45:52Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

@mik-laj I have dealt with the n+1 problem, but I am unsure if this query is the best way to go forward.

@mik-laj is there a better approach that I should work on or is this fine?

kaxil · 2020-10-05T16:38:47Z

@OmairK @mik-laj Any updates here?

OmairK · 2020-10-05T16:44:52Z

@OmairK @mik-laj Any updates here?

Waiting for @mik-laj's review on the approach I followed.

mik-laj · 2020-10-06T09:08:07Z

I have a problem with this solution. I added one test that exercises this endpoint with a task instance and an SLAMiss on this branch. Can you take a look at it?

OmairK · 2020-10-08T01:15:15Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

@mik-laj Passing multiple columns to distinct works perfectly with Postgres but raises an error with SQLite. How should we handle this case?

It seems to me that the count query doesn't need a join, so we can simplify everything and improve SQL compatibility.
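A small `sqlite3` sketch of the point about the count query. It assumes simplified `task_instance` and `sla_miss` tables in which each task instance has at most one SLA miss row; that constraint is an assumption of this example, expressed through the primary keys, and the table layout is not taken from the PR. Under that assumption, the outer join cannot add or remove rows, so counting the base table gives the same total without any DISTINCT handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_instance (
        dag_id TEXT, task_id TEXT, execution_date TEXT,
        PRIMARY KEY (dag_id, task_id, execution_date)
    );
    -- Assumed: at most one SLA miss per task instance (same composite key).
    CREATE TABLE sla_miss (
        dag_id TEXT, task_id TEXT, execution_date TEXT, description TEXT,
        PRIMARY KEY (dag_id, task_id, execution_date)
    );
    INSERT INTO task_instance VALUES
        ('example_dag', 'extract', '2020-01-01'),
        ('example_dag', 'load', '2020-01-01'),
        ('example_dag', 'extract', '2020-01-02');
    INSERT INTO sla_miss VALUES
        ('example_dag', 'load', '2020-01-01', 'too slow');
    """
)

# Count over the joined rows, as a listing query with SLA data would see them.
joined_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM task_instance AS ti
    LEFT OUTER JOIN sla_miss AS sm
      ON ti.dag_id = sm.dag_id
     AND ti.task_id = sm.task_id
     AND ti.execution_date = sm.execution_date
    """
).fetchone()[0]

# Count over the base table only: no join, no DISTINCT, portable SQL.
plain_count = conn.execute("SELECT COUNT(*) FROM task_instance").fetchone()[0]
```

If the joined table could hold several rows per task instance, the two counts would diverge, which is the case where some form of de-duplication would be needed again.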

mik-laj · 2020-10-09T08:39:41Z

Everything works. I am merging all commits into one commit on my branch and rebasing now.

github-actions · 2020-10-09T13:06:50Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

github-actions · 2020-10-09T14:58:21Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

github-actions · 2020-10-09T16:01:08Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.

kaxil · 2020-10-12T19:56:27Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

+        .join(DR, and_(TI.dag_id == DR.dag_id, TI.execution_date == DR.execution_date))
+        .filter(DR.run_id == dag_run_id)
+        .filter(TI.task_id == task_id)
+        .join(


Suggested change

-        .join(
+        .outerjoin(

if we want to do an outerjoin instead of isouter=True
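For context on this suggestion: in SQLAlchemy, `Query.outerjoin(...)` and `Query.join(..., isouter=True)` both emit a LEFT OUTER JOIN, whereas a plain `join(...)` emits an inner join. The sketch below shows the difference in results using the standard-library `sqlite3` module and assumed, simplified `task_instance` and `sla_miss` tables; it is an illustration of the join semantics, not the query from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT);
    CREATE TABLE sla_miss (dag_id TEXT, task_id TEXT, execution_date TEXT, description TEXT);
    INSERT INTO task_instance VALUES
        ('example_dag', 'extract', '2020-01-01'),
        ('example_dag', 'load', '2020-01-01');
    -- Only the 'load' task has an SLA miss.
    INSERT INTO sla_miss VALUES ('example_dag', 'load', '2020-01-01', 'too slow');
    """
)

QUERY = """
SELECT ti.task_id, sm.description
FROM task_instance AS ti
{join} sla_miss AS sm
  ON ti.dag_id = sm.dag_id
 AND ti.task_id = sm.task_id
 AND ti.execution_date = sm.execution_date
ORDER BY ti.task_id
"""

# Inner join: task instances without an SLA miss disappear from the result.
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()

# Left outer join: every task instance is kept, with NULL for a missing SLA miss.
outer_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()
```

With an inner join, an endpoint that looks up a task instance would find nothing for tasks that never missed an SLA, so the outer join is the variant that keeps them.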

kaxil · 2020-10-12T19:57:28Z

airflow/api_connexion/endpoints/task_instance_endpoint.py

-        ('can_read', 'DagRun'),
+        ("can_read", "Dag"),
+        ("can_read", "DagRun"),
        ('can_read', 'Task'),


Can we change single to double quotes here too, for consistency?

kaxil

left small comments

mik-laj · 2020-10-13T08:17:07Z

@turbaszek @kaxil @jhtimmins Can I ask for a review?

airflow/api_connexion/openapi/v1.yaml

kaxil

1 minor suggestion left

boring-cyborg bot added the area:API Airflow's REST/HTTP API label Jun 30, 2020

jhtimmins reviewed Jul 1, 2020

mik-laj linked an issue Jul 1, 2020 that may be closed by this pull request

API Endpoints - Read-only - Task Instance #8132

Closed

OmairK force-pushed the api-taskinstance branch 2 times, most recently from 927896f to 201b31c Compare August 22, 2020 21:04

mik-laj commented Aug 25, 2020

OmairK force-pushed the api-taskinstance branch from 9b76ceb to a51527e Compare August 28, 2020 18:06

OmairK reviewed Aug 29, 2020

kaxil reviewed Sep 16, 2020

OmairK mentioned this pull request Sep 28, 2020

API Endpoints - Read-only - Task Instance #8132

Closed

OmairK reviewed Oct 8, 2020

mik-laj force-pushed the api-taskinstance branch from d9d2fdd to f56ae3a Compare October 9, 2020 09:15

mik-laj changed the title ~~[WIP] Add read-only endpoints for task instances~~ Add endpoints for task instances Oct 9, 2020

mik-laj force-pushed the api-taskinstance branch from f56ae3a to 3bc9b7e Compare October 9, 2020 12:44

mik-laj force-pushed the api-taskinstance branch from 3bc9b7e to 8a6c35e Compare October 9, 2020 14:27

potiuk modified the milestones: Airflow 2.0.0-alpha1, Airflow 2.0.0-alpha2 Oct 12, 2020

Add read-only endpoints for task instances

6399bfc

mik-laj force-pushed the api-taskinstance branch from 586a0b1 to 6399bfc Compare October 12, 2020 19:32

kaxil reviewed Oct 12, 2020

kaxil reviewed Oct 12, 2020

kaxil reviewed Oct 12, 2020

Kamil Breguła added 2 commits October 13, 2020 00:29

fixup! Add read-only endpoints for task instances

46b9443

fixup! fixup! Add read-only endpoints for task instances

4560bc1

mik-laj marked this pull request as ready for review October 13, 2020 08:15

kaxil reviewed Oct 13, 2020

kaxil approved these changes Oct 13, 2020

Update airflow/api_connexion/openapi/v1.yaml

e1cc72f

Co-authored-by: Kaxil Naik <[email protected]>

kaxil merged commit 5772d4d into apache:master Oct 13, 2020

ashb modified the milestones: Airflow 2.0.0-alpha2, Airflow 2.0.0-alpha1 Oct 13, 2020
