Skip to content

Conversation

@pierrejeambrun
Copy link
Member

@pierrejeambrun pierrejeambrun commented Dec 17, 2021

closes: #20283

In this PR:

  • Upgrade the minimum package requirement to 2.33.0 for apache-beam (first stable for beam go sdk)
  • Refactor operators/beam.py with an abstract BeamBasePipelineOperator class to factorize initialization and common code, also fixed mypy hook on BeamDataflowMixin
  • Add BeamRunGoPipelineOperator and BeamHook.start_go_pipeline (+tests)
  • Add utils/go_module.py to handle initialisation and dependency installation for a module. (+ tests)
  • Slightly modified process_util + tests to be able to handle an extra optional parameter cwd. (This way we can move to the module directory to build it)
  • Write docs
  • Add dags examples, with system tests
  • I think we might want to install Go in the docker images. Go 1.16 minimum for beam 2.33
wget https://dl.google.com/go/go1.16.4.linux-amd64.tar.gz
sudo tar -xvf go1.16.4.linux-amd64.tar.gz   
sudo mv go /usr/local 
# in bashrc
export GOROOT=/usr/local/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH 

@pierrejeambrun
Copy link
Member Author

pierrejeambrun commented Dec 17, 2021

Just amended to fix static checks

@subkanthi
Copy link
Contributor

The dataflow tests seems to be failing, guessing its related to changing apache-beam version.

tests/providers/google/cloud/hooks/test_dataflow.py .................... [ 40%] .......................................................FFFFFFFFF........ [ 41%]

@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch from 2383bc8 to 79a1171 Compare December 20, 2021 12:41
@pierrejeambrun
Copy link
Member Author

pierrejeambrun commented Dec 20, 2021

@subkanthi Thank you for pointing that. There were 2 definitions of the same _jobs attribute, I removed one in my branch and main removed the other 😵‍💫, hence the missing attribute.

I just fixed that, and also rebased my branch.

@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch from 79a1171 to cf3376e Compare December 20, 2021 18:29
@mik-laj
Copy link
Member

mik-laj commented Dec 21, 2021

@aaltay @ibzib @kennknowles @TheNeuralBit Can I ask for a review?

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added a few comments.

Have you tried executing {java, python, go} * {direct, dataflow} jobs with these changes>

@aaltay
Copy link
Member

aaltay commented Dec 22, 2021

I will be OOO soon, and in my absence @jrmccluskey could continue the review. Thank you for adding this operator!

Copy link

@jrmccluskey jrmccluskey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding support for Beam Go!

@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch 3 times, most recently from 35d3d5b to b94cd72 Compare January 9, 2022 21:05
@pierrejeambrun
Copy link
Member Author

pierrejeambrun commented Jan 9, 2022

I just rebased on main and updated the PR following your reviews. It is not yet ready to merge, I am having an issue running the go pipeline on dataflow, looks like he is not using the correct image even with the WorkerHarnessContainerImage: apache/beam_go_sdk:latest option

@aaltay
Copy link
Member

aaltay commented Jan 11, 2022

I just rebased on main and updated the PR following your reviews. It is not yet ready to merge, I am having an issue running the go pipeline on dataflow, looks like he is not using the correct image even with the WorkerHarnessContainerImage: apache/beam_go_sdk:latest option

If you cannot figure out, @jrmccluskey could help.

@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch from b94cd72 to a41481b Compare January 16, 2022 20:03
@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch from d8281b4 to afdd6b6 Compare January 22, 2022 13:45
@pierrejeambrun
Copy link
Member Author

@mik-laj The PR is ready if you want to take a look :)

@pierrejeambrun
Copy link
Member Author

Hi @eladkal and @potiuk,

It has been a couple of week since last activity on this PR. I would like to follow up, I think it's ready to merge.

Is there anything I can do to help you or changes that need to be made ?

@potiuk
Copy link
Member

potiuk commented Feb 10, 2022

@mik-laj - any more comments on that ? @pierrejeambrun - can you please rebase to latest main to check if there are no static checks/doc changes (the change is 179 commits behind main). There is the new feature from GitHub that makes it super - easy.

Screenshot 2022-02-10 at 22 16 04

@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch from e2d90c6 to 937259c Compare February 10, 2022 22:56
@pierrejeambrun
Copy link
Member Author

@potiuk Yes of course, I just rebased the branch. Nice I didn't know we could do that directly from the GitHub interface 👍

@potiuk
Copy link
Member

potiuk commented Feb 11, 2022

@potiuk Yes of course, I just rebased the branch. Nice I didn't know we could do that directly from the GitHub interface +1

It's a new feature (added last week) which we turned on immediately after it appeared (that's why I am also spreading the news about it in comments :) ).

Unfortunately - static checks you will have to fix by pushing new code I am afraid 😃

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Feb 11, 2022
@pierrejeambrun
Copy link
Member Author

pierrejeambrun commented Feb 12, 2022

Unfortunately - static checks you will have to fix by pushing new code I am afraid 😄

@potiuk that new docstring-params hook got me I guess 😂

I just pushed a fix for that

@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch 3 times, most recently from fcc08ea to 77a180a Compare February 12, 2022 22:41
Fix dataflow test

Fix following aaltay review

Update following jrmccluskey code review

Update PR following mik-laj review

Fix CI

Fix Java pipeline local runner
@pierrejeambrun pierrejeambrun force-pushed the 20283-operator-for-go-sdk-in-apache-beam branch from 77a180a to cb80ed3 Compare February 12, 2022 22:42
@pierrejeambrun
Copy link
Member Author

Hello @potiuk,

I hope you are doing well 😄

The CI has failed on Tests / Helm Chart; KubernetesExecutor (1 test didn't pass) but it does not seem related. I'm not familiar with it, should we relaunch the pipeline ?

@potiuk
Copy link
Member

potiuk commented Feb 13, 2022

IT's a flaky test. No worries

@potiuk potiuk merged commit da485da into apache:main Feb 13, 2022
@potiuk
Copy link
Member

potiuk commented Feb 13, 2022

I am doing some more changes in the images, so I might add Go to the image. Can you please add an issue about it @pierrejeambrun and CC: me ?

@pierrejeambrun pierrejeambrun mentioned this pull request Feb 13, 2022
2 tasks
@pierrejeambrun pierrejeambrun deleted the 20283-operator-for-go-sdk-in-apache-beam branch February 13, 2022 16:29
@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge kind:documentation provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New operator for Go SDK in Apache Beam

7 participants