Skip to content

ameya-parab:feature/sagemaker_proxies#6120

Merged
WeichenXu123 merged 11 commits intomlflow:masterfrom
ameya-parab:feature/sagemaker_proxies
Jul 6, 2022
Merged

ameya-parab:feature/sagemaker_proxies#6120
WeichenXu123 merged 11 commits intomlflow:masterfrom
ameya-parab:feature/sagemaker_proxies

Conversation

@ameya-parab
Copy link
Copy Markdown
Contributor

Related Issues/PRs

Close #6119

What changes are proposed in this pull request?

  1. Injecting proxies during docker build mlflow sagemaker build-and-push-container
  2. Injecting proxies as environment variables inside the deployed Sagemaker endpoint.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Ameya Parab added 2 commits June 15, 2022 10:59
@github-actions github-actions bot added area/docker Docker use anywhere, such as MLprojects and MLmodels area/models MLmodel format, model serialization/deserialization, flavors area/scoring MLflow Model server, model deployment tools, Spark UDFs integrations/sagemaker Sagemaker integrations rn/none List under Small Changes in Changelogs. labels Jun 23, 2022
@ameya-parab ameya-parab marked this pull request as ready for review June 23, 2022 17:06
"-Dhttps.proxyPort={https_proxy_port} -Dhttps.nonProxyHosts=repo.maven.apache.org"
).format(
http_proxy_host=os.environ["http_proxy"].split("//")[1].split(":")[0],
http_proxy_port=os.environ["http_proxy"].rsplit(":", maxsplit=1)[1],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use http_proxy_port=os.environ["http_proxy"].split("//")[1].split(":")[1] ? any difference ?

Copy link
Copy Markdown
Member

@harupy harupy Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WeichenXu123 maybe it's because http_proxy_port=os.environ["http_proxy"].rsplit(":", maxsplit=1)[1] is simpler, a bit faster, and eaiser to understand.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harupy and @WeichenXu123 thanks for your comments, my intention for using the split approach was to forcefully fail the code, if the proxy is not adhering to the standard format http://HOST:PORT, but I'll update the code (and add comments) to make use of `urllib.parse.urlparse' and update the PR.

@WeichenXu123
Copy link
Copy Markdown
Collaborator

Could you show your testing result ? e.g. screen records :)

Ameya Parab added 2 commits June 28, 2022 15:23
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
@ameya-parab
Copy link
Copy Markdown
Contributor Author

#6120 (comment)

The unit test results are attached below, after executing the /dev/run-python-sagemaker-tests.sh script.

Screen Shot 2022-06-28 at 1 24 39 PM

@dbczumar
Copy link
Copy Markdown
Collaborator

@WeichenXu123 Can you take another look at this?

Comment on lines +342 to +349
if os.getenv("http_proxy") is not None:
expected_model_environment.update({"http_proxy": os.environ["http_proxy"]})

if os.getenv("https_proxy") is not None:
expected_model_environment.update({"https_proxy": os.environ["https_proxy"]})

if os.getenv("no_proxy") is not None:
expected_model_environment.update({"no_proxy": os.environ["no_proxy"]})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to cover the case http(s)_proxy setting existing, we should patch the os.environ to make it contains "http(s)_proxy" env ?

Copy link
Copy Markdown
Contributor Author

@ameya-parab ameya-parab Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate on the ask here? How would you suggest handling the case wherein we mock proxy variables, should I duplicate each of the test case (9 in total) after mocking the proxies, or would you recommend a different approach?

Copy link
Copy Markdown
Collaborator

@WeichenXu123 WeichenXu123 Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use mock.patch.dict(os.environ, {...}) and add http(s)_proxy envs to patched os.environ

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, and would you suggest duplicating each of the 9 unit test cases, one without the proxies and one with mocked proxies?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicating each of the 9 unit test cases

Don't need to duplicate them, but I would suggest use @pytest.mark.parametrize on these tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, will work on that!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR, the unit tests now include parametrized tests to validate scenarios with and without proxies.

assert _http_proxy.hostname is not None, "Invalid `http_proxy` hostname."
assert isinstance(_http_proxy.port, int), f"Invalid Proxy Port: {_http_proxy.port}"

_https_proxy = urlparse(os.environ["https_proxy"])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: _https_proxy ==> parsed_https_proxy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will rename the variables

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the variables as suggested above.

https_proxy_port=_https_proxy.port,
)
else:
MAVEN_PROXY = "" # No Proxy
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add test for MAVEN_PROXY ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added asserts for validating the proxy values, can you please elaborate as to what additional scenarios I need to test when it comes to MAVEN_PROXY. Thanks!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If hard to add test for MAVEN_PROXY, we can manually test it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can implement a manual test for MAVEN PROXY; however, as this is my first PR for mlflow, could you please advise on how to set up manual tests when contributing to mlflow?

Ameya Parab added 2 commits June 30, 2022 18:47
@ameya-parab ameya-parab requested a review from WeichenXu123 June 30, 2022 22:55
MAVEN_PROXY = (
" -DproxySet=true -Dhttp.proxyHost={http_proxy_host} "
"-Dhttp.proxyPort={http_proxy_port} -Dhttps.proxyHost={https_proxy_host} "
"-Dhttps.proxyPort={https_proxy_port} -Dhttps.nonProxyHosts=repo.maven.apache.org"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the http(s)_proxy contains username/password, shall set them in maven proxy config ?

Copy link
Copy Markdown
Contributor Author

@ameya-parab ameya-parab Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that well versed with maven, but I believe we can pass username and password as

MAVEN_PROXY += " -Dhttp.proxyUser={proxy_username} -Dhttp.proxyPassword={proxy_password}"

Could you please confirm, thanks!
My current organization's development setup does not require proxy authentication, making it difficult for me to test this scenario.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have env at hand, but I think this is correct.
-Dhttp.proxyUser={proxy_username} -Dhttp.proxyPassword={proxy_password}

Ref: https://www.baeldung.com/maven-behind-proxy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the changes in the latest PR commit and have also modified the unit tests to validate authenticated proxy servers.

Signed-off-by: Ameya Parab <[email protected]>
@ameya-parab ameya-parab requested a review from WeichenXu123 July 1, 2022 18:59
proxy_variables = {
"http_proxy": "http://proxy.example.net:1234",
"https_proxy": "https://proxy.example.net:1234",
"no_proxy": "localhost",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's comment this is aws IP and change it as a global variable definition.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, so defining the AWS_IP as a variable AWS_IP = "169.254.169.254" over here https://github.com/mlflow/mlflow/blob/master/tests/helper_functions.py#L32 and then importing it over in tests/sagemaker/test_deployment.py, should be okay? Or did you have anything else in mind?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace 169.254.169.254 with a hostname ? And explain why we need to add no-proxy for it here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears the IP 169.254.169.254 does not have a hostname but needs to be specified in no_proxy for interacting with AWS services.

Ref: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-proxy.html#cli-configure-proxy-ec2

Copy link
Copy Markdown
Collaborator

@WeichenXu123 WeichenXu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@WeichenXu123
Copy link
Copy Markdown
Collaborator

Some sagemake test failed. Could you fix it ?

@ameya-parab
Copy link
Copy Markdown
Contributor Author

#6120 (comment) Some sagemake test failed. Could you fix it ?

I modified the failing Sagemaker tests to remove only the 'proxy' variables while leaving the other environment variables alone. All of the Sagemaker tests passed in my local setup but failed in the Github runner, most likely due to the different environment setup.

@ameya-parab ameya-parab requested a review from WeichenXu123 July 5, 2022 17:43
@WeichenXu123 WeichenXu123 merged commit 698574e into mlflow:master Jul 6, 2022
sniafas pushed a commit to sniafas/mlflow that referenced this pull request Jul 19, 2022
* injecting proxies for maven calls

Signed-off-by: Ameya Parab <[email protected]>

* add proxies (if available) into SageMaker endpoint

Signed-off-by: Ameya Parab <[email protected]>

* towards sagemaker proxies fix

Signed-off-by: Ameya Parab <[email protected]>

* parsing proxies with urlparse

Signed-off-by: Ameya Parab <[email protected]>

* renamed 'proxy' variables

Signed-off-by: Ameya Parab <[email protected]>

* testing 'proxy' and 'no proxy' scenarios

Signed-off-by: Ameya Parab <[email protected]>

* excluding AWS IP in proxy

Signed-off-by: Ameya Parab <[email protected]>

* including proxy user and password if present

Signed-off-by: Ameya Parab <[email protected]>

* added AWS_METADATA_IP for use in no proxy

Signed-off-by: Ameya Parab <[email protected]>

* testing auth proxies for sagemaker deployments

Signed-off-by: Ameya Parab <[email protected]>

* auth proxies for deployments / sagemaker tests fix

Signed-off-by: Ameya Parab <[email protected]>

Co-authored-by: Ameya Parab <[email protected]>
@ameya-parab ameya-parab deleted the feature/sagemaker_proxies branch July 21, 2022 17:32
postrational pushed a commit to postrational/mlflow that referenced this pull request Jul 27, 2022
* injecting proxies for maven calls

Signed-off-by: Ameya Parab <[email protected]>

* add proxies (if available) into SageMaker endpoint

Signed-off-by: Ameya Parab <[email protected]>

* towards sagemaker proxies fix

Signed-off-by: Ameya Parab <[email protected]>

* parsing proxies with urlparse

Signed-off-by: Ameya Parab <[email protected]>

* renamed 'proxy' variables

Signed-off-by: Ameya Parab <[email protected]>

* testing 'proxy' and 'no proxy' scenarios

Signed-off-by: Ameya Parab <[email protected]>

* excluding AWS IP in proxy

Signed-off-by: Ameya Parab <[email protected]>

* including proxy user and password if present

Signed-off-by: Ameya Parab <[email protected]>

* added AWS_METADATA_IP for use in no proxy

Signed-off-by: Ameya Parab <[email protected]>

* testing auth proxies for sagemaker deployments

Signed-off-by: Ameya Parab <[email protected]>

* auth proxies for deployments / sagemaker tests fix

Signed-off-by: Ameya Parab <[email protected]>

Co-authored-by: Ameya Parab <[email protected]>
Signed-off-by: Michal Karzynski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/docker Docker use anywhere, such as MLprojects and MLmodels area/models MLmodel format, model serialization/deserialization, flavors area/scoring MLflow Model server, model deployment tools, Spark UDFs integrations/sagemaker Sagemaker integrations rn/none List under Small Changes in Changelogs.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FR] Allow Proxies in Sagemaker Endpoints and Docker builds

4 participants