ameya-parab:feature/sagemaker_proxies#6120
Conversation
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
mlflow/models/docker_utils.py
Outdated
| "-Dhttps.proxyPort={https_proxy_port} -Dhttps.nonProxyHosts=repo.maven.apache.org" | ||
| ).format( | ||
| http_proxy_host=os.environ["http_proxy"].split("//")[1].split(":")[0], | ||
| http_proxy_port=os.environ["http_proxy"].rsplit(":", maxsplit=1)[1], |
There was a problem hiding this comment.
why not use http_proxy_port=os.environ["http_proxy"].split("//")[1].split(":")[1] ? any difference ?
There was a problem hiding this comment.
@WeichenXu123 maybe it's because http_proxy_port=os.environ["http_proxy"].rsplit(":", maxsplit=1)[1] is simpler, a bit faster, and eaiser to understand.
There was a problem hiding this comment.
@harupy and @WeichenXu123 thanks for your comments, my intention for using the split approach was to forcefully fail the code, if the proxy is not adhering to the standard format http://HOST:PORT, but I'll update the code (and add comments) to make use of `urllib.parse.urlparse' and update the PR.
|
Could you show your testing result ? e.g. screen records :) |
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
|
The unit test results are attached below, after executing the |
|
@WeichenXu123 Can you take another look at this? |
| if os.getenv("http_proxy") is not None: | ||
| expected_model_environment.update({"http_proxy": os.environ["http_proxy"]}) | ||
|
|
||
| if os.getenv("https_proxy") is not None: | ||
| expected_model_environment.update({"https_proxy": os.environ["https_proxy"]}) | ||
|
|
||
| if os.getenv("no_proxy") is not None: | ||
| expected_model_environment.update({"no_proxy": os.environ["no_proxy"]}) |
There was a problem hiding this comment.
In order to cover the case http(s)_proxy setting existing, we should patch the os.environ to make it contains "http(s)_proxy" env ?
There was a problem hiding this comment.
Could you please elaborate on the ask here? How would you suggest handling the case wherein we mock proxy variables, should I duplicate each of the test case (9 in total) after mocking the proxies, or would you recommend a different approach?
There was a problem hiding this comment.
I think you can use mock.patch.dict(os.environ, {...}) and add http(s)_proxy envs to patched os.environ
There was a problem hiding this comment.
Sounds good, and would you suggest duplicating each of the 9 unit test cases, one without the proxies and one with mocked proxies?
There was a problem hiding this comment.
duplicating each of the 9 unit test cases
Don't need to duplicate them, but I would suggest use @pytest.mark.parametrize on these tests.
There was a problem hiding this comment.
Perfect, will work on that!
There was a problem hiding this comment.
Updated PR, the unit tests now include parametrized tests to validate scenarios with and without proxies.
mlflow/models/docker_utils.py
Outdated
| assert _http_proxy.hostname is not None, "Invalid `http_proxy` hostname." | ||
| assert isinstance(_http_proxy.port, int), f"Invalid Proxy Port: {_http_proxy.port}" | ||
|
|
||
| _https_proxy = urlparse(os.environ["https_proxy"]) |
There was a problem hiding this comment.
nit: _https_proxy ==> parsed_https_proxy
There was a problem hiding this comment.
Makes sense, will rename the variables
There was a problem hiding this comment.
Renamed the variables as suggested above.
| https_proxy_port=_https_proxy.port, | ||
| ) | ||
| else: | ||
| MAVEN_PROXY = "" # No Proxy |
There was a problem hiding this comment.
Shall we add test for MAVEN_PROXY ?
There was a problem hiding this comment.
I have added asserts for validating the proxy values, can you please elaborate as to what additional scenarios I need to test when it comes to MAVEN_PROXY. Thanks!
There was a problem hiding this comment.
If hard to add test for MAVEN_PROXY, we can manually test it.
There was a problem hiding this comment.
Sure, I can implement a manual test for MAVEN PROXY; however, as this is my first PR for mlflow, could you please advise on how to set up manual tests when contributing to mlflow?
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
| MAVEN_PROXY = ( | ||
| " -DproxySet=true -Dhttp.proxyHost={http_proxy_host} " | ||
| "-Dhttp.proxyPort={http_proxy_port} -Dhttps.proxyHost={https_proxy_host} " | ||
| "-Dhttps.proxyPort={https_proxy_port} -Dhttps.nonProxyHosts=repo.maven.apache.org" |
There was a problem hiding this comment.
If the http(s)_proxy contains username/password, shall set them in maven proxy config ?
There was a problem hiding this comment.
I'm not that well versed with maven, but I believe we can pass username and password as
MAVEN_PROXY += " -Dhttp.proxyUser={proxy_username} -Dhttp.proxyPassword={proxy_password}"
Could you please confirm, thanks!
My current organization's development setup does not require proxy authentication, making it difficult for me to test this scenario.
There was a problem hiding this comment.
I don't have env at hand, but I think this is correct.
-Dhttp.proxyUser={proxy_username} -Dhttp.proxyPassword={proxy_password}
There was a problem hiding this comment.
Implemented the changes in the latest PR commit and have also modified the unit tests to validate authenticated proxy servers.
Signed-off-by: Ameya Parab <[email protected]>
tests/sagemaker/test_deployment.py
Outdated
| proxy_variables = { | ||
| "http_proxy": "http://proxy.example.net:1234", | ||
| "https_proxy": "https://proxy.example.net:1234", | ||
| "no_proxy": "localhost", |
There was a problem hiding this comment.
Let's comment this is aws IP and change it as a global variable definition.
There was a problem hiding this comment.
Sure, so defining the AWS_IP as a variable AWS_IP = "169.254.169.254" over here https://github.com/mlflow/mlflow/blob/master/tests/helper_functions.py#L32 and then importing it over in tests/sagemaker/test_deployment.py, should be okay? Or did you have anything else in mind?
There was a problem hiding this comment.
Can we replace 169.254.169.254 with a hostname ? And explain why we need to add no-proxy for it here ?
There was a problem hiding this comment.
It appears the IP 169.254.169.254 does not have a hostname but needs to be specified in no_proxy for interacting with AWS services.
|
Some sagemake test failed. Could you fix it ? |
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
Signed-off-by: Ameya Parab <[email protected]>
I modified the failing Sagemaker tests to remove only the 'proxy' variables while leaving the other environment variables alone. All of the Sagemaker tests passed in my local setup but failed in the Github runner, most likely due to the different environment setup. |
* injecting proxies for maven calls Signed-off-by: Ameya Parab <[email protected]> * add proxies (if available) into SageMaker endpoint Signed-off-by: Ameya Parab <[email protected]> * towards sagemaker proxies fix Signed-off-by: Ameya Parab <[email protected]> * parsing proxies with urlparse Signed-off-by: Ameya Parab <[email protected]> * renamed 'proxy' variables Signed-off-by: Ameya Parab <[email protected]> * testing 'proxy' and 'no proxy' scenarios Signed-off-by: Ameya Parab <[email protected]> * excluding AWS IP in proxy Signed-off-by: Ameya Parab <[email protected]> * including proxy user and password if present Signed-off-by: Ameya Parab <[email protected]> * added AWS_METADATA_IP for use in no proxy Signed-off-by: Ameya Parab <[email protected]> * testing auth proxies for sagemaker deployments Signed-off-by: Ameya Parab <[email protected]> * auth proxies for deployments / sagemaker tests fix Signed-off-by: Ameya Parab <[email protected]> Co-authored-by: Ameya Parab <[email protected]>
* injecting proxies for maven calls Signed-off-by: Ameya Parab <[email protected]> * add proxies (if available) into SageMaker endpoint Signed-off-by: Ameya Parab <[email protected]> * towards sagemaker proxies fix Signed-off-by: Ameya Parab <[email protected]> * parsing proxies with urlparse Signed-off-by: Ameya Parab <[email protected]> * renamed 'proxy' variables Signed-off-by: Ameya Parab <[email protected]> * testing 'proxy' and 'no proxy' scenarios Signed-off-by: Ameya Parab <[email protected]> * excluding AWS IP in proxy Signed-off-by: Ameya Parab <[email protected]> * including proxy user and password if present Signed-off-by: Ameya Parab <[email protected]> * added AWS_METADATA_IP for use in no proxy Signed-off-by: Ameya Parab <[email protected]> * testing auth proxies for sagemaker deployments Signed-off-by: Ameya Parab <[email protected]> * auth proxies for deployments / sagemaker tests fix Signed-off-by: Ameya Parab <[email protected]> Co-authored-by: Ameya Parab <[email protected]> Signed-off-by: Michal Karzynski <[email protected]>

Related Issues/PRs
Close #6119
What changes are proposed in this pull request?
mlflow sagemaker build-and-push-containerHow is this patch tested?
Does this PR change the documentation?
ci/circleci: build_doccheck. If it's successful, proceed to thenext step, otherwise fix it.
Detailson the right to open the job page of CircleCI.Artifactstab.docs/build/html/index.html.Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact loggingarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesarea/examples: Example codearea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models: MLmodel format, model serialization/deserialization, flavorsarea/projects: MLproject format, project running backendsarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra: MLflow Tracking server backendarea/tracking: Tracking Service, tracking client APIs, autologgingInterface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows: Windows supportLanguage
language/r: R APIs and clientslanguage/java: Java APIs and clientslanguage/new: Proposals for new client languagesIntegrations
integrations/azure: Azure and Azure ML integrationsintegrations/sagemaker: SageMaker integrationsintegrations/databricks: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notes