Merge master into branch-2.0#6910
Conversation
Signed-off-by: harupy <[email protected]>
* update to run profiles on the first 100 dataframe columns Signed-off-by: Sunish Sheth <[email protected]> * Adding replace=True so we can sample rows twice Signed-off-by: Sunish Sheth <[email protected]> * Removing replace=true and fixing how the max_rows are calculated Signed-off-by: Sunish Sheth <[email protected]>
… containing transformers (mlflow#6230) * Fix with test case Signed-off-by: dbczumar <[email protected]> * Test fix Signed-off-by: dbczumar <[email protected]> * Fix and simplification Signed-off-by: dbczumar <[email protected]> * more fixes & better test coveragE Signed-off-by: dbczumar <[email protected]> * test Signed-off-by: dbczumar <[email protected]> * LGBM test Signed-off-by: dbczumar <[email protected]> * Format Signed-off-by: dbczumar <[email protected]> * Address comments Signed-off-by: dbczumar <[email protected]> * Fixes Signed-off-by: dbczumar <[email protected]> * Use py37-compatible call args syntax Signed-off-by: dbczumar <[email protected]>
Signed-off-by: harupy <[email protected]>
* Log runtime pipeline config to MLflow Signed-off-by: Jin Zhang <[email protected]> * Fixed lint Signed-off-by: Jin Zhang <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
) Signed-off-by: Sunish Sheth <[email protected]>
* Added in name for ColSpec of pandas datatype infer panda series name for ColSpec Signed-off-by: Ryan Fogle <[email protected]> * Added test for Series name inference added one more pytest condition in test_schema:test_schema_inference_on_pandas_series Signed-off-by: Ryan Fogle <[email protected]> * Update test_schema.py Added in one more assertion to test schema inference. Signed-off-by: Ryan Fogle <[email protected]> * ran black Signed-off-by: Ryan Fogle <[email protected]> * added in requested changes in issue mlflow#6361 Signed-off-by: Ryan Fogle <[email protected]> * fix typo. changed hasattr to getattr Signed-off-by: Ryan Fogle <[email protected]>
Signed-off-by: Brian Barnes <[email protected]>
* remove Signed-off-by: apurva-koti <[email protected]> * fix Signed-off-by: apurva-koti <[email protected]>
Signed-off-by: harupy <[email protected]>
…odule (mlflow#6375) * Migrate GCS environment variables to mlflow.environment_variables Signed-off-by: harupy <[email protected]> * add default Signed-off-by: harupy <[email protected]> * improve comments Signed-off-by: harupy <[email protected]>
…ut/output dataframe contains unsupported data types (mlflow#6365) * Avoid logging signatures Signed-off-by: harupy <[email protected]> * test Signed-off-by: harupy <[email protected]> * fix indent Signed-off-by: harupy <[email protected]> * check model output Signed-off-by: harupy <[email protected]> * fix warning message Signed-off-by: harupy <[email protected]> * todo Signed-off-by: harupy <[email protected]> * update test and doc Signed-off-by: harupy <[email protected]> * fix docstring Signed-off-by: harupy <[email protected]> * fix failed tests Signed-off-by: harupy <[email protected]> * improve _assert_autolog_infers_model_signature_correctly Signed-off-by: harupy <[email protected]>
* refacor set_matrix.py Signed-off-by: harupy <[email protected]> * fix Signed-off-by: harupy <[email protected]> * fix tests Signed-off-by: harupy <[email protected]> * tset for --no-dev Signed-off-by: harupy <[email protected]> * install pytest and pytest-cov Signed-off-by: harupy <[email protected]> * test for --changed-files Signed-off-by: harupy <[email protected]> * add pyyaml to requirements Signed-off-by: harupy <[email protected]> * fix is_matrix_empty Signed-off-by: harupy <[email protected]> * use raw string and re.match Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
…mlflow#5897) * allow mlflow-skinny's autolog() to succeed when no scipy is installed Signed-off-by: Hannes Schulz <[email protected]> * optional pyfunc scipy dependency Signed-off-by: Hannes Schulz <[email protected]> * fix linter-discovered issues Signed-off-by: Hannes Schulz <[email protected]> * add test case, autolog() should pass w/o scipy installed Signed-off-by: Hannes Schulz <[email protected]> * minor cleanup Signed-off-by: Hannes Schulz <[email protected]> * test creating _Example with / without scipy present Signed-off-by: Hannes Schulz <[email protected]> * docs: do not warn about sparse matrix classes Signed-off-by: Hannes Schulz <[email protected]> * linting Signed-off-by: Hannes Schulz <[email protected]> * address review comment Signed-off-by: Hannes Schulz <[email protected]> * run black Signed-off-by: Hannes Schulz <[email protected]> * allow mlflow-skinny's autolog() to succeed when no scipy is installed Signed-off-by: Hannes Schulz <[email protected]> * optional pyfunc scipy dependency Signed-off-by: Hannes Schulz <[email protected]> * fix linter-discovered issues Signed-off-by: Hannes Schulz <[email protected]> * add test case, autolog() should pass w/o scipy installed Signed-off-by: Hannes Schulz <[email protected]> * minor cleanup Signed-off-by: Hannes Schulz <[email protected]> * test creating _Example with / without scipy present Signed-off-by: Hannes Schulz <[email protected]> * docs: do not warn about sparse matrix classes Signed-off-by: Hannes Schulz <[email protected]> * linting Signed-off-by: Hannes Schulz <[email protected]> * address review comment Signed-off-by: Hannes Schulz <[email protected]> * run black Signed-off-by: Hannes Schulz <[email protected]> * remove redundant None check after import Signed-off-by: Hannes Schulz <[email protected]> * delete comment Signed-off-by: Hannes Schulz <[email protected]> * Revert "remove redundant None check after import" This reverts commit f2e198c. Signed-off-by: Hannes Schulz <[email protected]> * review comments Signed-off-by: Hannes Schulz <[email protected]> * remove redundant test Signed-off-by: Hannes Schulz <[email protected]> * Apply suggestions from code review Co-authored-by: Harutaka Kawamura <[email protected]> Signed-off-by: Hannes Schulz <[email protected]> * Autoformat: https://github.com/mlflow/mlflow/actions/runs/2779791161 Signed-off-by: mlflow-automation <[email protected]> Co-authored-by: Harutaka Kawamura <[email protected]> Co-authored-by: dbczumar <[email protected]> Co-authored-by: mlflow-automation <[email protected]>
* revert Signed-off-by: harupy <[email protected]> * workaround for test_prometheus_exporter Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]>
* add failing test Signed-off-by: harupy <[email protected]> * use _validate_metric Signed-off-by: harupy <[email protected]> * fix invalid metric timestamp Signed-off-by: harupy <[email protected]> * fix docstring Signed-off-by: harupy <[email protected]> * fix java test Signed-off-by: harupy <[email protected]>
…lflow#6333) * init Signed-off-by: Weichen Xu <[email protected]> * add test Signed-off-by: Weichen Xu <[email protected]> * update Signed-off-by: Weichen Xu <[email protected]> * update Signed-off-by: Weichen Xu <[email protected]> * fix BINARY conditions in MySQL Signed-off-by: harupy <[email protected]> * fix attr error Signed-off-by: harupy <[email protected]> Co-authored-by: harupy <[email protected]>
…lflow#6390) * Run Model Registry SQLAlchemy tests in postgres, mysql, and mssql Signed-off-by: harupy <[email protected]> * fix failed tests Signed-off-by: harupy <[email protected]> * remove uuid Signed-off-by: harupy <[email protected]> * clean up Signed-off-by: harupy <[email protected]> * do not drop tables Signed-off-by: harupy <[email protected]>
…condition (mlflow#6388) * Fix Signed-off-by: dbczumar <[email protected]> * Fix Signed-off-by: dbczumar <[email protected]> * Format Signed-off-by: dbczumar <[email protected]>
* Added a custom filter from_json to render_and_merge_yaml Signed-off-by: Jin Zhang <[email protected]> * Fixed test Signed-off-by: Jin Zhang <[email protected]> * Update tests/utils/test_file_utils.py Co-authored-by: Siddharth Murching <[email protected]> * Update tests/utils/test_file_utils.py Co-authored-by: Siddharth Murching <[email protected]> * Fixed test Signed-off-by: Jin Zhang <[email protected]> * Fixed windows test attempt Signed-off-by: Jin Zhang <[email protected]> * Fixed windows test Signed-off-by: Jin Zhang <[email protected]> Co-authored-by: Siddharth Murching <[email protected]>
* fix ne operator and improve tests Signed-off-by: harupy <[email protected]> * add () to other operators Signed-off-by: harupy <[email protected]>
* Simplify the documentation section in the PR template Signed-off-by: harupy <[email protected]> * fix Signed-off-by: harupy <[email protected]>
* Added delete_time field to runs table Signed-off-by: Jason Cheng <[email protected]> * Updated latest_schema.sql to pass tests/store/tracking/test_sqlalchemy_store_schema.py Signed-off-by: Jason Cheng <[email protected]> * Added delete_time column into mlflow/store/tracking/dbmodels/initial_models.py Signed-off-by: Jason Cheng <[email protected]> * Moved the docstring to the end of the declaration instead of the start Signed-off-by: Jason Cheng <[email protected]> * Regenerated protobuf for RunInfo new schema, and passed pytest Signed-off-by: Jason Cheng <[email protected]> * Added --older-than flag in mlflow gc command, and added tests for that cli command Signed-off-by: Jason Cheng <[email protected]> * Removed delete_time from RunInfo proto Signed-off-by: Jason Cheng <[email protected]> * Set default older-than flag to None Signed-off-by: Jason Cheng <[email protected]> * Remove delete_time from RunInfo, add optional parameter older_than to store.get_deleted_runs(), and write delete_time into yaml file for file storage instead of adding it to RunInfo Signed-off-by: Jason Cheng <[email protected]> * Remove commented delete_time in RunInfo constructor in models.py Signed-off-by: Jason Cheng <[email protected]> * added INVALID_PARAMETER_VALUE error code in mlflow gc Signed-off-by: Jason Cheng <[email protected]> * Resolve case of delete time not in meta.yaml file, edited overwrite_run_info and gc Signed-off-by: Jason Cheng <[email protected]> * Rename delete_time to deleted_time Signed-off-by: Jason Cheng <[email protected]> * Rename delete_time to deleted_time, match error message in cli tests, filter SqlRuns by deleted_time, indicate float values accepted for gc --older-than flag Signed-off-by: Jason Cheng <[email protected]> * Rename delete_time to deleted_time, fixed assert statement in test_mlflow_gc, and fixed deleted_time queries in test_file_store.py and test_sqlalchemy_store.py Signed-off-by: Jason Cheng <[email protected]> * Only check if is in meta after restoring run in test_delte_restore_run Signed-off-by: Jason Cheng <[email protected]> * Fixed lint errors Signed-off-by: Jason Cheng <[email protected]> * Install mlflow from repository root Signed-off-by: harupy <[email protected]> Co-authored-by: harupy <[email protected]>
* Separate protos job Signed-off-by: harupy <[email protected]> * test Signed-off-by: harupy <[email protected]> * Add paths Signed-off-by: harupy <[email protected]> * Revert "test" This reverts commit 5d04b02. Signed-off-by: harupy <[email protected]>
* use gt operator Signed-off-by: harupy <[email protected]> * fix sqlalchemy store Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
| @@ -6,16 +6,43 @@ R API | |||
|
|
|||
There was a problem hiding this comment.
Will apply the autogenerated patch from CI
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
| expect_null(order_limit_result2$next_page_token) | ||
| }) | ||
|
|
||
| test_that("mlflow_search_experiments() works properly", { |
There was a problem hiding this comment.
Is this test a duplicate? We have the same test right above.
There was a problem hiding this comment.
Indeed, it is! Removed!
mlflow/utils/search_utils.py
Outdated
| ttype_for_timestamp = ( | ||
| TokenType.Name.Builtin | ||
| if Version(sqlparse.__version__) >= Version("0.4.3") | ||
| else TokenType.keyword |
There was a problem hiding this comment.
| else TokenType.keyword | |
| else TokenType.Keyword |
to include #6901
There was a problem hiding this comment.
Awesome! Included!
mlflow/pipelines/utils/step.py
Outdated
| from packaging import version | ||
|
|
||
| dbr_version_image_key = get_databricks_runtime() | ||
| dbr_version = dbr_version_image_key.split("-")[0] | ||
| if version.parse(dbr_version) < version.parse("11.x"): |
There was a problem hiding this comment.
This block may not work properly because packaging only works for python package versions.
There was a problem hiding this comment.
>>> from packaging import version
>>> version.parse("10.0") < version.parse("11.x")
FalseThere was a problem hiding this comment.
Does dbr_version always look like <major>.<minor>.x?
There was a problem hiding this comment.
Great catch! Updated to if LooseVersion(dbr_version).version[0] < 11:, which seems to work
There was a problem hiding this comment.
if int(dbr_version_image_key.split(".")[0]) < 11:also works (as long as dbr_version_image_key matches ^\d+\.). We can avoid using deprecated distutils. https://peps.python.org/pep-0632/#motivation
| def estimator_fn(estimator_params={}): # pylint: disable=dangerous-default-value | ||
| from sklearn.linear_model import SGDRegressor | ||
|
|
||
| return SGDRegressor(random_state=42, **estimator_params) |
There was a problem hiding this comment.
This looks like a good first issue for a first-time contributor. In this particular case, it's safe, but we normally should not use a mutable object as a default value.
| def estimator_fn(estimator_params={}): # pylint: disable=dangerous-default-value | |
| from sklearn.linear_model import SGDRegressor | |
| return SGDRegressor(random_state=42, **estimator_params) | |
| def estimator_fn(estimator_params=None): # pylint: disable=dangerous-default-value | |
| from sklearn.linear_model import SGDRegressor | |
| return SGDRegressor(random_state=42, **(estimator_params or {})) |
There was a problem hiding this comment.
We don't need to fix this in this PR.
tests/pipelines/test_train_step.py
Outdated
| @pytest.mark.skipif("hyperopt" not in sys.modules, reason="requires hyperopt to be installed") | ||
| def test_search_space(tmp_pipeline_root_path): |
There was a problem hiding this comment.
I think this test is always skipped. hyperopt is lazily imported and it's not in sys.modules when pytest is collecting tests.
There was a problem hiding this comment.
tests/pipelines/test_train_step.py::test_train_step_with_tuning_output_yaml_correct[False-3-0] PASSED [ 76%]
tests/pipelines/test_train_step.py::test_train_step_with_tuning_child_runs_and_early_stop PASSED [ 76%]
tests/pipelines/test_train_step.py::test_search_space SKIPPED (requi...) [ 77%]
tests/pipelines/test_train_step.py::test_tuning_param_equal[1-1] PASSED [ 77%]
There was a problem hiding this comment.
This test should be run on CI as well, right?
There was a problem hiding this comment.
Maybe we could just get rid of pytest.mark.skipif as test-requirements.txt contains hyperopt.
There was a problem hiding this comment.
Great catch, @harupy @BenWilson2 ! Removed the skipif.
harupy
left a comment
There was a problem hiding this comment.
Left some comments, otherwise LGTM!
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
dbczumar
left a comment
There was a problem hiding this comment.
@harupy @BenWilson2 Thank you for the thorough review! I've addressed your comments :)
mlflow/pipelines/utils/step.py
Outdated
| from packaging import version | ||
|
|
||
| dbr_version_image_key = get_databricks_runtime() | ||
| dbr_version = dbr_version_image_key.split("-")[0] | ||
| if version.parse(dbr_version) < version.parse("11.x"): |
There was a problem hiding this comment.
Great catch! Updated to if LooseVersion(dbr_version).version[0] < 11:, which seems to work
mlflow/utils/search_utils.py
Outdated
| ttype_for_timestamp = ( | ||
| TokenType.Name.Builtin | ||
| if Version(sqlparse.__version__) >= Version("0.4.3") | ||
| else TokenType.keyword |
There was a problem hiding this comment.
Awesome! Included!
| def estimator_fn(estimator_params={}): # pylint: disable=dangerous-default-value | ||
| from sklearn.linear_model import SGDRegressor | ||
|
|
||
| return SGDRegressor(random_state=42, **estimator_params) |
tests/pipelines/test_train_step.py
Outdated
| @pytest.mark.skipif("hyperopt" not in sys.modules, reason="requires hyperopt to be installed") | ||
| def test_search_space(tmp_pipeline_root_path): |
There was a problem hiding this comment.
Great catch, @harupy @BenWilson2 ! Removed the skipif.
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
mlflow/pipelines/utils/step.py
Outdated
| from distutils.version import LooseVersion | ||
|
|
||
| dbr_version_image_key = get_databricks_runtime() | ||
| dbr_version = dbr_version_image_key.split("-")[0] | ||
| if LooseVersion(dbr_version).version[0].major < 11: |
There was a problem hiding this comment.
Can we avoid using distutils because it's deprecated?
https://peps.python.org/pep-0632
| from distutils.version import LooseVersion | |
| dbr_version_image_key = get_databricks_runtime() | |
| dbr_version = dbr_version_image_key.split("-")[0] | |
| if LooseVersion(dbr_version).version[0].major < 11: | |
| dbr_version_image_key = get_databricks_runtime() | |
| dbr_version = dbr_version_image_key.split("-")[0] | |
| if int(dbr_version.split(".")[0]) < 11: |
(Assuming dbr_version starts with <major_version>.)
There was a problem hiding this comment.
Sounds good! Didn't realize distutils was deprecated.
mlflow/utils/mlflow_tags.py
Outdated
| for tag in tags: | ||
| if tag.key == MLFLOW_RUN_NAME: | ||
| return tag.value | ||
| return None |
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
Signed-off-by: dbczumar <[email protected]>
What changes are proposed in this pull request?
Merge master into branch-2.0
How is this patch tested?
Existing tests
Does this PR change the documentation?
Detailslink on thePreview docscheck.Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact loggingarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesarea/examples: Example codearea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models: MLmodel format, model serialization/deserialization, flavorsarea/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templatesarea/projects: MLproject format, project running backendsarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra: MLflow Tracking server backendarea/tracking: Tracking Service, tracking client APIs, autologgingInterface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows: Windows supportLanguage
language/r: R APIs and clientslanguage/java: Java APIs and clientslanguage/new: Proposals for new client languagesIntegrations
integrations/azure: Azure and Azure ML integrationsintegrations/sagemaker: SageMaker integrationsintegrations/databricks: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notes