Willingness to contribute
Yes. I would be willing to contribute this feature with guidance from the MLflow community.
Proposal Summary
Add functionality to mlflow gc so that administrators can permanently delete runs based on a time criterion (e.g., delete only runs that were marked as deleted more than one month ago). This could be done by adding a delete_date column to the runs table in the SQL store.
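A minimal sketch of the proposed selection logic, assuming a simplified runs table in SQLite. The delete_date column comes from this proposal. Storing it as milliseconds since the epoch (like start_time and end_time) and the helper name runs_eligible_for_gc are assumptions for illustration, not MLflow's actual schema or API. lifecycle_stage mirrors the column MLflow uses to mark runs as 'active' or 'deleted'.

```python
import sqlite3
import time

# Simplified, hypothetical runs table (MLflow's real schema has more columns).
# delete_date is the proposed column: ms since epoch, NULL while the run is active.
SCHEMA = """
CREATE TABLE runs (
    run_uuid TEXT PRIMARY KEY,
    lifecycle_stage TEXT NOT NULL,
    delete_date INTEGER
)
"""

MS_PER_DAY = 24 * 60 * 60 * 1000


def runs_eligible_for_gc(conn, older_than_days, now_ms=None):
    """Return IDs of runs that were marked deleted more than older_than_days ago."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - older_than_days * MS_PER_DAY
    rows = conn.execute(
        "SELECT run_uuid FROM runs "
        "WHERE lifecycle_stage = 'deleted' AND delete_date IS NOT NULL "
        "AND delete_date < ? ORDER BY run_uuid",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


# Example: only the run deleted 45 days ago passes a 30-day retention window.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = 100 * MS_PER_DAY
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        ("old", "deleted", now - 45 * MS_PER_DAY),
        ("recent", "deleted", now - 5 * MS_PER_DAY),
        ("active", "active", None),
    ],
)
print(runs_eligible_for_gc(conn, older_than_days=30, now_ms=now))  # prints ['old']
```

Runs that are still active, or that were deleted too recently, are left untouched and remain restorable.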
Motivation
What is the use case for this feature?
Developers sometimes want to restore runs they previously deleted, so periodically running mlflow gc to permanently remove every deleted run can cause them to lose data they still need. If gc only removes runs that have been in the deleted state for some minimum period, we can be more confident that those runs are no longer needed.
Why is this use case valuable to support for MLflow users in general?
The same reasoning applies to MLflow users in general: a retention period keeps deleted runs restorable for a while before they are permanently removed, which protects against accidental data loss.
Why is this use case valuable to support for your project(s) or organization?
Same as above: administrators could run mlflow gc periodically without the risk of permanently removing runs that developers may still want to restore.
Why is it currently difficult to achieve this use case?
The runs table only has start_time and end_time columns, which track when a run is created and when it finishes. It does not record when a run is deleted, so it is currently impossible to garbage-collect runs based on deletion time.
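Because the runs table has no deletion timestamp today, the proposed column would need to be maintained wherever a run changes lifecycle stage. A minimal sketch under the same assumptions (a simplified SQLite table with a delete_date column in milliseconds; mark_run_deleted and mark_run_restored are hypothetical helpers, not MLflow's store methods): soft-deleting records the time, and restoring clears it so that a run which is restored and later deleted again starts a fresh retention period.

```python
import sqlite3
import time

# Simplified, hypothetical runs table with the proposed delete_date column.
SCHEMA = """
CREATE TABLE runs (
    run_uuid TEXT PRIMARY KEY,
    lifecycle_stage TEXT NOT NULL,
    delete_date INTEGER
)
"""


def _now_ms():
    return int(time.time() * 1000)


def mark_run_deleted(conn, run_id, now_ms=None):
    """Soft-delete an active run and record when it was deleted."""
    now_ms = _now_ms() if now_ms is None else now_ms
    conn.execute(
        "UPDATE runs SET lifecycle_stage = 'deleted', delete_date = ? "
        "WHERE run_uuid = ? AND lifecycle_stage = 'active'",
        (now_ms, run_id),
    )


def mark_run_restored(conn, run_id):
    """Restore a deleted run and clear its deletion timestamp."""
    conn.execute(
        "UPDATE runs SET lifecycle_stage = 'active', delete_date = NULL "
        "WHERE run_uuid = ? AND lifecycle_stage = 'deleted'",
        (run_id,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO runs VALUES ('r1', 'active', NULL)")

mark_run_deleted(conn, "r1", now_ms=1000)
print(conn.execute("SELECT lifecycle_stage, delete_date FROM runs").fetchone())  # prints ('deleted', 1000)

mark_run_restored(conn, "r1")
print(conn.execute("SELECT lifecycle_stage, delete_date FROM runs").fetchone())  # prints ('active', None)
```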
Details
No response
What component(s) does this bug affect?
What interface(s) does this bug affect?
What language(s) does this bug affect?
What integration(s) does this bug affect?