Infer Pandas Series name to Schema Input/output name [FR] #6360

@RynoXLI

Description

Willingness to contribute

Yes. I would be willing to contribute this feature with guidance from the MLflow community.

Proposal Summary

Infer the schema input/output column name from the "name" attribute of a pandas Series.

A pandas Series has had a "name" attribute since pandas v0.4.

Documentation for the pandas.Series.name attribute:
https://pandas.pydata.org/docs/reference/api/pandas.Series.name.html

Release notes for pandas v0.4.x, where the Series name attribute was added:
https://pandas.pydata.org/pandas-docs/version/1.0/whatsnew/v0.4.x.html

Currently, column names are inferred for pandas DataFrames, but not for pandas Series.
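To illustrate the gap, the snippet below uses only pandas, not MLflow. It shows that a Series already carries its name, and that to_frame() turns that name into the column label of a one-column DataFrame. If DataFrame column inference works as described above, wrapping a Series with to_frame() may serve as a workaround today. That is an assumption, not something this issue confirms.

```python
import pandas as pd

# A named Series exposes its name through the .name attribute.
prices = pd.Series([9.99, 19.99], name="price")
print(prices.name)  # price

# Converting to a one-column DataFrame turns the name into the column label.
print(prices.to_frame().columns.tolist())  # ['price']

# An unnamed Series has name None, and to_frame() falls back to the label 0.
unnamed = pd.Series([1.0, 2.0])
print(unnamed.name)  # None
print(unnamed.to_frame().columns.tolist())  # [0]
```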

Motivation

What is the use case for this feature?

Use the name of a pandas Series as the column name when inferring the input/output schemas of a model signature.

Why is this use case valuable to support for MLflow users in general?

Easier use: a named Series would produce a named schema column without any extra work.

Why is this use case valuable to support for your project(s) or organization?

We would no longer need to name schema columns explicitly.

Why is it currently difficult to achieve this use case?

To get named columns for Series data, we currently have to write the schema by hand.

Details

In mlflow/types/utils.py, around line 118 (https://github.com/mlflow/mlflow/edit/master/mlflow/types/utils.py), add a branch for pandas Series:

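# Branch for pandas Series in the schema-inference if/elif chain
elif isinstance(data, pd.Series):
    # Every Series has a .name attribute, so no hasattr() check is needed;
    # it is None for an unnamed Series.
    name = data.name
    schema = Schema([ColSpec(type=_infer_pandas_column(data), name=name)])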
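The branch above only runs inside MLflow. Below is a standalone sketch of the same idea, so the behavior can be checked without MLflow installed. ColSpec and Schema here are hypothetical dataclass stand-ins, not MLflow's classes. The dtype-to-type table is an assumed simplification of what _infer_pandas_column does. Converting non-string names (such as 0) with str() is also an assumption of this sketch, not part of the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional

import pandas as pd


# Hypothetical stand-ins for mlflow.types.ColSpec and mlflow.types.Schema.
@dataclass
class ColSpec:
    type: str
    name: Optional[str] = None


@dataclass
class Schema:
    inputs: List[ColSpec]


# Assumed, simplified dtype mapping; the real inference covers more cases.
_DTYPE_TO_TYPE = {
    "bool": "boolean",
    "int32": "integer",
    "int64": "long",
    "float32": "float",
    "float64": "double",
    "object": "string",
}


def infer_series_schema(data: pd.Series) -> Schema:
    """Build a one-column schema whose column name comes from data.name."""
    col_type = _DTYPE_TO_TYPE[str(data.dtype)]
    # Unnamed Series keep name=None; other names are stringified (assumption).
    name = None if data.name is None else str(data.name)
    return Schema([ColSpec(type=col_type, name=name)])


named = infer_series_schema(pd.Series([1.5, 2.5], name="price"))
print(named)  # Schema(inputs=[ColSpec(type='double', name='price')])

flags = infer_series_schema(pd.Series([True, False]))
print(flags)  # Schema(inputs=[ColSpec(type='boolean', name=None)])
```

Keeping name=None for an unnamed Series preserves the current behavior of producing an unnamed column, so only named Series would be affected by the change.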

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Metadata


Labels

area/model-registry (Model registry, model registry APIs, and the fluent client calls for model registry), enhancement (New feature or request)
