[BUG] get_offline_feature ignores parquet output file option #716

@loomlike

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Feathr version

0.8.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): on both Linux Ubuntu 20 and Databricks
  • Python version: 3.10
  • Spark version, if reporting runtime issue:

Describe the problem

get_offline_feature always writes its output as Avro, regardless of the output format set in the execution configuration.

Tracking information

No response

Code to reproduce bug

Run:

get_offline_feature(
    execution_configurations=SparkExecutionConfiguration({
        "spark.feathr.inputFormat": "parquet",
        "spark.feathr.outputFormat": "parquet",
    }),
    ....
)

The output file is still written as Avro, even though "spark.feathr.outputFormat" is set to "parquet".
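
One way to confirm which format was actually written is to check the leading magic bytes of the output part files: Parquet files begin with `PAR1`, and Avro object container files begin with `Obj` followed by the byte `0x01`. The sketch below assumes the job output has been copied to a local directory and that the files follow Spark's usual `part-*` naming. `detect_format` and `summarize_output` are hypothetical helper names for illustration, not part of Feathr.

```python
from pathlib import Path

# Magic bytes found at the start of each file format.
PARQUET_MAGIC = b"PAR1"   # Parquet files begin (and end) with these 4 bytes
AVRO_MAGIC = b"Obj\x01"   # Avro object container files begin with these 4 bytes


def detect_format(path):
    """Return 'parquet', 'avro', or 'unknown' based on the file's first 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == PARQUET_MAGIC:
        return "parquet"
    if head == AVRO_MAGIC:
        return "avro"
    return "unknown"


def summarize_output(output_dir):
    """Count detected formats among part files in a local output directory.

    Assumption: the output was downloaded locally and uses Spark's
    'part-*' file naming; checksum and marker files are skipped.
    """
    counts = {}
    for part in sorted(Path(output_dir).glob("part-*")):
        if part.is_file():
            fmt = detect_format(part)
            counts[fmt] = counts.get(fmt, 0) + 1
    return counts
```

If the bug is present, calling `summarize_output` on the downloaded output directory should report only `avro` entries even when the execution configuration requests Parquet.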

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer, which supports SQL and Purview (Atlas) as storage. The API layer is written in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for feature registry. Written in React

Labels

bug, good first issue
