Skip to content

Conversation

@ahlag
Copy link
Contributor

@ahlag ahlag commented May 4, 2022

Description

  • Added pretty print flag to build_features

Resolves #163

How was this patch tested?

Ran pytest -s feathr_project/test/test_feature_materialization.py -k "test_build_feature_verbose" to check the pretty-printed features

Does this PR introduce any user-facing changes?

Introduces verbose argument to pretty print features

@hangfei hangfei added the safe to test Tag to execute build pipeline for a PR from forked repo label May 4, 2022
@hangfei
Copy link
Collaborator

hangfei commented May 5, 2022

For pretty print, let's use a util so the logics can be unified and reused.

I am thinking about this format:

These are the features you are going to register/build/materialize:
feature1, feature2, and feature3.

@ahlag
Copy link
Contributor Author

ahlag commented May 5, 2022

@hangfei
Okay sure. Where should I place the util code?

I am receiving the following errors from the CI/CD. Do I need to set Databricks Token on my laptop?

  1. Connection error to Databricks
Run # overwrite corresponding environment variables to utilize feathr to upload the files
  # overwrite corresponding environment variables to utilize feathr to upload the files
  # assuming there will be only one jar in the target folder
  databricks fs cp target/scala-[2](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:2).12/feathr-assembly-0.1.0.jar dbfs:/feathr_jar_github_action_06_42_58/feathr-assembly-0.1.0.jar --overwrite
  shell: /usr/bin/bash -e {0}
  env:
    JAVA_HOME: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/8.0.[3](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:3)22-6/x6[4](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:4)
    CI_SPARK_REMOTE_JAR_FOLDER: feathr_jar_github_action_06_42_[5](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:5)8
    FEATHR_LOCAL_JAR_NAME: feathr-assembly-0.1.0.jar
    FEATHR_LOCAL_JAR_FULL_NAME_PATH: target/scala-2.12/feathr-assembly-0.1.0.jar
    pythonLocation: /opt/hostedtoolcache/Python/3.8.12/x[6](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:6)4
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.[8](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:8).[12](https://github.com/linkedin/feathr/runs/6301795147?check_suite_focus=true#step:7:12)/x64/lib
    DATABRICKS_HOST: 
    DATABRICKS_TOKEN: 
Error: IndexError: string index out of range
Error: Process completed with exit code 1.
  1. Connection error to Azure Blob Storage
Run fixpoint/azblob-upload-artifact@v4
Error: Unable to extract accountName with provided information.
Error: Unable to extract accountName with provided information.

@hangfei
Copy link
Collaborator

hangfei commented May 5, 2022

may not be critical. just re-ran the tests to see if it passes

@hangfei
Copy link
Collaborator

hangfei commented May 5, 2022

For util, let's create a new py file in the same direcotry.

@hangfei
Copy link
Collaborator

hangfei commented May 6, 2022

As long as your local test that you added passes, it should be fine. There is a crendential issue on a forked PR(we are fixing it and hopefully it can be fixed in 1-2 weeks).

Since you are just adding some pretty-print, i don't think it will break the e2e tests.

I will ran the e2e tests after it's merged.

@ahlag
Copy link
Contributor Author

ahlag commented May 6, 2022

@hangfei
Ok sure!

@ahlag ahlag changed the title [WIP] Pretty-print the features produced by buildFeatures Pretty-print the features produced by buildFeatures May 8, 2022
@ahlag
Copy link
Contributor Author

ahlag commented May 8, 2022

@hangfei
This PR is up for review.

I tested the verbose flag with the following command:

pytest -s feathr_project/test/test_feature_materialization.py -k "test_build_feature_verbose"                                                                         ─╯
=========================================================================== test session starts ===========================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 5 items / 4 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-08 14:15:15.225 | INFO     | feathr._envvariableutil:get_environment_variable:64 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables.
2022-05-08 14:15:15.233 | INFO     | feathr._envvariableutil:get_environment_variable:64 - REDIS_PASSWORD is not set in the environment variables.
("request_features is the achor feature of ['trip_distance', "
 "'f_is_long_trip_distance', 'f_day_of_week']>")
"user_embemdding_derived is the derived feature of ['user_embedding']>"
.

@ahlag ahlag force-pushed the build-feature-pretty-print branch 4 times, most recently from 2236307 to 729013d Compare May 8, 2022 08:24
Signed-off-by: Chang Yong Lik <[email protected]>
@ahlag ahlag force-pushed the build-feature-pretty-print branch from 729013d to 88b4549 Compare May 8, 2022 08:33
ahlag added 2 commits May 11, 2022 14:00
Signed-off-by: Chang Yong Lik <[email protected]>
@ahlag ahlag force-pushed the build-feature-pretty-print branch from 743a3e9 to cb6f14b Compare May 13, 2022 15:38
ahlag added 2 commits May 16, 2022 23:56
@ahlag ahlag force-pushed the build-feature-pretty-print branch from c42b064 to 95a6a96 Compare May 16, 2022 15:14
@ahlag
Copy link
Contributor Author

ahlag commented May 16, 2022

@hangfei
I have finished implementing the pretty-prints. Could you take a look? test_get_offline_features_verbose and test_materialize_features_verbose fail because it requires REDIS and KAFKA settings. The pretty-print works without a problem.

❯ pytest -s feathr_project/test/test_feature_materialization.py -k "test_build_feature_verbose"                                                               
====================================================================== test session starts =======================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 7 items / 6 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-17 00:00:06.106 | INFO     | feathr._envvariableutil:get_environment_variable:66 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables, fetching the value from Key Vault
2022-05-17 00:00:06.115 | INFO     | feathr._envvariableutil:get_environment_variable:66 - REDIS_PASSWORD is not set in the environment variables, fetching the value from Key Vault
("request_features is the achor of ['trip_distance', "
 "'f_is_long_trip_distance', 'f_day_of_week']")
❯ pytest -s feathr_project/test/test_feature_materialization.py -k "test_get_offline_features_verbose"                                                         
====================================================================== test session starts =======================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 7 items / 6 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-16 23:59:32.888 | INFO     | feathr._envvariableutil:get_environment_variable:66 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables, fetching the value from Key Vault
2022-05-16 23:59:32.896 | INFO     | feathr._envvariableutil:get_environment_variable:66 - REDIS_PASSWORD is not set in the environment variables, fetching the value from Key Vault
Features in feature_query: ['f_location_avg_fare']
❯ pytest -s feathr_project/test/test_feature_materialization.py -k "test_materialize_features_verbose"                                                                                           
======================================================================================= test session starts ========================================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/changyonglik/Desktop/opensource/feathr/feathr_project
collected 7 items / 6 deselected / 1 selected

feathr_project/test/test_feature_materialization.py 2022-05-17 00:13:00.655 | INFO     | feathr._envvariableutil:get_environment_variable:66 - DATABRICKS_WORKSPACE_TOKEN_VALUE is not set in the environment variables, fetching the value from Key Vault
2022-05-17 00:13:00.664 | INFO     | feathr._envvariableutil:get_environment_variable:66 - REDIS_PASSWORD is not set in the environment variables, fetching the value from Key Vault
Yes
Materialization features in settings: ['f_location_avg_fare', 'f_location_max_fare']
2022-05-17 00:13:00.746 | INFO     | feathr._envvariableutil:get_environment_variable:66 - KAFKA_SASL_JAAS_CONFIG is not set in the environment variables, fetching the value from Key Vault

ahlag added 2 commits May 17, 2022 00:18
Signed-off-by: Chang Yong Lik <[email protected]>
Signed-off-by: Chang Yong Lik <[email protected]>
@hangfei
Copy link
Collaborator

hangfei commented May 17, 2022

This looks great! Very impressive test coverage!

@xiaoyongzhu xiaoyongzhu merged commit 1ab7ea2 into feathr-ai:main May 17, 2022
@xiaoyongzhu
Copy link
Member

Thanks for the contribution @ahlag !

@chinmaytredence
Copy link
Contributor

@ahlag Should we put the utils.py inside the feathr directory

@ahlag ahlag deleted the build-feature-pretty-print branch May 18, 2022 03:13
@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence (cc: @hangfei )
Will the pretty-prints be used elsewhere? (Just confirming)

@chinmaytredence
Copy link
Contributor

@ahlag this is breaking the uvicorn run when we add this feathr project into requirements.txt
Screenshot 2022-05-18 at 9 16 17 AM
t

@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence
Could you share the steps you used to reproduce this? I will take a look into this.

@chinmaytredence
Copy link
Contributor

chinmaytredence commented May 18, 2022

@ahlag you can start a flask/fastapi project and add this "git+https://github.com/linkedin/feathr.git@main#subdirectory=feathr_project"
into the requirements.txt to use this as a library.
If the utils.py is not inside the "feathr" directory then it will not be downloaded into the dependencies. Thus it breaks the running of application.

@chinmaytredence
Copy link
Contributor

Screenshot 2022-05-18 at 9 22 17 AM

Screenshot 2022-05-18 at 9 22 50 AM

@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence
Ok. I will create a quick fix and then tag the maintainers and you for a review. Thank you!

@ahlag
Copy link
Contributor Author

ahlag commented May 18, 2022

@chinmaytredence @hangfei
I have created a fix here. Could you review it? I have tested it locally.
#273

@chinmaytredence
Copy link
Contributor

@ahlag validated it. Working now. Thanks for the fix.

bozhonghu pushed a commit that referenced this pull request Jun 1, 2022
* main: (30 commits)
  Yihui/moderate registration conflict (#304)
  Update homepage (#310)
  Add extensible extractor APIs (#302)
  Remove Java and JS from Code Scanning
  Create codeql-analysis.yml
  [feathr] Add API to materialize features to offline store (#294)
  Improve error message when path is not supported (#257)
  Add tech talk slides for Feathr (#296)
  Update README.md
  Add milestone link (#286)
  Fix millisecond timestamp handling (#288)
  Consolidating CI pipelines (#280)
  Fixed dependecy problem of pretty print utils (#273)
  Fixing a broken link in README.md (#277)
  Fix test failure (#276)
  Added feature validation (#258)
  Feathr UI: Display feature key and transform expression in feature detail pages (#262)
  Feathr UI: enable multiple tenant auth (#266)
  Reduce feathr web api docker image build time (#261)
  Pretty-print the features produced by buildFeatures  (#214)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Tag to execute build pipeline for a PR from forked repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pretty-print the features produced by buildFeatures and submit job

5 participants