Skip to content

Conversation

@lhayhurst
Copy link
Contributor

See this issue, "Add validation for feature name".

This PR adds a new method to feathr_project/feathr/feature.py, FeatureBase.validate_feature_name, that validates the provided feature name against the following rule:

Only alphabet, numbers, and '_' are allowed in the name. It can not start with numbers. Note that '.' is NOT ALLOWED

Unit tests are provided for this new function, which is called from the FeatureBase constructor before setting the self.name.

I drafted a (now closed) PR against my fork, lhayhurst#1, in order to validate that this PR was headed in the right directi. @hangfei gave me feedback in that PR, which I've incorporated here.

Copy link
Collaborator

@hangfei hangfei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the contribution!

@hangfei hangfei added the safe to test Tag to execute build pipeline for a PR from forked repo label May 17, 2022
@hangfei
Copy link
Collaborator

hangfei commented May 17, 2022

Once the tests passed, we can merge this change.

@lhayhurst
Copy link
Contributor Author

build is failing for reason unrelated to change

   /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/urllib3/util/retry.py:38: DeprecationWarning: Using 'Retry.DEFAULT_METHOD_WHITELIST' is deprecated and will be removed in v2.0. Use 'Retry.DEFAULT_ALLOWED_METHODS' instead
    warnings.warn(

feathr_project/test/test_config_loading.py: 2 warnings
feathr_project/test/test_feature_materialization.py: 3 warnings
feathr_project/test/test_azure_snowflake_e2e.py: 2 warnings
feathr_project/test/test_azure_spark_e2e.py: 3 warnings
feathr_project/test/test_feature_registry.py: 1 warning
feathr_project/test/test_input_output_sources.py: 2 warnings
feathr_project/test/test_pyduf_preprocessing_e2e.py: 5 warnings
  /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/databricks_cli/sdk/api_client.py:75: DeprecationWarning: Using 'method_whitelist' with Retry is deprecated and will be removed in v2.0. Use 'allowed_methods' instead

@hangfei
Copy link
Collaborator

hangfei commented May 17, 2022

The one you shared seems to be warnings.

The root cause is this failed test:

    def test_feathr_online_store_agg_features():
        """
        Test FeathrClient() get_online_features and batch_get can get feature data correctly.
        """
        test_workspace_dir = Path(__file__).parent.resolve() / "test_user_workspace"
    
        client = snowflake_test_setup(os.path.join(test_workspace_dir, "feathr_config.yaml"))
    
        online_test_table = get_online_test_table_name("snowflakeSampleDemoFeature")
    
        backfill_time = BackfillTime(start=datetime(2020, 5, 20), end=datetime(2020, 5, 20), step=timedelta(days=1))
        redisSink = RedisSink(table_name= online_test_table)
        settings = MaterializationSettings(name="snowflakeSampleDemoFeature",
                                       sinks=[redisSink],
                                       feature_names=['f_snowflake_call_center_division_name',
                                                      'f_snowflake_call_center_zipcode'],
                                       backfill_time=backfill_time)
        client.materialize_features(settings)
        # just assume the job is successful without validating the actual result in Redis. Might need to consolidate
        # this part with the test_feathr_online_store test case
        client.wait_job_to_finish(timeout_sec=900)
    
        res = client.get_online_features(online_test_table, '1',
                                         ['f_snowflake_call_center_division_name', 'f_snowflake_call_center_zipcode'])
    
        assert len(res) == 2
        assert res[0] != None
        assert res[1] != None
        res = client.multi_get_online_features('snowflakeSampleDemoFeature',
                                        ['1', '2'],
                                        ['f_snowflake_call_center_division_name', 'f_snowflake_call_center_zipcode'])
>       assert res['1'][0] != None
E       assert None != None

I am re-running the tests to see if it's transistent.

@hangfei
Copy link
Collaborator

hangfei commented May 17, 2022

The feature name f_snowflake_call_center_division_name and f_snowflake_call_center_zipcode seems good. It should pass your validation. Doesn't seem to be related to your change.

@hangfei
Copy link
Collaborator

hangfei commented May 17, 2022

It seems the failure exist in main branch. Let me do some research to fix it.

@lhayhurst
Copy link
Contributor Author

lhayhurst commented May 18, 2022

It seems the failure exist in main branch. Let me do some research to fix it.

Sounds good. I will merge main branch into this PR branch once the main build is green.

@xiaoyongzhu xiaoyongzhu merged commit 2c5dfc0 into feathr-ai:main May 18, 2022
@xiaoyongzhu
Copy link
Member

xiaoyongzhu commented May 18, 2022

@lhayhurst I looked thru the test failures and they are unrelated with the changes that you made. I have a small PR for those (#276) in case you are interested in.

I went ahead and merged the code. Thanks for the contribution!

bozhonghu pushed a commit that referenced this pull request Jun 1, 2022
* main: (30 commits)
  Yihui/moderate registration conflict (#304)
  Update homepage (#310)
  Add extensible extractor APIs (#302)
  Remove Java and JS from Code Scanning
  Create codeql-analysis.yml
  [feathr] Add API to materialize features to offline store (#294)
  Improve error message when path is not supported (#257)
  Add tech talk slides for Feathr (#296)
  Update README.md
  Add milestone link (#286)
  Fix millisecond timestamp handling (#288)
  Consolidating CI pipelines (#280)
  Fixed dependecy problem of pretty print utils (#273)
  Fixing a broken link in README.md (#277)
  Fix test failure (#276)
  Added feature validation (#258)
  Feathr UI: Display feature key and transform expression in feature detail pages (#262)
  Feathr UI: enable multiple tenant auth (#266)
  Reduce feathr web api docker image build time (#261)
  Pretty-print the features produced by buildFeatures  (#214)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Tag to execute build pipeline for a PR from forked repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants