Skip to content

Conversation

@lhayhurst
Copy link
Owner

See this issue, "Add validation for feature name".

This PR adds a new method to feathr_project/feathr/feature.py, FeatureBase.validate_feature_name, that validates the provided feature name against the following rule:

Only alphabet, numbers, and '_' are allowed in the name. It can not start with numbers. Note that '.' is NOT ALLOWED

Unit tests are provided for this new function, which is called from the FeatureBase constructor before setting the self.name.

@lhayhurst lhayhurst self-assigned this May 8, 2022
@@ -1,13 +1,20 @@
from abc import ABC, abstractmethod
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these imports were unused.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. please remove them.

from feathr.typed_key import DUMMY_KEY, TypedKey


class FeatureNameValidationError(ValueError):
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find a standard feathr exception base class when browsing the code, so I created this new exception. Happy to just raise a ValueError or somesuch as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some new exceptions added for our FeatureRegistry API: https://github.com/linkedin/feathr/blob/main/feathr_project/feathr/api/app/core/feathr_api_exception.py.

For the exception for client, we should create a new one. So we should add the exception you created. Thanks.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

windoze and others added 10 commits May 8, 2022 16:55
* Missing required options

* Add SQL test case
* Update test_feature_registry.py

* Update _feature_registry.py

* fix issues in `get_features_from_registry` API

* update get feature registry  API

* add `feature_config`

* update get key

* resove comments

* Update docs and resolve comments

* address test issues

* fix test

* fix test issues

* update test

* fix test

* remove print

* update typedef
* Squash commit for api_demo

* Modify rst file for adding api module.

* Fix merge conflicts

* Update docker build to install feathr package from local instead of remote github repo

* Expose Purview Backend to API

* Fix docker build error

* Revert rst change

* Update web ui to fetch project list from server side, no longer use hard coded project name in api calls

* Update api version

* Fix typo

Co-authored-by: Yihui Guo <[email protected]>
Co-authored-by: Blair Chen <[email protected]>
This PR allows feathr web ui app to be deployed as a docker container.

Closes feathr-ai#240

* Add docker file for feathr web ui app
* Add instruction for building docker image
* Fix test_feathr_register_features_e2e
* added installing librdkafka section

* grammar
* feathr-ai#51 Azure Synapse Spark Driver Log

* feathr-ai#51 Error Log on Failure

* Keep same API for both Synapse and Databricks

* Duplicated import
@@ -1,13 +1,20 @@
from abc import ABC, abstractmethod
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. please remove them.

from feathr.typed_key import DUMMY_KEY, TypedKey


class FeatureNameValidationError(ValueError):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some new exceptions added for our FeatureRegistry API: https://github.com/linkedin/feathr/blob/main/feathr_project/feathr/api/app/core/feathr_api_exception.py.

For the exception for client, we should create a new one. So we should add the exception you created. Thanks.

def validate_feature_name(cls, feature_name: str) -> bool:
"""
Only alphabet, numbers, and '_' are allowed in the name.
It can not start with numbers. Note that '.' is NOT ALLOWED!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add the reasons why "." is not allowed. Some compute engines, like Spark, will consider "." as a operator in the feature name.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 65 to 73
if str.isnumeric(feature_name[0]):
raise FeatureNameValidationError(
f'Feature name rule violation: feature name cannot start with a number; name={feature_name}')

feature_validator = re.compile(r'^[a-zA-Z0-9_]+$')

if not feature_validator.match(feature_name):
raise FeatureNameValidationError(
f'Feature name rule violation: only letters, numbers, and underscores are allowed in the name; name={feature_name}')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you try to combine them together and add a documentation for the regex. Let me know if you need help to come up with a regex. I find this useful when i figure out my regex: https://www.regextester.com/

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +8 to +10
@pytest.mark.parametrize('bad_feature_name',
[None,
''])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add test for "."

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is actually just testing for empty or None feature names.

def test_feature_name_fails_on_punctuation_chars():
    for char in set(string.punctuation) - set('_'):
        with pytest.raises(FeatureNameValidationError,  match="only letters, numbers, and underscores are allowed"):
            FeatureBase.validate_feature_name(f"feature_{char}_name")

This test checks for "." (which is part of string.punctuation).

@lhayhurst
Copy link
Owner Author

I'm going to close this PR, as its against my fork, and open it up against the linkedin remote.

@lhayhurst
Copy link
Owner Author

@hangfei , I've issued a PR against the linkedin/feathr remote.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants