@hangfei hangfei commented Jun 8, 2022

Feature monitoring for scalar features.

SQL schema:
For numeric features:

feature_name | feature_type | mean | median | max | min | coverage

For string/boolean features:

feature_name | feature_type | cardinality | coverage

coverage = (total_count - missing_count) / total_count
cardinality = count of distinct items. For example, ["apple", "apple", "orange"] => cardinality is 2 (apple and orange)
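
To make the two definitions concrete, here is a minimal plain-Python sketch. The function names `coverage` and `cardinality` are illustrative and are not taken from this PR's code; treating `None` as "missing" and returning 0.0 for an empty column are assumptions.

```python
from typing import Hashable, Optional, Sequence


def coverage(values: Sequence[Optional[Hashable]]) -> float:
    """Fraction of rows whose value is present: (total - missing) / total."""
    total_count = len(values)
    if total_count == 0:
        return 0.0  # assumption: an empty column is reported as zero coverage
    missing_count = sum(1 for v in values if v is None)
    return (total_count - missing_count) / total_count


def cardinality(values: Sequence[Optional[Hashable]]) -> int:
    """Number of distinct values, ignoring missing ones (an assumption)."""
    return len({v for v in values if v is not None})


# One missing value out of ten rows.
column = ["apple", "apple", "orange", None, "orange",
          "orange", "orange", "orange", "orange", "orange"]
print(coverage(column))
print(cardinality(["apple", "apple", "orange"]))
```

With one missing value in ten rows, the coverage formula gives (10 - 1) / 10, and the three-item list has two distinct values, as in the definition above.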

For example:
input feature table:

+----+------+--------+-----+--------------------+---------+
|key0|f_null|f_string|f_int|            f_double|f_boolean|
+----+------+--------+-----+--------------------+---------+
|   1|  null|   apple|    1|  0.7191070126381514|     true|
|   2|  null|   apple|    0|   0.631537113156793|    false|
|   5|  null|  orange|    1|  0.8990245049992188|    false|
|   3|  null|    null|    0|  0.5762385339376767|    false|
|   4|  null|  orange|    0|  0.4366523814297121|    false|
|   8|  null|  orange|    0| 0.14328502026059808|    false|
|  10|  null|  orange|    0|  0.9389259655580797|     true|
|   6|  null|  orange|    0|  0.7877756383579947|    false|
|   7|  null|  orange|    1|0.056606459160327804|    false|
|   9|  null|  orange|    1|  0.6003317209869276|    false|
+----+------+--------+-----+--------------------+---------+
+------------+------------+----------+----+----+----+----+--------+
|feature_name|feature_type|      date|mean| avg| min| max|coverage|
+------------+------------+----------+----+----+----+----+--------+
|      f_null|      double|2022-06-07|null|null|null|null|     0.0|
+------------+------------+----------+----+----+----+----+--------+

+------------+------------+----------+----+---+---+---+--------+
|feature_name|feature_type|      date|mean|avg|min|max|coverage|
+------------+------------+----------+----+---+---+---+--------+
|       f_int|     integer|2022-06-07| 0.4|0.4|  0|  1|     1.0|
+------------+------------+----------+----+---+---+---+--------+

+------------+------------+----------+------------------+------------------+--------------------+------------------+--------+
|feature_name|feature_type|      date|              mean|               avg|                 min|               max|coverage|
+------------+------------+----------+------------------+------------------+--------------------+------------------+--------+
|    f_double|      double|2022-06-07|0.5789484350485481|0.5789484350485481|0.056606459160327804|0.9389259655580797|     1.0|
+------------+------------+----------+------------------+------------------+--------------------+------------------+--------+

+------------+------------+----------+-----+------+--------+-----------+
|feature_name|feature_type|      date|  min|   max|coverage|cardinality|
+------------+------------+----------+-----+------+--------+-----------+
|    f_string|      string|2022-06-09|apple|orange|     0.9|          3|
+------------+------------+----------+-----+------+--------+-----------+
+------------+------------+----------+-----+----+--------+-----------+
|feature_name|feature_type|      date|  min| max|coverage|cardinality|
+------------+------------+----------+-----+----+--------+-----------+
|   f_boolean|     boolean|2022-06-09|false|true|     1.0|          2|
+------------+------------+----------+-----+----+--------+-----------+
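
The numeric rows above can be cross-checked against the input table with a short plain-Python sketch. This is not the PR's implementation; `numeric_summary` is a hypothetical helper, and reporting `None` statistics for an all-null column is an assumption modeled on the `f_null` row.

```python
from typing import Dict, Optional, Sequence


def numeric_summary(values: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Mean, min, max, and coverage of a column; None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: an all-null (or empty) column has no statistics and zero coverage.
        return {"mean": None, "min": None, "max": None, "coverage": 0.0}
    return {
        "mean": sum(present) / len(present),
        "min": min(present),
        "max": max(present),
        "coverage": len(present) / len(values),
    }


# Columns copied from the input feature table, in row order.
f_null = [None] * 10
f_int = [1, 0, 1, 0, 0, 0, 0, 0, 1, 1]
f_double = [
    0.7191070126381514, 0.631537113156793, 0.8990245049992188,
    0.5762385339376767, 0.4366523814297121, 0.14328502026059808,
    0.9389259655580797, 0.7877756383579947, 0.056606459160327804,
    0.6003317209869276,
]

for name, column in [("f_null", f_null), ("f_int", f_int), ("f_double", f_double)]:
    print(name, numeric_summary(column))
```

Note that `f_string` has two distinct non-null values (apple, orange), so the reported cardinality of 3 would be consistent with null being counted as its own value. The PR description does not say which behavior is intended.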

@hangfei hangfei changed the title from "Monitor" to "Feature Monitoring" Jun 8, 2022
xiaoyongzhu previously approved these changes Jun 12, 2022
@hangfei hangfei merged commit d18294c into main Jun 13, 2022
bozhonghu pushed a commit that referenced this pull request Jun 15, 2022
* main:
  Fixing purview test issues and improve performance (#350)
  [feathr] Add product_recommendation advanced sample (#348)
  obejectId query cmd update (#360)
  add license, release, docs, python api ref badges with shields img (#357)
  quick fix the 404 not found in read me link (#355)
  Python SQL Registry (#311)
  enable JWT token param in frontend API calls (#337)
  Optimize environment variable behavior (#333)
  Adding better warning message to let user know that config file is missing and they need to set env parameters. (#347)
  Feature Monitoring (#330)
  Windoze/211 maven submission (#334)
  Windoze/211 maven submission (#334)
  Windoze/211 maven submission (#334)
  Fix Synapse quickstart link (#346)
  Show feature details when click feature in lineage graph (#339)
  Update pull_request_push_test.yml
  Update UI README for how to create overrides for local development (#335)
  Update databricks quick start experience (#217)
@hangfei hangfei deleted the monitor branch July 29, 2022 18:34