-
Notifications
You must be signed in to change notification settings - Fork 294
[Model Monitoring] Optimize drift-over-time API for V3IO TSDB
#9135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
danielperezz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great :)
Had two comments
| return mm_schemas.ModelEndpointDriftValues(values=values) | ||
|
|
||
| @staticmethod | ||
| def _convert_drift_data_to_values( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function can be removed now
| if bucket_start_ns not in bucket_endpoint_status: | ||
| bucket_endpoint_status[bucket_start_ns] = {} | ||
|
|
||
| # Update max status for this endpoint in this bucket | ||
| if endpoint_id not in bucket_endpoint_status[bucket_start_ns]: | ||
| bucket_endpoint_status[bucket_start_ns][endpoint_id] = status | ||
| elif status > bucket_endpoint_status[bucket_start_ns][endpoint_id]: | ||
| bucket_endpoint_status[bucket_start_ns][endpoint_id] = status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified using defaultdict and get:
| if bucket_start_ns not in bucket_endpoint_status: | |
| bucket_endpoint_status[bucket_start_ns] = {} | |
| # Update max status for this endpoint in this bucket | |
| if endpoint_id not in bucket_endpoint_status[bucket_start_ns]: | |
| bucket_endpoint_status[bucket_start_ns][endpoint_id] = status | |
| elif status > bucket_endpoint_status[bucket_start_ns][endpoint_id]: | |
| bucket_endpoint_status[bucket_start_ns][endpoint_id] = status | |
| bucket_endpoint_status = defaultdict(dict) # place above instead of the current initializtion | |
| bucket = bucket_endpoint_status[bucket_start_ns] | |
| bucket[endpoint_id] = max(bucket.get(endpoint_id, status), status) |
Optimize the
drift-over-timeAPI for V3IO TSDB. The main issue was that we were iterating over the data multiple times after retrieving it, even though this can be done in a single pass. We also added some optimization in the code logic (see more details below).🛠️ Changes Made
datetimeobjects for bucketing. This also means working in nanoseconds rather than microseconds for better performance during this phase.datetimeconversions only at the very end, after all filtering and processing is done.✅ Checklist
🧪 Testing
TestMonitoringAppFlowpassed (includes testing for drift over time values).🔗 References
🚨 Breaking Changes?
🔍️ Additional Notes