Each purge batch and associated records are removed atomically #23381
Conversation
# Purge each batch with their associated records
# in a transaction to avoid leaving orphaned references.
transaction do
app/models/mixins/purging_mixin.rb (Outdated)
  batch_ids = query
end

unless batch_ids.empty?
Convert an early break into a simple conditional.
This might be a problem... in the case of the else condition, we don't want it to execute the query, but instead stay as a scope.
Ah, good eye. I wanted to test that else condition.
break if count == 0

total += count

purge_associated_records(batch_ids) if respond_to?(:purge_associated_records)
I wonder if we can move this line up under `count = unscoped.where(:id => batch_ids).delete_all` and run it regardless of the count that's returned. Then you can end the transaction at that point?
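To make the suggested restructure concrete, here is a minimal plain-Ruby simulation of that loop shape, with arrays standing in for the parent and associated tables (none of these names are the real PurgingMixin API, and the transaction boundary is only indicated in a comment):

```ruby
# Stand-ins for the parent (miq_report_results) and associated tables.
PARENTS  = (1..250).to_a
CHILDREN = PARENTS.dup
BATCH = 100

def delete_batch(ids)
  before = PARENTS.size
  PARENTS.reject! { |id| ids.include?(id) }
  before - PARENTS.size # mimic delete_all returning a row count
end

def purge_associated_records(ids)
  CHILDREN.reject! { |id| ids.include?(id) }
end

total = 0
loop do
  batch_ids = PARENTS.first(BATCH)
  # A transaction would wrap the next two calls; the associated records
  # are purged right after the parent delete, regardless of the count.
  count = delete_batch(batch_ids)
  purge_associated_records(batch_ids)
  total += count
  break if count == 0 # early break kept outside the "transaction" body
end

puts total # => 250
```

The early `break` still terminates the loop, but only after the batch's delete and associated-record purge have both run, so no batch can be left half-done.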
Yeah, I can try it. I was debating removing the guard/conditions entirely: `count == 0` for `purge_associated_records`, and also `batch_ids.empty?`. I honestly am not sure how `batch_ids.empty?` is ever true. Is it only there to guard against running one transaction of this, which does nothing?

count = unscoped.where(:id => batch_ids).delete_all
purge_associated_records(batch_ids) if respond_to?(:purge_associated_records)

It feels like it would be safer to do one run of those two lines even if nothing is done, to avoid these extra conditionals.
tldr; I think it's ok to remove those conditionals.
tldr; + 1: Ok, so it's just a wasted set of queries in the situation in which nothing is purged. I think that's fine to eliminate some complexity for an edge case that should never be a performance issue.
To recreate it, I asked it to purge_by_remaining with 10 when I had 6 report results, meaning nothing would be deleted.
Ok, with the guard clause to avoid `batch_ids.empty?`, it does something like this:
==> log/evm.log <==
[----] I, [2025-03-17T18:00:08.774832#17137:35d4] INFO -- evm: MIQ(MiqReportResult.purge_by_remaining) Purging Miq report results older than last 10 results...
==> log/test.log <==
[----] D, [2025-03-17T18:00:08.775384#17137:35d4] DEBUG -- test: TRANSACTION (0.1ms) SAVEPOINT active_record_1
[----] D, [2025-03-17T18:00:08.776041#17137:35d4] DEBUG -- test: MiqReportResult Pluck (0.7ms) SELECT "id" FROM (SELECT "id", RANK() OVER(
PARTITION BY "miq_report_id"
ORDER BY "id" DESC
) AS "rank"
FROM "miq_report_results") miq_report_results WHERE (rank > 10) LIMIT $1 [["LIMIT", 100]]
[----] D, [2025-03-17T18:00:08.776164#17137:35d4] DEBUG -- test: TRANSACTION (0.1ms) RELEASE SAVEPOINT active_record_1
==> log/evm.log <==
[----] I, [2025-03-17T18:00:08.776199#17137:35d4] INFO -- evm: MIQ(MiqReportResult.purge_by_remaining) Purging Miq report results older than last 10 results...Complete - Deleted 0 records
Now, in the situation in which you will not be purging anything anyway, removing the conditional does this:
[----] I, [2025-03-17T17:57:54.677557#16882:35d4] INFO -- evm: MIQ(MiqReportResult.purge_by_remaining) Purging Miq report results older than last 10 results...
==> log/test.log <==
[----] D, [2025-03-17T17:57:54.678205#16882:35d4] DEBUG -- test: TRANSACTION (0.0ms) SAVEPOINT active_record_1
[----] D, [2025-03-17T17:57:54.678472#16882:35d4] DEBUG -- test: MiqReportResult Pluck (0.3ms) SELECT "id" FROM (SELECT "id", RANK() OVER(
PARTITION BY "miq_report_id"
ORDER BY "id" DESC
) AS "rank"
FROM "miq_report_results") miq_report_results WHERE (rank > 10) LIMIT $1 [["LIMIT", 100]]
==> log/evm.log <==
[----] I, [2025-03-17T17:57:54.678526#16882:35d4] INFO -- evm: MIQ(MiqReportResult.purge_in_batches) Purging 0 Miq report results.
==> log/test.log <==
[----] D, [2025-03-17T17:57:54.678733#16882:35d4] DEBUG -- test: MiqReportResult Delete All (0.1ms) DELETE FROM "miq_report_results" WHERE 1=0
[----] D, [2025-03-17T17:57:54.678892#16882:35d4] DEBUG -- test: MiqReportResultDetail Delete All (0.1ms) DELETE FROM "miq_report_result_details" WHERE 1=0
[----] D, [2025-03-17T17:57:54.679013#16882:35d4] DEBUG -- test: TRANSACTION (0.0ms) RELEASE SAVEPOINT active_record_1
==> log/evm.log <==
[----] I, [2025-03-17T17:57:54.679045#16882:35d4] INFO -- evm: MIQ(MiqReportResult.purge_by_remaining) Purging Miq report results older than last 10 results...Complete - Deleted 0 records
I used this to test that... (ignoring whitespace)
diff --git a/app/models/mixins/purging_mixin.rb b/app/models/mixins/purging_mixin.rb
index cb56d52f4d..96a41d6274 100644
--- a/app/models/mixins/purging_mixin.rb
+++ b/app/models/mixins/purging_mixin.rb
@@ -209,16 +209,19 @@ module PurgingMixin
# pull back ids - will slow performance
batch_ids = query.pluck(:id)
current_window = batch_ids.size
+ exit_early = true if batch_ids.empty?
else
batch_ids = query
end
+ unless exit_early
_log.info("Purging #{current_window} #{table_name.humanize}.")
count = unscoped.where(:id => batch_ids).delete_all
purge_associated_records(batch_ids) if respond_to?(:purge_associated_records)
total += count
end
+ end
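The control flow in that diff can be simulated in a few lines of plain Ruby; here `purge_once` fakes the delete with `Array#size`, and the point is only that an empty batch sets a flag and skips the purge body rather than breaking out of the transaction early:

```ruby
# Simulation of the diff's exit_early flag: an empty batch skips the
# purge body instead of breaking out of the transaction mid-way.
def purge_once(batch_ids)
  exit_early = true if batch_ids.empty?
  count = 0
  unless exit_early
    # stand-in for: unscoped.where(:id => batch_ids).delete_all
    # followed by purge_associated_records(batch_ids)
    count = batch_ids.size
  end
  count
end

puts purge_once([1, 2, 3]) # => 3
puts purge_once([])        # => 0
```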
Ok, this is ready to go I think.
Force-pushed from ee0846f to a783168
We wrap each batch deletion in a transaction, ensuring the whole batch is deleted together or not at all. Note, any early breaks out of the loop needed to be removed or moved outside of the transaction so that the transaction completes successfully, deleting the whole batch or nothing at all. In a demo environment database, we found some miq_report_result_details and binary_blobs were left orphaned, as they referenced miq_report_result_id values that no longer existed. The provided spec demonstrates the problem, but it likely occurred in production whenever a timeout happened anywhere in purge_associated_records.
Force-pushed from a783168 to 1afea62
Closing for now. After discussing this with @kbrock, he had a clever idea that really addresses our concerns but without using a transaction: purge the associated tables before the parent table. In this way, we can continue to purge tables each run through the purge timer and will only remove from the parent table once all the associated records are removed.
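A rough sketch of that children-first ordering, again with plain arrays standing in for the tables (illustrative names only, not the actual PurgingMixin code). Because the associated rows go first, an interrupted run can leave extra children behind but never orphans, and the next run simply continues the cleanup:

```ruby
parents  = (1..120).to_a # stand-in for the parent table's ids
children = parents.dup   # stand-in for the associated rows

batches = parents.each_slice(50).to_a
total = 0

batches.each do |batch_ids|
  # 1. purge the associated rows for this batch first
  children.reject! { |id| batch_ids.include?(id) }

  # 2. only then remove the parent rows, so a crash between these two
  #    steps leaves stale children (cleaned up next run), never orphans
  parents.reject! { |id| batch_ids.include?(id) }
  total += batch_ids.size
end

puts total # => 120
```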
Alternative to ManageIQ#23381, where each purge batch was done in a transaction. Actually, we only care about eventual consistency, so we can purge the associated records first, and if that completes, we purge the parent table's rows for that batch. This allows us to deal with timeouts or other errors by continuing a prior run by cleaning up associated rows and only removing the parent when there are no more associated rows in other tables that remain. Note, there is no recourse for delete_all that has a failure as it's a vanilla DELETE FROM where ... so it's very unlikely to fail for anything we care about. Conversely, destroy_all instantiates objects, each of which could have callbacks that fail. Additionally, each instantiation takes time and increases the likelihood that if a timeout occurs, it would be in this area. Therefore, we verify the rows are actually destroyed? and bail if not to avoid deleting the parent and leaving orphans in other tables.
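The destroy_all safeguard can be illustrated with a tiny stand-in class instead of ActiveRecord (all names here are hypothetical): `destroy` may be vetoed, as a failing callback would do, and we check `destroyed?` on every child before allowing the parent delete.

```ruby
# Minimal stand-in for an AR object whose callbacks may veto destruction.
class FakeChild
  def initialize(destroyable)
    @destroyable = destroyable
    @destroyed = false
  end

  def destroy
    @destroyed = true if @destroyable # a failing callback leaves this false
    self
  end

  def destroyed?
    @destroyed
  end
end

def delete_parents_if_children_gone(children, parent_ids)
  destroyed = children.map(&:destroy)
  # Bail without touching the parent table if any child survived, so we
  # never delete a parent and leave orphans in other tables.
  return 0 unless destroyed.all?(&:destroyed?)
  parent_ids.size # stand-in for the parent delete_all row count
end

clean = Array.new(3) { FakeChild.new(true) }
stuck = [FakeChild.new(true), FakeChild.new(false)]

puts delete_parents_if_children_gone(clean, [1, 2, 3]) # => 3
puts delete_parents_if_children_gone(stuck, [4, 5])    # => 0
```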