Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

django.db.utils.OperationalError: (1114, "The table 'huey_monitor_taskprogressmodel' is full") #46

Closed
formacube opened this issue Aug 21, 2021 · 8 comments

Comments

@formacube
Copy link

Hi @jedie,

I had a bad surprise this morning while checking on the progress of a process which had it's first sub-process running for over than 24h.

There was an error: django.db.utils.OperationalError: (1114, "The table 'huey_monitor_taskprogressmodel' is full")

Then I realized that for each tiny progress report, a record is created for TaskProgressModel

Could you please tell me what is the benefit of creating a new TaskProgressModel (even two when you want to report progress on both sub-task and maintask) each time you report a progress vs. incrementing directly the progress_count field of an existing TaskProgressModel ?

That would mean that in general, we would have only one TaskProgressModel for each TaskModel.

Thanks in advance for your feedback

@jedie
Copy link
Collaborator

jedie commented Aug 25, 2021

Thats a good point. The initial idea was to to avoid database locking mechanisms. If it runs in parallel. But I think that should be rethought.

@formacube
Copy link
Author

Since 3 days I'm using this minimal rewrite (to keep maximum compatibility with your existing code):


    def update(self, n=1):
        """
        Create a TaskProgressModel instance to main and sub tasks
        to store the progress information.
        """
        self.total_progress += n

        now = timezone.now()
        ids = [self.task.id]
        objects = [
            TaskProgressModel.objects.get_or_create(
                task_id=self.task.id)[0]
            # TaskProgressModel(
            #     task_id=self.task.id,
            #     progress_count=n,
            #     create_dt=now
            # )
        ]

        if self.parent_task_id:
            # Store information for main task, too:
            ids.append(self.parent_task_id)
            
            if self.cumulate_w_parent_progress:
                objects.append(
                    TaskProgressModel.objects.get_or_create(
                    task_id=self.parent_task_id)[0]
                )

        # TaskProgressModel.objects.bulk_create(objects)
        for obj in objects:
            if obj.progress_count:
                obj.progress_count += n
            else:
                obj.progress_count = n
            obj.save()

        # Update the last change date times:
        TaskModel.objects.filter(task_id__in=ids).update(
            update_dt=now
        )

@formacube
Copy link
Author

Can we move on with this PR (#44)?
It is also straightforward.

As I don't really understand what you would expect through your comment

Expand the README is fine... On the other hand, this has a general problem: it is not guaranteed that the code works, because it is untested. Actually, it is better to add example code and refer to it. Because they can be added to the tests.

Maybe we should enhance https://github.com/boxine/django-huey-monitor/blob/master/huey_monitor_tests/test_app/tasks.py and add more DocStrings there and make more links from README to it?

I cannot do anything further unless you give me more precise instructions in a not technical language that I can understand (see my answer from 1 month ago).

Kind regards

@jedie
Copy link
Collaborator

jedie commented Jan 3, 2022

@formacube Maybe you want to share your opinion here: #57 That will fix this issues.

jedie pushed a commit that referenced this issue Jan 3, 2022
Remove the `TaskProgressModel` model that was used to store process information. But this has some
disadvantages:

* Every `process_info.update()` call creates one `TaskProgressModel` instance and this model was
never cleaned. So if many items was processed, the table may be get full. So this PR fixed #46
* If many small `process_info.update()` calls happens, then we get a high database load

Refactor that completely by using the Django cache to store progress information.
If a task has been finished: Transfer the information from cache to database.
@jedie
Copy link
Collaborator

jedie commented Jan 3, 2022

will be fixed by: #58

@jedie jedie closed this as completed Jan 3, 2022
jedie pushed a commit that referenced this issue Jan 7, 2022
Remove the `TaskProgressModel` model that was used to store process information. But this has some
disadvantages:

* Every `process_info.update()` call creates one `TaskProgressModel` instance and this model was
never cleaned. So if many items was processed, the table may be get full. So this PR fixed #46
* If many small `process_info.update()` calls happens, then we get a high database load

Refactor that completely by using the Django cache to store progress information.
If a task has been finished: Transfer the information from cache to database.
jedie pushed a commit that referenced this issue Jan 7, 2022
Remove the `TaskProgressModel` model that was used to store process information. But this has some
disadvantages:

* Every `process_info.update()` call creates one `TaskProgressModel` instance and this model was
never cleaned. So if many items was processed, the table may be get full. So this PR fixed #46
* If many small `process_info.update()` calls happens, then we get a high database load

Refactor that completely by using the Django cache to store progress information.
If a task has been finished: Transfer the information from cache to database.
jedie pushed a commit that referenced this issue Jan 7, 2022
Remove the `TaskProgressModel` model that was used to store process information. But this has some
disadvantages:

* Every `process_info.update()` call creates one `TaskProgressModel` instance and this model was
never cleaned. So if many items was processed, the table may be get full. So this PR fixed #46
* If many small `process_info.update()` calls happens, then we get a high database load

Refactor that completely by using the Django cache to store progress information.
If a task has been finished: Transfer the information from cache to database.
Skrattoune added a commit to Skrattoune/django-huey-monitor that referenced this issue Jan 27, 2022
@Skrattoune
Copy link
Contributor

Hi Jedie,
We need to reopen this.
The fix for this issue is not yet been released
I have submitted a PR #67 while we are waiting for PR #58 to be ready and availlable

@jedie
Copy link
Collaborator

jedie commented Jan 28, 2022

#58 is on my TODO, but it will take some time to finish this PR. Hopefully next week...

I will merge your idea from #67

jedie pushed a commit that referenced this issue Jan 28, 2022
Don't create a new TaskProgressModel instances for every `ProcessInfo.update()` call and increment a
existing TaskProgressModel instances. So we will not flood the database ;)

Based on #67
jedie pushed a commit that referenced this issue Jan 28, 2022
Don't create a new TaskProgressModel instances for every `ProcessInfo.update()` call and increment a
existing TaskProgressModel instances. So we will not flood the database ;)

Based on #67
phihag added a commit that referenced this issue Jan 28, 2022
Fix #46 By increment existing TaskProgressModel instances
@jedie
Copy link
Collaborator

jedie commented Jan 28, 2022

jedie pushed a commit that referenced this issue Feb 3, 2022
Remove the `TaskProgressModel` model that was used to store process information. But this has some
disadvantages:

* Every `process_info.update()` call creates one `TaskProgressModel` instance and this model was
never cleaned. So if many items was processed, the table may be get full. So this PR fixed #46
* If many small `process_info.update()` calls happens, then we get a high database load

Refactor that completely by using the Django cache to store progress information.
If a task has been finished: Transfer the information from cache to database.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants