High memory consumption during long running jobs #3790
Closed
Labels: for: backport-to-4.3.x, has: minimal-example, has: votes, in: core, related-to: performance, type: enhancement
Bug description
We found that memory consumption is fairly high on one of the service nodes that uses Spring Batch. Even though both data nodes did a similar amount of work, memory consumption across the nodes was uneven: 15GB vs 1.5GB (see memory use screenshot).
Some of our jobs run for seconds while others might run for hours, so we set the polling interval (MessageChannelPartitionHandler#setPollInterval) to 1 second rather than the default of 10 seconds. In one long-running job scenario, we ended up creating 837 step executions.
What I found is that MessageChannelPartitionHandler#pollReplies retrieves a full StepExecution representation for each step. Each of these contains a JobExecution, which in turn contains its own StepExecutions for every step. Because they are retrieved at different times and stages, these objects are not shared, so we end up with a quadratic number of StepExecution objects, e.g. 837*837=700569 StepExecutions (see screenshot below).
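To make the arithmetic concrete, here is a minimal sketch of the object graph described above. It does not use Spring Batch itself: `JobExec`, `StepExec`, and `pollOnce` are hypothetical stand-ins, and the model assumes that every polled step gets its own job execution copy holding one nested step execution per step.

```java
import java.util.ArrayList;
import java.util.List;

public class QuadraticStepExecutions {

    // Hypothetical stand-in for a job execution that holds all of its step executions.
    static final class JobExec {
        final List<StepExec> steps = new ArrayList<>();
    }

    // Hypothetical stand-in for a step execution that points back to its job execution.
    static final class StepExec {
        final JobExec job;

        StepExec(JobExec job) {
            this.job = job;
        }
    }

    // Simulates one poll over n steps. Each step is fetched separately, and each fetch
    // builds its own JobExec copy populated with n nested StepExec objects (assumption).
    static List<StepExec> pollOnce(int n) {
        List<StepExec> polled = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            JobExec job = new JobExec();
            for (int j = 0; j < n; j++) {
                job.steps.add(new StepExec(job));
            }
            polled.add(new StepExec(job));
        }
        return polled;
    }

    // Counts the StepExec objects reachable through the nested job executions.
    static long countNestedStepExecutions(List<StepExec> polled) {
        long total = 0;
        for (StepExec step : polled) {
            total += step.job.steps.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<StepExec> polled = pollOnce(837);
        System.out.println(countNestedStepExecutions(polled)); // prints 700569
    }
}
```

In this simplified model, sharing a single job execution instance across all polled steps would bring the nested count down from n*n to n, which is the kind of reduction the report is asking for.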
Environment
Initially reproduced on Spring Batch 4.1.4.
Expected behavior
My proposal would be to:
Memory usage graph comparison between two service nodes, doing roughly equal amount of work:
My apologies for a messy screenshot, but it does explain the number of StepExecution objects: