Skip to content

Comments

Reduce runtime memory use#699

Merged
SamCarlberg merged 16 commits intoWPIRoboticsProjects:masterfrom
SamCarlberg:reduce-memory-footprint
Dec 2, 2016
Merged

Reduce runtime memory use#699
SamCarlberg merged 16 commits intoWPIRoboticsProjects:masterfrom
SamCarlberg:reduce-memory-footprint

Conversation

@SamCarlberg
Copy link
Member

@SamCarlberg SamCarlberg commented Nov 2, 2016

Reduces memory use by watershed (release temp objects and reuse memory)

I can run in headless mode with -Xmx15m and UI mode with -Xmx150m (was previously seeing up to 890MB heap in UI with watershed)

Reduces memory use by watershed (release temp objects and reuse memory)
Schedule GC to run every 15 seconds

I can run in headless mode with -Xmx15m and UI mode with -Xmx150m (was previously seeing up to 890MB heap in UI with watershed)
@codecov-io
Copy link

codecov-io commented Nov 2, 2016

Current coverage is 53.54% (diff: 20.00%)

Merging #699 into master will decrease coverage by 0.13%

@@             master       #699   diff @@
==========================================
  Files           225        226     +1   
  Lines          7329       7364    +35   
  Methods           0          0          
  Messages          0          0          
  Branches        721        724     +3   
==========================================
+ Hits           3934       3943     +9   
- Misses         3221       3247    +26   
  Partials        174        174          

Sunburst

Powered by Codecov. Last update 0af249a...25abdd5

Sets -Xmx200m in native app
Also fixes watershed output
@SamCarlberg SamCarlberg force-pushed the reduce-memory-footprint branch from dc07b19 to 79d5c86 Compare November 2, 2016 21:00
@SamCarlberg SamCarlberg changed the title Reduce runtime memory use by ~70% Reduce runtime memory use Nov 3, 2016
@SamCarlberg
Copy link
Member Author

So just reducing the max heap size will cause more garbage collections on the young generation, but doesn't seem to run full GC's any more often.

The biggest problem with memory is that if the GC leaves dead native objects (Mats, Rects, etc.) floating around, the native memory won't get freed up and causes the process's memory use to explode -- 290MB of used heap space can correspond to 10GB of memory used by the process (this could go even higher, but my 16GB of RAM was used up and swapping was getting nasty). This was just running the core jar.

Calling System.gc() after every pipeline run hugely alleviates this problem. With the same pipeline and settings as above, I'm seeing heap use between 5MB and 7MB and a total process use of ~200MB. The downside to this is that a full GC runs about 20 times per second.

Calling System.gc periodically (I tested with 15 second intervals) helped, but heap use was peaking around 50MB and the process was using ~1.6GB of memory. Not great.

Calling System.gc() every 10 pipeline runs, however, seems to be a good solution. Heap use is between 6MB and 8MB, process seems to be steady at about 230-240MB, and a full GC about 2-3 times per second.


Heap and CPU profiles

-Xmx30m, no GC calls

screenshot from 2016-11-04 12-19-03

-Xmx2G, no GC calls

screenshot from 2016-11-04 12-19-07

-Xmx2G, System.gc() after every pipeline run

screenshot from 2016-11-04 12-19-10

-Xmx2G, System.gc() every 15 seconds

screenshot from 2016-11-04 12-19-12

-Xmx2G, System.gc() after every 10 pipeline runs

screenshot from 2016-11-04 12-19-15

Currently set to run after every 15 pipeline runs
@JLLeitschuh
Copy link
Member

I'd like to get some other people's thoughts on this before we merge it.

@saudet Is calling release on these objects okay or should we be calling something else?

@SamCarlberg
Copy link
Member Author

deallocate() only seems to work if the Deallocator object hasn't been set (which it seems to be) or on garbage collection. But it looks like calling release() on a mat will immediately free up the native memory if there's no longer a reference to it

@saudet
Copy link

saudet commented Nov 5, 2016 via email

@AustinShalit AustinShalit modified the milestone: v1.5.0 Nov 7, 2016
@SamCarlberg
Copy link
Member Author

So it looks like the only way to address this is to periodically call System.gc(), or to make Eden space small to force more GC runs to clean up unused mat objects and deallocate their native memory.

The first one has the advantage of not putting any limits on the heap size, but it will trigger a major stop-the-world GC, which can take a while. It also keeps memory use very low in both UI and headless modes.

The second option will only trigger minor GCs in Eden and S1/S2 that don't take much time, but that will impose limits on other parts of the program. For example, previewing large images in the UI takes a ton of memory on the heap -- a 12MP picture can take upwards of 150MB to preview -- and won't be able to fit in new gen, making the app hang while the JVM keeps trying to unsuccessfully allocate enough memory for it. #711 addresses this problem, but there may be other parts of the program or future features that would be limited by a tiny new gen space.

I'm not happy with either of these options, but I don't see any other alternatives.

@saudet
Copy link

saudet commented Nov 11, 2016

For cases like these, it's useful to reuse (pre)allocated memory as much as possible, to avoid constantly allocating and deallocating. I am not familiar with the details of this project, but that doesn't look possible?

@SamCarlberg
Copy link
Member Author

I don't think it's possible for us to use preallocated memory. We also can't make eden small because that puts limits on the size of the images we can preview in the UI (big images = big buffers that need to be allocated). So the only solution I see is to periodically call System.gc() to clean up the old Mats leftover from runs of the pipeline.

@SamCarlberg
Copy link
Member Author

@JLLeitschuh I'd like you to try this out and see if you can find any serious problems

@saudet
Copy link

saudet commented Nov 29, 2016

@SamCarlberg BTW, recent versions of JavaCPP check before each allocation the amount of physical memory used and call System.gc() only when that amount is crossed, so it's possible to limit these calls to a minimum that way. We can adjust that amount by setting the org.bytedeco.javacpp.maxphysicalbytes system property to the desired value:
http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Pointer.html#maxPhysicalBytes

@SamCarlberg
Copy link
Member Author

That's good to know. I'll try it out, see how it compares to running System.gc() every few passes.

CvMat mat = cvMat(1, b.length, CV_8UC1, new BytePointer(b));
if (decoded != null) {
cvReleaseImage(decoded);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

private final InputSocket<ContoursReport> contoursSocket;
private final OutputSocket<ContoursReport> outputSocket;

private static final int maxMarkers = 253;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be MAX_MARKERS?

private final OutputSocket<ContoursReport> outputSocket;

private static final int maxMarkers = 253;
private final List<Mat> markerPool = new ArrayList<>(maxMarkers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be an immutable list?

runsSinceLastGc = 0;
lastRun = System.nanoTime();
System.gc();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use StopWatch?

for (int i = 0; i < MAX_MARKERS; i++) {
tmpPool.add(new Mat());
}
markerPool = ImmutableList.copyOf(tmpPool);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the ImmutableListBuilder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's no better than what's currently happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use Guava 21.0:
markerPool = Stream.generate(Mat::new).limit(MAX_MARKERS).collect(ImmutableList.toImmutableList());

/**
* The maximum number of runs allowed before calling System.gc().
*/
private static final int MAX_RUNS_BEFORE_GC = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called minimum instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should change that

for (int i = 0; i < MAX_MARKERS; i++) {
tmpPool.add(new Mat());
}
markerPool = ImmutableList.copyOf(tmpPool);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use Guava 21.0:
markerPool = Stream.generate(Mat::new).limit(MAX_MARKERS).collect(ImmutableList.toImmutableList());

@SamCarlberg SamCarlberg merged commit 92c589c into WPIRoboticsProjects:master Dec 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants