Share convolution buffers to reduce memory usage #1291
Conversation
Re: moving the shared buffer to …
This might cause the following post-optimization crash, but I don't have time to investigate at the moment, so I'm just noting it here.
Yes, I've noticed that as well. I don't immediately see anything in our code that would cause it, so my guess is that CUDA has some static variables that are being destroyed before the shared buffer.
There is almost no documentation about the behavior of static variables in CUDA. Another solution is to create a singleton class containing the shared buffer.
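A rough sketch of that singleton idea (hypothetical code, not from this patch). Note that the instance below still has namespace-scope static storage, so its destruction order relative to statics in other translation units, including CUDA's, remains unspecified; that is the objection raised next:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical singleton owning the shared im2col/col2im buffer.
class ColBufferSingleton {
 public:
  static ColBufferSingleton instance;

  // Grow-only reservation: the buffer ends up as large as the largest
  // request made by any convolution layer.
  float* Reserve(std::size_t count) {
    if (count > buffer_.size()) buffer_.resize(count);
    return buffer_.empty() ? NULL : &buffer_[0];
  }

 private:
  std::vector<float> buffer_;
};

// Namespace-scope definition: destroyed in unspecified order relative
// to other globals, so the CUDA teardown problem can recur.
ColBufferSingleton ColBufferSingleton::instance;
```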
@futurely I don't think that solves the problem; when would the static instance be destroyed? One solution is to explicitly destroy the shared buffer with a clean-up function, but that's somewhat irritating, and requires an extra line to be added to all Caffe invocations.
The static instance itself should also be a `shared_ptr`.
@futurely, I don't see how using a `shared_ptr` helps with that. However, I think I see the point of the original code now, which might actually address the issue: if the static variable is method-local, it doesn't get allocated until the method is called, and therefore gets destroyed before any global static variables. (If I understand correctly, then, there's no need for a singleton class, just for an accessor function, as sketched below.) I'm not going to bother updating an unmergeable PR to fix a post-optimization crash, but we can keep that trick in mind for the future.
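For illustration, a minimal sketch of that accessor-function trick (hypothetical names, not code from this branch):

```cpp
#include <memory>
#include <vector>

// The function-local static is constructed on the first call, which
// happens after earlier globals (e.g. the CUDA runtime's) have been
// initialized; since C++ destroys statics in reverse order of
// construction, this buffer is torn down before them.
std::shared_ptr<std::vector<float> >& shared_col_buffer() {
  static std::shared_ptr<std::vector<float> > buffer(
      new std::vector<float>());
  return buffer;
}
```

Each convolution layer would then obtain its column workspace through `shared_col_buffer()` instead of owning a buffer, with no singleton class needed.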
There are still some chances to save this PR. It's all about the lifetime of the static variable.
Yeah... that's what we're discussing here. To be clear, the shared column buffer is still planned for merge; it just needs to be an option to avoid breaking concurrency. Until that's done, there's no point in fixing the static destruction order issue, but let's keep it in mind for when it's ready (if it's still relevant then).
ArrayFire simply ignored this error.
Commits:
- [docs] brief explanation of SLICE layer's attributes: a sample code was added, and the `slice_dim` and `slice_point` attributes were explained.
- Correct 'epochs' to 'iterations'
- Next: release candidate
- fix Imagenet example path
- [build] fix dynamic linking of tools: set the right rpath for tools and examples respectively; thanks for the report @mees!
- … was overwritten with the symlink created at build time and installed with install(DIRECTORY ...)
- … systems). This commit specifies Python2, with which cpp_lint.py works :-)
- [cmake] fix install rpath for pycaffe
- APPLE was misspelled in Line 27
- fixes: cpp_lint.py fails silently with Python3
- Check caffe tool runs
- Making python3 work with cmake and the new python wrapper
- Commands such as $(error ...) are not allowed to be indented with tabs outside of targets; make then throws its own error instead of outputting the intended one. The solution is to use innocuous spaces instead. Ideally, spaces would be used everywhere outside targets, but since make does not mind tab-indented variable assignments outside targets, a complete overhaul is not necessary. If more errors are added, though, it might make more sense to be consistent. Also, make already adds a period, so the trailing one was removed. (See the Makefile sketch after this list.)
- fix accelerate / veclib path for OS X 10.10
- Replaced illegal tab in Makefile with spaces.
- The sample was missing some additional spaces to be rendered correctly in the HTML. The mistake was mine.
- Decoding the datum before feeding it into the reshaping data layer
- Small fix (visualization) on SLICE layer's documentation
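As an illustration of the Makefile tab issue described in the list above (a made-up variable check, not Caffe's actual Makefile):

```make
# Tab-indented: make parses the line as a recipe; with no target defined
# yet, it aborts with an error like "*** recipe commences before first
# target.  Stop." and the $(error ...) message is never shown.
ifeq ($(UNSUPPORTED), 1)
	$(error this configuration is unsupported)
endif

# Space-indented: parsed as an ordinary makefile line, so $(error ...)
# expands and prints the intended message (make appends the period and
# "Stop." itself).
ifeq ($(UNSUPPORTED), 1)
    $(error this configuration is unsupported)
endif
```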
force-pushed from 9aea056 to 0d052dd
force-pushed from 0d052dd to 1dcfc3c
Replaced by #2016.
Share the columnation buffer for im2col / col2im transformations across all Caffe convolution layers. Memory usage then equals the maximum buffer size over all layers instead of the sum. This is particularly useful for many-layered architectures like the VGG ILSVRC14 19-layer model.
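The mechanism, in a minimal sketch (assumed names; the actual patch shares the buffer inside the convolution layer implementation):

```cpp
#include <cstddef>
#include <vector>

// One column buffer shared by every convolution layer in the process.
static std::vector<float> shared_col_buffer;

// Called from each layer's reshape step. Grow-only, so after net setup
// the allocation equals the maximum per-layer im2col size, not the sum.
float* ReserveColBuffer(std::size_t channels_col,
                        std::size_t height_out,
                        std::size_t width_out) {
  const std::size_t needed = channels_col * height_out * width_out;
  if (needed > shared_col_buffer.size()) shared_col_buffer.resize(needed);
  return shared_col_buffer.empty() ? NULL : &shared_col_buffer[0];
}
```

With, say, one layer needing 100 MB of column workspace and another needing 60 MB, the shared scheme allocates 100 MB rather than 160 MB, and the saving compounds across the 16 convolution layers of the VGG 19-layer model.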
Advice and Cautions:
- Do not merge.

All credit to @longjon, who reshaped our world in #594 and suggested this patch in #520 (comment).