
Conversation

@shelhamer
Member

Share the columnation buffer for im2col / col2im transformations across all Caffe convolution layers. The memory usage is now equal to the maximum buffer size instead of the sum over all layers. In particular, this is useful for many-layered architectures like the VGG ILSVRC14 19-layer model.
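For intuition, here is a minimal stand-in (hypothetical class and names, not the actual Caffe code) for how a single static buffer shared by all instances keeps the footprint at the largest single request:

#include <cstddef>
#include <vector>

// Hypothetical sketch: every layer instance reuses one static buffer, so the
// total memory equals the maximum request rather than the sum over layers.
// Sharing one buffer is also why concurrent convolutions are ruled out.
class ConvLayerSketch {
 public:
  // Grow the shared buffer only if this layer needs more than any previous one.
  float* col_buffer(std::size_t count) {
    if (count > buffer_.size()) buffer_.resize(count);
    return buffer_.data();
  }

 private:
  static std::vector<float> buffer_;  // one allocation shared by all instances
};

std::vector<float> ConvLayerSketch::buffer_;

The patch itself applies the same idea to the layers' Blob-backed column buffer by making it a static member.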

Advice and Cautions:

  • This is worth it for fully-convolutional models where Caffe convolution is faster than cuDNN.
  • No parallelism. Only a single net can do forward / backward at a time: since the buffer is shared by all layers within and across nets, no two convolutions can run in parallel. (A fix for parallel nets is to make the buffer a member of Net. DAG parallelism within a net is still out in that case, but that isn't currently parallelized anyway.)
  • This has no effect on cuDNN convolution, but that consumes less memory anyway.

All credit to @longjon who reshaped our world in #594 and suggested this patch in #520 (comment).

Do not merge.

@sguada
Contributor

sguada commented Oct 16, 2014

@longjon sweet! Very nice and small change compared to #520.

@shelhamer
Member Author

@sguada this simplification to sharing could still be paired with a buffer owned by Net to solve the parallel execution issue. It will be a much simpler patch thanks to @longjon.

@longjon
Contributor

longjon commented Oct 16, 2014

Re: moving the shared buffer to Net, do note that this is a tradeoff; it allows net concurrency at the cost of memory when nets are used sequentially. For example, this patch currently shares column buffer memory between all train and test nets during solving. (Also note that this will still break layer concurrency, when added in the future, either way.)

@shelhamer
Member Author

This might cause the following post-optimization crash

I1020 06:44:01.849261  3396 caffe.cpp:121] Optimization Done.
F1020 06:44:02.442028  3396 syncedmem.cpp:16] Check failed: error == cudaSuccess (29 vs. 0)  driver shutting down
*** Check failure stack trace: ***
    @     0x7fc406272daa  (unknown)
    @     0x7fc406272ce4  (unknown)
    @     0x7fc4062726e6  (unknown)
    @     0x7fc406275687  (unknown)
    @           0x499d99  caffe::SyncedMemory::~SyncedMemory()
    @           0x4aa862  boost::detail::sp_counted_impl_p<>::dispose()
    @           0x4b3499  caffe::Blob<>::~Blob()
    @     0x7fc401b5f149  (unknown)
    @     0x7fc401b5f195  (unknown)
    @     0x7fc401b44ecc  (unknown)
    @           0x4167a7  (unknown)
    @              (nil)  (unknown)

but I don't have time to investigate at the moment so I'm just noting it here.

@longjon
Contributor

longjon commented Oct 20, 2014

Yes, I've noticed that as well. I don't immediately see anything in our code that would cause that, so my guess is that CUDA has some static variables that are being destroyed before the shared buffer.

@futurely

There is almost no documentation about the behavior of static variables in CUDA. Another solution is to create a singleton class containing the shared buffer.

// http://stackoverflow.com/questions/1008019/c-singleton-design-pattern
template <typename Dtype>
class SharedBuffer {
 public:
  static SharedBuffer& instance() {
    static SharedBuffer instance;  // Guaranteed to be destroyed.
                                   // Instantiated on first use.
    return instance;
  }

  shared_ptr<Blob<Dtype> > buffer() { return buffer_; }

 private:
  SharedBuffer() : buffer_(new Blob<Dtype>()) {}
  // Copy and assignment are disabled so no accidental copies of the
  // singleton can appear.
  DISABLE_COPY_AND_ASSIGN(SharedBuffer);

  shared_ptr<Blob<Dtype> > buffer_;
};
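A layer could then fetch and size the shared buffer through the singleton along these lines (hypothetical helper; the reshape arguments mirror how a per-layer column buffer is sized):

// Hypothetical call site, assuming the SharedBuffer class above.
template <typename Dtype>
Dtype* shared_col_data(int channels, int kernel_h, int kernel_w,
                       int height_out, int width_out) {
  shared_ptr<Blob<Dtype> > col = SharedBuffer<Dtype>::instance().buffer();
  // Reshape to this layer's needs; the underlying allocation only ever grows.
  col->Reshape(1, channels * kernel_h * kernel_w, height_out, width_out);
  return col->mutable_cpu_data();
}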

@longjon
Contributor

longjon commented Oct 22, 2014

@futurely I don't think that solves the problem; when would the static SharedBuffer instance be destroyed (thus destroying buffer_)?

One solution is to explicitly destroy the shared buffer with a clean-up function. But that's somewhat irritating, and requires an extra line to be added to all Caffe invocations.
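For example (a hedged sketch with made-up names, using a plain vector as a stand-in for the column Blob), the clean-up could just reset a shared_ptr before exit so nothing GPU-related is left to static destruction:

#include <vector>
#include <boost/shared_ptr.hpp>

// Hypothetical stand-in for the statically shared column buffer.
static boost::shared_ptr<std::vector<float> > shared_col_buffer(
    new std::vector<float>());

// Explicit tear-down: release the buffer while the CUDA runtime is still
// alive. The irritation is that every Caffe invocation must remember to
// call this before returning from main().
void CleanupSharedColBuffer() {
  shared_col_buffer.reset();
}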

@futurely

The static instance itself should also be a shared_ptr.

// http://stackoverflow.com/questions/1008019/c-singleton-design-pattern
class SharedBuffer {
 public:
  static shared_ptr<SharedBuffer> instance() {
    // Guaranteed to be destroyed. Instantiated on first use.
    static shared_ptr<SharedBuffer> instance(new SharedBuffer());
    return instance;
  }
  ...
};

@longjon
Contributor

longjon commented Oct 24, 2014

@futurely, I don't see how using a shared_ptr is any different... the lifetime of the SharedBuffer object is the same as the lifetime of the shared_ptr, right?

However, I think I see the point of the original code now, which might actually address the issue... if the static variable is method-local, it doesn't get allocated until the method is called, and therefore gets destroyed before any global static variables. (If I understand correctly, then, there's no need for a singleton class, just for an accessor function.)
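A sketch of that accessor-function idea (hypothetical function name; assumes Caffe's Blob):

#include "caffe/blob.hpp"

// Hypothetical accessor: the function-local static is not constructed until
// the first call, and statics are destroyed in reverse order of construction,
// so this buffer is torn down before statics that were initialized earlier at
// program start. No singleton class needed, just this function.
template <typename Dtype>
caffe::Blob<Dtype>& SharedColBuffer() {
  static caffe::Blob<Dtype> buffer;
  return buffer;
}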

I'm not going to bother updating an unmergeable PR to fix a post-optimization crash, but we can keep that trick in mind for the future.

@futurely

There is still a chance to save this PR. It's all about the lifetime of static variables.

@longjon
Contributor

longjon commented Oct 24, 2014

Yeah... that's what we're discussing here.

To be clear, the shared column buffer is still planned for merge; it just needs to be an option to avoid breaking concurrency. Until that's done, there's no point in fixing the static destruction order issue, but let's keep it in mind for when it's ready (if it's still relevant then).

@futurely

futurely commented Dec 5, 2014

ArrayFire simply ignored this error.
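Presumably that means treating the shutdown error as benign. A hedged sketch of that idea, matched to the log above (not what Caffe's SyncedMemory actually does):

#include <cuda_runtime.h>
#include <glog/logging.h>

// Hypothetical variant of the destructor-time check: accept the "driver
// shutting down" code (cudaErrorCudartUnloading, the 29 in the log above)
// instead of treating it as fatal, which is roughly what "ignored this
// error" amounts to.
inline void SafeCudaFree(void* gpu_ptr) {
  cudaError_t err = cudaFree(gpu_ptr);
  CHECK(err == cudaSuccess || err == cudaErrorCudartUnloading)
      << "cudaFree failed: " << cudaGetErrorString(err);
}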

Anatoly Baksheev and others added 21 commits February 22, 2015 18:58

  • … was overwritten with symlink created at build time and installed with install(DIRECTORY ...)
  • … systems).
  • This commit specifies Python2 with which cpp_lint.py works :-)
  • APPLE was misspelled in Line 27
  • fixes: cpp_lint.py fails silently with Python3
  • Making python3 work with cmake and the new python wrapper
  • Commands, such as $(error ...), are not allowed to be indented with tabs outside of targets, throwing an error instead of outputting the actual error. The solution is to use innocuous spaces instead. Ideally, spaces should be used everywhere outside targets, but since make does not mind it if variable assignments are tab-indented outside targets, a complete overhaul is not necessary. However, if more errors are added, it might make more sense to be consistent. Also, make will already add a period so I removed it.
  • fix accelerate / veclib path for OS X 10.10
  • Replaced illegal tab in Makefile with spaces.
  • The sample was missing some additional spaces to be correctly rendered on the HTML. The mistake was mine.
  • Decoding the datum before feeding it into the reshaping data layer
  • Small fix (visualization) on SLICE layer's documentation
@shelhamer shelhamer force-pushed the share-col-buffer branch 2 times, most recently from 9aea056 to 0d052dd on March 3, 2015 02:10
share the im2col / col2im buffers among convolution + deconvolution
layers by making the buffer a static member.

@longjon deserves all the credit for the reshaping in BVLC#594 and this patch.
@shelhamer
Member Author

Replaced by #2016.

@shelhamer shelhamer closed this Mar 3, 2015
@naibaf7 naibaf7 mentioned this pull request Jun 26, 2015