Improve / Fix Weight Sharing #1211
Open
Weight sharing as-is relies on a designated weight owner whose parameter blobs the other sharing layers alias. This poses a few problems relating to loss, loading and saving parameters, and weight initialization, listed here for addressing.
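For context, sharing is declared in the net prototxt by giving parameters the same `name`, as in the siamese example; the layer and param names below are illustrative:

```
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  param { name: "ip1_w" }   # params with the same name are shared
  param { name: "ip1_b" }
  inner_product_param { num_output: 500 }
}
layer {
  name: "ip1_p"
  type: "InnerProduct"
  bottom: "data_p"
  top: "ip1_p"
  param { name: "ip1_w" }   # aliases ip1's weights; ip1 is the owner
  param { name: "ip1_b" }
  inner_product_param { num_output: 500 }
}
```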
- Fix incorrect momentum and history due to the separation of shared weights; fixed in "Fix weight sharing" #2866.
- Fix the resuming / fine-tuning issue for shared weights; see "Contrastive loss layer for training siamese nets" #959 (comment). As it turns out, this was done in "On-the-fly net resizing, without reallocation (where possible)" #594.
- Determine if there is actually a loss / weight ownership issue as asked at https://github.com/BVLC/caffe/pull/546/files#r16817721 by @ashafaei. [No, there is not – shelhamer]
- Save memory by accumulating gradients ("Decouple the computational batch size and minibatch size by accumulating gradients" #1977) through shared diffs; done in "Fix weight sharing" #2866.
- Load and save only the owned weights, not shared duplicates; done for HDF5 in "Snapshot model weights/solver state to HDF5 files" #2836.
- Figure out how snapshot / restore should resolve weights by layer or param name, with fallback as needed.
- Only the owner should initialize weights. Currently unnecessary work and memory are expended filling all weights, which are then discarded in favor of the owner's blobs.
- Die if weight fillers are defined in layers that don't own their parameters (the weights end up properly initialized in this case, but only because the incorrect specification is silently ignored).