Commit 9c6979a
[Gradient Compression] Error feedback for PowerSGD (still need to fix the key in error_dict) (#48670)
Summary:
Pull Request resolved: #48670
Add optional error feedback to PowerSGD: store the difference (i.e., the local error caused by compression) between the input gradient (adjusted by the existing error) and the gradient after decompression, and reinsert it at the next iteration.
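The error-feedback loop described above can be sketched in pure Python. This is a minimal sketch under several assumptions: a single worker (so P and Q are not allreduced), a rank-1 PowerSGD approximation computed with one power iteration, and nested lists in place of torch tensors. The function names (`powersgd_rank1`, `step_with_error_feedback`, `run_steps`) are hypothetical and are not the hook's API.

```python
import math


def mat_vec(m, v):
    # M v for an n x k matrix M (list of rows) and a length-k vector v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_t_vec(m, v):
    # M^T v for an n x k matrix M and a length-n vector v.
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def normalize(v, eps=1e-8):
    # For rank 1, orthogonalizing P reduces to normalizing it.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def powersgd_rank1(m, q):
    """One rank-1 power iteration (single worker, so no allreduce of P or Q).

    Returns the decompressed approximation P Q^T and the new Q.
    """
    p = normalize(mat_vec(m, q))  # P = orthogonalize(M Q)
    q = mat_t_vec(m, p)           # Q = M^T P
    approx = [[pi * qj for qj in q] for pi in p]
    return approx, q


def step_with_error_feedback(grad, error, q):
    # 1. Adjust the input gradient by the error left over from the last step.
    adjusted = [[g + e for g, e in zip(gr, er)] for gr, er in zip(grad, error)]
    # 2. Compress and decompress.
    approx, q = powersgd_rank1(adjusted, q)
    # 3. Store what compression lost; it is reinserted at the next step.
    new_error = [[a - b for a, b in zip(ar, br)] for ar, br in zip(adjusted, approx)]
    return approx, new_error, q


def run_steps(grad, steps):
    # Feed the same gradient for several steps and accumulate what is sent.
    rows, cols = len(grad), len(grad[0])
    error = [[0.0] * cols for _ in range(rows)]
    sent = [[0.0] * cols for _ in range(rows)]
    q = [1.0] * cols  # Q is carried across steps (warm start)
    for _ in range(steps):
        approx, error, q = step_with_error_feedback(grad, error, q)
        sent = [[s + a for s, a in zip(sr, ar)] for sr, ar in zip(sent, approx)]
    return sent, error


grad = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]  # rank 2, so rank-1 compression is lossy
sent, error = run_steps(grad, steps=5)
```

After `run_steps`, `sent[i][j] + error[i][j]` equals `steps * grad[i][j]` up to rounding: whatever compression drops at one step is carried in the error and reinserted later instead of being discarded.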
An index field still needs to be added to GradBucket to serve as the key of error_dict. The current key, the bucket's input tensor, can change across steps, because buckets may be rebuilt in the forward pass to reduce peak memory usage.
This PR implements only half of error feedback; the new index field will be added in a separate PR.
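The keying problem can be illustrated with a minimal sketch. It assumes a hypothetical `FakeBucket` carrying the proposed `index` field and a `FakeTensor` stand-in that, like `torch.Tensor`, hashes by object identity. Neither class is the real `GradBucket` API; here a rebuild simply allocates new tensor objects for the same logical buckets.

```python
from dataclasses import dataclass


class FakeTensor:
    """Stand-in for a bucket's flattened gradient tensor.

    It does not override __eq__ or __hash__, so dict lookups use object identity.
    """

    def __init__(self, values):
        self.values = list(values)


@dataclass
class FakeBucket:
    index: int          # hypothetical stable key, the field this PR plans to add
    tensor: FakeTensor  # current key: changes whenever the bucket is rebuilt


def build_buckets(grads):
    # Every (re)build allocates new tensor objects for the same logical buckets.
    return [FakeBucket(i, FakeTensor(g)) for i, g in enumerate(grads)]


grads = [[0.1, 0.2], [0.3]]
buckets = build_buckets(grads)

# Errors stored after step 1, under both keying schemes.
error_by_tensor = {b.tensor: [0.01] * len(b.tensor.values) for b in buckets}
error_by_index = {b.index: [0.01] * len(b.tensor.values) for b in buckets}

# The buckets are rebuilt before step 2.
rebuilt = build_buckets(grads)
found_by_tensor = [b.tensor in error_by_tensor for b in rebuilt]
found_by_index = [b.index in error_by_index for b in rebuilt]
```

With identity-based keys every lookup after the rebuild misses, so the stored error is never reinserted; an index key still finds it.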
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117636492
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
Reviewed By: rohan-varma
Differential Revision: D25240290
fbshipit-source-id: 5b6e11e711caccfb8984ac2767dd107dbf4c9b3b
Parent: 463e5d2
2 files changed (+51, -4 lines) in torch/distributed/algorithms/ddp_comm_hooks
Lines changed: 7 additions & 1 deletion
| Original lines | New lines | Change |
|---|---|---|
| 19 | 19-24 | 1 line replaced by 6 lines |
| after 27 | 33 | 1 line added |
Lines changed: 44 additions & 3 deletions
| Original lines | New lines | Change |
|---|---|---|
| 33-35 | 33-47 | 3 lines replaced by 15 lines |
| after 37 | 50-58 | 9 lines added |
| after 43 | 65-70 | 6 lines added |
| after 100 | 128-138 | 11 lines added |
| after 143 | 182-184 | 3 lines added |