High memory usage with bucketing

When using the bucketing module I'd expect the memory usage to be about the same as when using the normal module unrolled to the largest bucket size. However we observe unusually high GPU memory usage in MxNet when using multiple buckets. 
This can be reproduced/observed with the lstm_bucketing.py example from the latest MXNet commit as such:
in examples/rnn/lstm_bucketing.py change:

```
num-layers to 4
num-hidden to 1024
num-embed to 512
```

When using multiple buckets (see line 49), overall memory usage is 1419MB.
When changing line 49 to only use a single bucket (e.g. 60), overall memory usage is only 1185MB.

It should be noted that the initial memory usage for bucketing is the same (1185MB), but after a couple of batches the memory usage increases. We suspect this is due to the BucketingModule binding another sub module when a new bucket size is given by the data iterator and memory sharing across modules isn't working properly.

While for this model the difference is only 300 MB, we observed much higher differences in practice, making it difficult to train any reasonably sized model with bucketing.

Note: the default bucket key is of course the largest bucket.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

High memory usage with bucketing #5035

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

High memory usage with bucketing #5035

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions