This repository was archived by the owner on Nov 17, 2023. It is now read-only.

High memory usage with bucketing #5035

@tdomhan

Description

When using the bucketing module, I'd expect the memory usage to be about the same as when using the normal module unrolled to the largest bucket size. However, we observe unusually high GPU memory usage in MXNet when using multiple buckets.
This can be reproduced with the lstm_bucketing.py example from the latest MXNet commit as follows:
in examples/rnn/lstm_bucketing.py change:

num-layers to 4
num-hidden to 1024
num-embed to 512
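
Assuming the example exposes these sizes through its argparse options (the option names above suggest it does; otherwise edit the defaults in place), the reproduction run would look like:

```shell
python examples/rnn/lstm_bucketing.py --num-layers 4 --num-hidden 1024 --num-embed 512
```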

When using multiple buckets (see line 49), overall memory usage is 1419MB.
When changing line 49 to only use a single bucket (e.g. 60), overall memory usage is only 1185MB.

It should be noted that the initial memory usage with bucketing is the same (1185MB), but after a couple of batches the memory usage increases. We suspect this happens because the BucketingModule binds another sub-module whenever the data iterator yields a new bucket size, and memory sharing across modules isn't working properly.
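
The sharing behavior we'd expect can be illustrated with a plain NumPy sketch (hypothetical — this is not MXNet's actual allocator, and `SharedBucketBuffers` is an invented name): buffers are allocated once for the default (largest) bucket, and each smaller bucket binds views into that same storage rather than making fresh allocations.

```python
import numpy as np

class SharedBucketBuffers:
    """Hypothetical sketch: one allocation sized for the largest
    bucket, with smaller buckets reusing slices of it."""

    def __init__(self, max_seq_len, batch_size, hidden):
        # Single allocation for the default (largest) bucket.
        self.buf = np.zeros((max_seq_len, batch_size, hidden),
                            dtype=np.float32)

    def bind(self, seq_len):
        # A smaller bucket gets a view into the same storage,
        # so binding it should add no new memory.
        return self.buf[:seq_len]

shared = SharedBucketBuffers(max_seq_len=60, batch_size=32, hidden=1024)
b20 = shared.bind(20)
b40 = shared.bind(40)
# Both views alias the one backing allocation.
assert np.shares_memory(b20, shared.buf)
assert np.shares_memory(b40, shared.buf)
```

The observed growth after a few batches suggests the opposite is happening: each newly bound sub-module gets its own allocation instead of a view into the default module's memory.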

While for this model the difference is only about 230 MB (1419 MB vs. 1185 MB), we observed much larger differences in practice, making it difficult to train any reasonably sized model with bucketing.

Note: the default bucket key is of course the largest bucket.
