
EmbeddingBag to support mini-batches with offsets #93843

@alexshtf

Description


🚀 The feature, motivation and pitch

Currently, the forward method of EmbeddingBag supports only 1D inputs when offsets are passed. Hence, training or inference on mini-batches of data isn't supported with offsets.
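A minimal sketch of the limitation, using made-up toy sizes. The 1D call with offsets works; passing a batched (2D) input together with offsets is rejected by PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes: vocabulary of 10 ids, embedding dim 4.
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="sum")

# 1D input with offsets: three bags starting at positions 0, 2, 4.
flat = torch.tensor([1, 2, 4, 5, 4, 3])
offsets = torch.tensor([0, 2, 4])
out = bag(flat, offsets)  # shape (3, 4): one pooled vector per bag

# A 2D (mini-batched) input together with offsets raises an error:
batched = flat.view(3, 2)
try:
    bag(batched, offsets)
except ValueError:
    pass  # offsets are only supported for 1D input
```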

Offsets are very useful when training on tabular datasets with "multi-valued" cells, such as movie genres, since we may want to sum or average the embeddings associated with several genres into a single vector. Multi-valued cells can also be weighted, for example when the values are produced by an auxiliary model and the weights represent the model's confidence in each prediction; consider, for instance, automatic extraction of movie genres from titles and descriptions.
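A sketch of the weighted multi-valued case in the 1D form that EmbeddingBag supports today, via per_sample_weights (which requires mode="sum"). All ids, sizes, and confidence values below are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical genre vocabulary of 20 ids, pooled into 8-dim vectors.
genre_emb = nn.EmbeddingBag(num_embeddings=20, embedding_dim=8, mode="sum")

# Flattened genre ids for two movies: movie 0 has genres [3, 7],
# movie 1 has genres [1, 7, 9]; offsets mark where each movie starts.
genres = torch.tensor([3, 7, 1, 7, 9])
offsets = torch.tensor([0, 2])

# Confidence scores from an auxiliary genre extractor (made-up values).
confidence = torch.tensor([0.9, 0.6, 0.8, 0.3, 0.5])

# One confidence-weighted sum of genre embeddings per movie.
pooled = genre_emb(genres, offsets, per_sample_weights=confidence)  # (2, 8)
```

The feature request amounts to supporting this same call when genres/offsets describe a mini-batch rather than a single flattened sequence.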

Alternatives

Two possible alternatives:

  1. Use a regular torch.nn.Embedding class: extract the embedding vectors, multiply by the weights manually, and aggregate them. In this case we lose the efficiency of EmbeddingBag, which avoids materializing the full embedding tensor. This idea works only if every mini-batch item has the same number of features.
  2. Use an EmbeddingBag in the model, decompose the mini-batch into its constituent items, and compute the model's output for each item using a for-loop.
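The two workarounds above can be sketched as follows; all sizes and ids are hypothetical, and the loop in alternative 2 is exactly the per-item overhead the feature would remove:

```python
import torch
import torch.nn as nn

D = 8  # hypothetical embedding dim
emb = nn.Embedding(20, D)
bag = nn.EmbeddingBag(20, D, mode="sum")

# Alternative 1: fixed number of features per item -> plain Embedding
# plus a manual weighted reduction. This materializes a (B, n, D)
# intermediate tensor that EmbeddingBag would avoid.
ids = torch.randint(0, 20, (4, 3))          # 4 items, 3 features each
w = torch.rand(4, 3)                        # per-feature weights
out1 = (emb(ids) * w.unsqueeze(-1)).sum(1)  # (4, D)

# Alternative 2: variable-length items -> Python for-loop over the
# mini-batch, calling the bag once per item.
batch = [torch.tensor([3, 7]), torch.tensor([1, 7, 9])]
out2 = torch.cat([bag(item.unsqueeze(0)) for item in batch])  # (2, D)
```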

Additional context

No response

cc @cpuhrsch @jbschlosser @bhosmer @drisspg @mikaylagawarecki


Labels

    enhancement — Not as big of a feature, but technically not a bug. Should be easy to fix
    module: nestedtensor — NestedTensor tag, see issue #25032
    triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
