SequenceFeatureExtractor.pad wasting time converting numpy array to list of numpy arrays #46328

@bolshoytoster

System Info

This is environment-agnostic. (Also, transformers env throws an error.)

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Try to run charsui with a large audio file (I tried a 25-minute one; I know I'm supposed to use smaller inputs).
  2. Notice that SequenceFeatureExtractor.pad spends several minutes converting a numpy array into a list of numpy arrays.

You could also just run SequenceFeatureExtractor.pad with a very large numpy array as input; I haven't spent time making an MRE.
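In place of a full MRE, here is a rough numpy-only sketch of the slow path. It does not call transformers at all: to_numpy below is a hypothetical stand-in (just np.asarray) for the helper that pad uses, and the 16 kHz sample rate and 10-second length are assumptions chosen to keep the run short. It shows that a float32 sample is not a Python float, so the first branch is skipped and every sample gets wrapped individually.

```python
import time

import numpy as np


def to_numpy(obj):
    # Hypothetical stand-in for the helper used inside pad; assumed to
    # behave like np.asarray for plain numpy input.
    return np.asarray(obj)


def per_element_convert(value):
    # Mirrors the `else` branch quoted in this issue: one to_numpy call
    # per element of the input.
    return [to_numpy(v) for v in value]


# Assumption: 10 seconds of mono audio at 16 kHz, stored as float32.
audio = np.zeros(160_000, dtype=np.float32)

# np.float32 scalars are not instances of Python int or float, so the
# `isinstance(value[0], (int, float))` check does not catch this input.
first_branch_taken = isinstance(audio[0], (int, float))

start = time.perf_counter()
converted = per_element_convert(audio)
elapsed = time.perf_counter() - start

print(f"first branch taken: {first_branch_taken}")
print(f"wrapped {len(converted)} samples individually in {elapsed:.3f} s")
```

The work grows linearly with the number of samples, so a 25-minute file at the same assumed rate would do 150 times as many conversions.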

Expected behavior

Transformers shouldn't waste time on this; it's pointless. It could be avoided by skipping the conversion when the input is already a numpy array, by changing

        for key, value in processed_features.items():
            if isinstance(value[0], (int, float)):
                processed_features[key] = to_numpy(value)
            else:
                processed_features[key] = [to_numpy(v) for v in value]

to

        for key, value in processed_features.items():
            if isinstance(value[0], (int, float)):
                processed_features[key] = to_numpy(value)
            elif not isinstance(value, np.ndarray):
                processed_features[key] = [to_numpy(v) for v in value]
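As a quick check of the proposed loop in isolation, the following is a minimal standalone sketch. convert_features is a hypothetical wrapper name, to_numpy is again a stand-in that just calls np.asarray, and the dictionary keys are made up for illustration. It only demonstrates what the loop itself does; whether the rest of pad accepts an unconverted ndarray is not verified here.

```python
import numpy as np


def to_numpy(obj):
    # Hypothetical stand-in for the helper used inside pad.
    return np.asarray(obj)


def convert_features(processed_features):
    # Proposed loop from this issue, wrapped in a function for testing.
    for key, value in processed_features.items():
        if isinstance(value[0], (int, float)):
            # Flat list of Python numbers: convert in one call.
            processed_features[key] = to_numpy(value)
        elif not isinstance(value, np.ndarray):
            # Batch of sequences: convert each item.
            processed_features[key] = [to_numpy(v) for v in value]
        # Otherwise the value is already an ndarray and is left untouched.
    return processed_features


array_input = np.zeros(1000, dtype=np.float32)
features = convert_features(
    {
        "already_array": array_input,
        "flat_list": [0.1, 0.2, 0.3],
        "nested_list": [[0.1, 0.2], [0.3]],
    }
)

# The ndarray passes through as the very same object, with no copying.
print(features["already_array"] is array_input)
```

List inputs keep their current behaviour; only values that are already ndarrays skip the per-element conversion.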
