
Conversation


@Narsil Narsil commented Oct 22, 2021

What does this PR do?

Fixes #14033

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Collaborator

@sgugger sgugger left a comment

Thanks for adding this!

- :obj:`None` : default strategy where nothing in particular happens
- :obj:`"hole"`: Truncates left of input, and leaves a gap wide enough to let generation happen (might
truncate a lot of the prompt and not suitable when generation exceed the model capacity)
Collaborator

Suggested change
truncate a lot of the prompt and not suitable when generation exceed the model capacity)
truncate a lot of the prompt and not suitable when generation exceeds the model capacity)

Contributor

Not a huge fan of the word "hole" - can we maybe call it "truncate_left" instead?

Contributor Author

@patrickvonplaten Thanks for that remark, finding good names is important.

truncate_left is a bit misleading though, because all methods will end up truncating left (I don't think there's a way to avoid it, really). The 3 options I have in mind:

  • hole: Truncate left a chunk equivalent to the desired new tokens, then run generation as usual. You might delete quite a bit of context for your generation.
  • correct_and_slow: Don't truncate left at the start, but disable past_key_values and generate 1 token at a time, truncating left one by one as you generate new tokens (a rough sketch of this idea is included after this comment).
  • incorrect_and_fast: Same as correct_and_slow but keep using past_key_values to keep the same performance. It's correct on models without position embeddings (not that many, I believe), and will drift on other models (if it's only a few tokens, the difference might be negligible).

Do you think there are other ways to handle long generation?

I agree hole might not be very clear or well suited.

  • truncate_chunk_left?
  • truncate_block?

correct_and_slow could be... strict maybe (it conveys IMO the correctness of it and the fact that it's limiting in some form) or slow (if the other is fast).
incorrect_and_fast could be fast or approximate.

Other names could be more descriptive:

  • rotate_left_no_past
  • rotate_left_keep_past

maybe?

BTW, I don't intend to add the fast/slow options soon, because they require more work within generate and don't seem necessary for the moment; they just seem like other strategies that could be valid depending on context.
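
To make the difference concrete, here is a minimal sketch (not code from this PR) of what the correct_and_slow idea could look like: generate one token at a time without reusing past_key_values, re-truncating the context on the left after every step. The gpt2 checkpoint and the helper name are placeholders for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_sliding_window(prompt: str, max_new_tokens: int) -> str:
    # Hypothetical helper, for illustration only.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        # Keep only the rightmost model_max_length tokens as context (truncate left).
        context = input_ids[:, -tokenizer.model_max_length:]
        with torch.no_grad():
            logits = model(context).logits
        # Greedy decoding for simplicity; no past_key_values are reused.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return tokenizer.decode(input_ids[0])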

keep_length = self.tokenizer.model_max_length - new_tokens
if keep_length <= 0:
    raise ValueError(
        "We cannot use `hole` to handle this generation the number of desired tokens exceeds the models max length"
Collaborator

Is there a dot missing? I can't make sense of this error.

Contributor Author

If you want to generate 100 tokens and the model can only handle 50, then it's impossible to generate 100 with this method.
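
For illustration, plugging those numbers into the check quoted above (hypothetical values):

model_max_length = 50    # model capacity
new_tokens = 100         # requested generation length
keep_length = model_max_length - new_tokens  # -50 <= 0, so the ValueError is raised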

Comment on lines +126 to +143
if model.config.__class__.__name__ == "RobertaConfig":
    tokenizer.model_max_length = model.config.max_position_embeddings - 2
elif (
Collaborator

Is this what you discussed with @patrickvonplaten or is it separate?
Not sure how it is linked to this PR.

Contributor

What about Longformer and BART?

Contributor Author

I guess Longformer falls into the category of model_max_length > 1000, which does not have the test (since it would require creating a string long enough to get out of the model's bounds, which is a bit too much for tests IMO).

Bart didn't seem to exhibit the same flaw. I checked the source code; it seems Bart hardcodes the +2 within its embeddings directly, allowing for the necessary capacity no matter the config. https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_bart.py#L114
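
As an illustrative aside (not code from this PR), the difference between the two families can be summed up with the config values; the numbers below are the roberta-base defaults:

# RoBERTa-like models: position ids for real tokens start at padding_idx + 1 = 2,
# so only max_position_embeddings - 2 positions are usable for input tokens.
max_position_embeddings = 514                           # roberta-base config value
usable_model_max_length = max_position_embeddings - 2   # 512 usable input tokens

# Bart-like models instead allocate the 2 extra slots inside the positional
# embedding module itself (an internal offset), so no "- 2" adjustment is needed.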

Contributor Author

@sgugger yes it was. Actually the issue lay in the upstream model used for testing, which incorrectly defined its tokenizer.model_max_length (since modified): https://huggingface.co/sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english

Here we still need this update because, for the pipeline tests:

  • We take the tokenizer from the ModelTester checkpoint
  • We train a smaller tokenizer from it
  • We create a random model from ModelTester.get_config (or ModelTester.get_pipeline_config)

That means the actual values of the two can be out of sync (which is the case here), so we still need that override here.
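
A hypothetical numeric illustration of that mismatch (values made up for the example):

checkpoint_tokenizer_max_length = 512      # tokenizer.model_max_length inherited from the upstream checkpoint
tiny_model_max_position_embeddings = 64    # tiny random model built from ModelTester.get_config
# Without the override, the pipeline would happily feed up to 512 tokens into a
# model that can only embed 64 positions, so the test re-syncs the tokenizer:
resynced_model_max_length = tiny_model_max_position_embeddings - 2  # RoBERTa-style adjustment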

else:
    new_tokens = generate_kwargs.get("max_length", self.model.config.max_length) - cur_len
    if new_tokens < 0:
        raise ValueError("We cannot infer how many new tokens are expected")
Contributor

Maybe change the error message to something like "input_string corresponding to {cur_len} tokens exceeds maximum generation length of the model"

Contributor

But also, why can't we cut here? We can cut the input tokens the same way they're cut when max_new_tokens is defined, no?

Contributor Author

If max_length > model_max_length a lot of warnings/errors are triggered.

So this would actually require users to use a different max_length than what they would send to generate (since generate doesn't allow max_length > model_max_length in TF at least).

Semantically, I think this becomes very confusing. Even in tests, it would require clever calculations, involving somehow knowing the model capacity and the current text length in tokens (which you don't have beforehand). It also implies modifying a given argument (max_length) before sending it downstream to generate, which IMHO is very bad design / very confusing.

Using max_new_tokens here is the only way that's simple and declares the intention unambiguously IMO.

Contributor Author

Maybe to clarify:

# model_max_len == 512
output = pipe("Some very long... text", max_length=750)  # long text has 800 ids.

Then what are we supposed to do? There's no way to know how many tokens the user wanted originally (they wanted -50, apparently).
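
For contrast, a sketch of the unambiguous call using max_new_tokens (the checkpoint name and numbers are placeholders, not taken from this PR):

from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2")  # gpt2 has model_max_length == 1024
# "Generate 50 extra tokens" holds no matter how many ids the prompt tokenizes to.
output = pipe("Some very long... text", max_new_tokens=50)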

Contributor Author

I updated the error to clarify.

Member

@LysandreJik LysandreJik left a comment

This is a nice addition, thank you for working on it @Narsil.

@Narsil Narsil force-pushed the large_text_generation branch from 529f3c5 to 08df6ae on October 29, 2021 09:34
@Narsil Narsil merged commit dc540dd into huggingface:master Oct 29, 2021
@Narsil Narsil deleted the large_text_generation branch October 29, 2021 13:29


Development

Successfully merging this pull request may close these issues.

Text Generation Pipeline doesn't take Truncation = True

4 participants