Adding handle_long_generation parameters for text-generation pipeline.
#14118
Conversation
sgugger
left a comment
Thanks for adding this!
- :obj:`None` : default strategy where nothing in particular happens
- :obj:`"hole"`: Truncates left of input, and leaves a gap wide enough to let generation happen (might
  truncate a lot of the prompt and not suitable when generation exceed the model capacity)
Suggested change:
- truncate a lot of the prompt and not suitable when generation exceed the model capacity)
+ truncate a lot of the prompt and not suitable when generation exceeds the model capacity)
Not a huge fan of the word "hole" - can we maybe call it "truncate_left" instead?
@patrickvonplaten Thanks for that remark, finding good names is important.
truncate_left is a bit misleading though, because all methods will end up truncating left (I don't think there's a way to avoid it, really). The 3 options I have in mind:
- hole: Truncate left a chunk equivalent to the desired new tokens, then run generation as usual. You might delete quite a bit of context for your generation.
- correct_and_slow: Don't truncate left at the start, but disable past_key_values and start generating 1 token at a time without using past, truncating left one by one as you generate new tokens.
- incorrect_and_fast: Same as correct_and_slow but keep using past_key_values to keep the same performance. It's correct on models without position embeddings (not that many, I believe), and will drift on other models (if it's only a few tokens of difference it might be negligible).
Do you think there are other ways to handle long generation ?
I agree hole might not be very clear or well suited.
truncate_chunk_left? truncate_block?
correct_and_slow could be... strict maybe (it conveys IMO the correctness of it and the fact that it's limiting in some form), or slow (if the other is fast).
incorrect_and_fast could be fast or approximate.
Other names could be more descriptive:
- rotate_left_no_past
- rotate_left_keep_past
maybe?
BTW, I don't intend to add the fast/slow options soon, because they require more work within generate and don't seem as necessary for the moment; they just seem like other strategies that could be valid depending on the context.
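To make the `hole` option concrete, here is a minimal sketch of the left-truncation it performs (the function and variable names are illustrative, not the PR's actual code):

# Illustrative sketch of the `hole` strategy: drop enough tokens from the
# left of the prompt so that `new_tokens` generated tokens still fit within
# the model's maximum length.
def truncate_for_hole(input_ids, model_max_length, new_tokens):
    keep_length = model_max_length - new_tokens
    if keep_length <= 0:
        raise ValueError("Requested more new tokens than the model can hold")
    # Keep the rightmost (most recent) part of the prompt.
    return input_ids[-keep_length:]


prompt_ids = list(range(800))  # pretend the prompt tokenizes to 800 ids
truncated = truncate_for_hole(prompt_ids, model_max_length=512, new_tokens=100)
assert len(truncated) == 412   # 512 - 100 ids of context survive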
keep_length = self.tokenizer.model_max_length - new_tokens
if keep_length <= 0:
    raise ValueError(
        "We cannot use `hole` to handle this generation the number of desired tokens exceeds the models max length"
Is there a dot missing? I can't make sense of this error.
If you want to generate 100 tokens and the model can only handle 50, then it's impossible to generate 100 with this method.
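Spelled out with those numbers (purely illustrative):

model_max_length = 50    # what the model can handle
new_tokens = 100         # what the user asked for
keep_length = model_max_length - new_tokens  # -50
# keep_length <= 0: no amount of left-truncation leaves room for 100 new
# tokens, so raising the ValueError above is the only sensible outcome.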
if model.config.__class__.__name__ == "RobertaConfig":
    tokenizer.model_max_length = model.config.max_position_embeddings - 2
elif (
Is this what you discussed with @patrickvonplaten or is it separate?
Not sure how it is linked to this PR.
What about Longformer and BART?
I guess Longformer falls into the category of max_model_length > 1000, which does not have the test (since it would require creating a string long enough to go out of the model's bounds, which is a bit too much for tests IMO).
Bart didn't seem to exhibit the same flaw. I checked the source code; it seems Bart hardcoded the +2 within its embeddings directly, allowing for the necessary capacity no matter the configs. https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/modeling_bart.py#L114
@sgugger yes it was. Actually the issue lay in the upstream model used for testing, which incorrectly defined its tokenizer.model_max_length (since modified): https://huggingface.co/sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english
Here we still need this update because for pipelines tests:
- We take the tokenizer from ModelTester checkpoint
- We train a smaller tokenizer from it
- We create a random model from ModelTester.get_config (or ModelTester.get_pipeline_config)
That means the actual values between the two can be out of sync (which is the case here), so we still need that override.
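Roughly, the override discussed here boils down to something like this (a sketch; the `elif` condition is cut off in the hunk above, so its completion here is only one plausible guess):

# Sketch: re-sync the tokenizer's model_max_length with the randomly
# configured model, since the two come from different sources in the tests.
if model.config.__class__.__name__ == "RobertaConfig":
    # RoBERTa-style models offset position ids by the padding index, so the
    # usable sequence length is max_position_embeddings - 2.
    tokenizer.model_max_length = model.config.max_position_embeddings - 2
elif hasattr(model.config, "max_position_embeddings"):
    tokenizer.model_max_length = model.config.max_position_embeddings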
else:
    new_tokens = generate_kwargs.get("max_length", self.model.config.max_length) - cur_len
    if new_tokens < 0:
        raise ValueError("We cannot infer how many new tokens are expected")
Maybe change the error message to something like "input_string corresponding to {cur_len} tokens exceeds maximum generation length of model {max_len}}"
But also, why can't we cut here? We can cut the input tokens the same way they're cut when max_new_tokens is defined, no?
If max_length > model_max_length, a lot of warnings/errors are triggered.
So this would actually require users to use a different max_length than what they would send to generate (since generate doesn't allow max_length > model_max_length, in TF at least).
Semantically, I think this becomes very confusing. Even in tests, it would require doing clever calculations, involving somehow knowing the model capacity and the current text length in tokens (which you don't have beforehand). It also implies modifying a given argument (max_length) before sending it downstream to generate, which IMHO is very bad design / very confusing.
Using max_new_tokens here is the only way that's simple and declares the intention unambiguously IMO.
Maybe to clarify:
# model_max_len == 512
output = pipe("Some very long... text", max_length=750)  # long text has 800 ids
Then what are we supposed to do? There's no way to know how many tokens the user wanted originally (they wanted -50, apparently).
I updated the error to clarify.
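For reference, with the parameters this PR discusses, the unambiguous way to express the intent looks something like this (a sketch; `gpt2` is just an example checkpoint):

from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2")  # gpt2 holds at most 1024 tokens

# Asking for 100 *new* tokens plus handle_long_generation="hole" states the
# intent unambiguously: truncate the prompt on the left so that 100 newly
# generated tokens still fit in the model's window.
output = pipe(
    "Some very long prompt ... " * 500,
    max_new_tokens=100,
    handle_long_generation="hole",
)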
LysandreJik
left a comment
This is a nice addition, thank you for working on it @Narsil.
`max_new_tokens` is used to make it possible to understand the user's intent. Otherwise, `max_length` == `tokenizer.model_max_length` < `input_ids.shape[0]`.
Force-pushed from 529f3c5 to 08df6ae.
What does this PR do?
Fixes #14033
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.