
RAG generate function uses input_ids even when context_input_ids are given. #7871

@LittlePea13

Description


Environment info

  • transformers version: 3.3.1
  • Platform: Linux-5.4.0-51-generic-x86_64-with-debian-buster-sid
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.6.0 (True)
  • Tensorflow version (GPU?): 2.3.1 (False)
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

I think @patrickvonplaten has been handling RAG issues.

Information

Model I am using: RagTokenForGeneration

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

One can use the RAG demo currently in a PR, but the bug occurs in any case.

  1. Load a RagTokenForGeneration model.
  2. Generate the context_input_ids (in the demo this is done via a forward pass).
  3. Call the generate function without passing input_ids, which is supposed to be an optional argument.
  4. The function computes the batch size from input_ids and breaks, since input_ids is None, on this line:
    batch_size = input_ids.shape[0]
    import torch
    from transformers import RagConfig, RagRetriever, RagTokenForGeneration, RagTokenizer

    query = "My question"
    tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
    rag_conf = RagConfig.from_pretrained("facebook/rag-token-nq")
    # `dataset` is a custom indexed dataset built beforehand
    retriever = RagRetriever.from_pretrained(
        "facebook/rag-token-nq",
        question_encoder_tokenizer=tokenizer.question_encoder,
        generator_tokenizer=tokenizer.generator,
        index_name="custom",
        indexed_dataset=dataset,
    )
    model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
    device = "cuda:0"
    model.to(device)
    input_ids = tokenizer(query, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        # retrieve support docs
        retrieved_outputs = model(input_ids, labels=None, output_retrieved=True)
        dl_scores = retrieved_outputs.doc_scores[0].tolist()
        dp_scores = retrieved_outputs.doc_scores.softmax(dim=-1)[0].tolist()
        doc_dicts = retriever.index.get_doc_dicts(retrieved_outputs.retrieved_doc_ids)[0]
        support_docs = [
            {"score": ls, "proba": ns, "title": ti, "text": te}
            for ls, ns, ti, te in zip(dl_scores, dp_scores, doc_dicts["title"], doc_dicts["text"])
        ]
        # generate answers -- fails because input_ids is not passed
        generated_ids = model.generate(
            context_input_ids=retrieved_outputs.context_input_ids,
            context_attention_mask=retrieved_outputs.context_attention_mask,
            doc_scores=retrieved_outputs.doc_scores,
            num_beams=4,
            num_return_sequences=4,
            min_length=2,
            max_length=64,
            length_penalty=1.0,
        )

Expected behavior

The batch size should be obtained differently when input_ids is None, for instance batch_size = doc_scores.shape[0], since the first dimension of doc_scores is also the batch size.
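A minimal sketch of the proposed fallback (the helper name infer_batch_size is hypothetical; the actual fix would live inside RagTokenForGeneration.generate):

```python
def infer_batch_size(input_ids=None, doc_scores=None):
    """Derive the batch size from whichever tensor is available.

    Hypothetical helper illustrating the proposed fix: when input_ids
    is None (because context_input_ids were supplied directly), fall
    back to doc_scores, whose first dimension is also the batch size.
    Works with any tensor-like object exposing a .shape attribute.
    """
    if input_ids is not None:
        return input_ids.shape[0]
    if doc_scores is not None:
        return doc_scores.shape[0]
    raise ValueError("Either input_ids or doc_scores must be provided")
```

With this fallback in place, the repro script above would no longer hit the AttributeError when only context_input_ids, context_attention_mask, and doc_scores are passed.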
