
RAG generate function uses input_ids even when context_input_ids are given. #7871

@LittlePea13

Description


Environment info

  • transformers version: 3.3.1
  • Platform: Linux-5.4.0-51-generic-x86_64-with-debian-buster-sid
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.6.0 (True)
  • Tensorflow version (GPU?): 2.3.1 (False)
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help

I think @patrickvonplaten has been handling RAG issues.

Information

Model I am using: RagTokenForGeneration

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

One can use the RAG demo currently in a PR, but the bug occurs in any case.

  1. Load a RagTokenForGeneration model.
  2. Generate the context_input_ids (in the demo this is done via a forward pass).
  3. Call the generate function without passing input_ids, which is supposed to be an optional argument.
  4. The function computes the batch size from input_ids and breaks, since input_ids is None, on this line:
    batch_size = input_ids.shape[0]
    import torch
    from transformers import RagConfig, RagRetriever, RagTokenForGeneration, RagTokenizer

    query = "My question"
    tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
    rag_conf = RagConfig.from_pretrained("facebook/rag-token-nq")
    # `dataset` is a custom indexed dataset built beforehand
    retriever = RagRetriever.from_pretrained(
        "facebook/rag-token-nq",
        question_encoder_tokenizer=tokenizer.question_encoder,
        generator_tokenizer=tokenizer.generator,
        index_name="custom",
        indexed_dataset=dataset,
    )
    model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
    device = "cuda:0"
    model.to(device)
    input_ids = tokenizer(query, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        # retrieve support docs
        retrieved_outputs = model(input_ids, labels=None, output_retrieved=True)
        dl_scores = retrieved_outputs.doc_scores[0].tolist()
        dp_scores = retrieved_outputs.doc_scores.softmax(dim=-1)[0].tolist()
        doc_dicts = retriever.index.get_doc_dicts(retrieved_outputs.retrieved_doc_ids)[0]
        support_docs = [
            {"score": ls, "proba": ns, "title": ti, "text": te}
            for ls, ns, ti, te in zip(dl_scores, dp_scores, doc_dicts["title"], doc_dicts["text"])
        ]
        # generate answers -- fails because input_ids is not passed
        generated_ids = model.generate(
            context_input_ids=retrieved_outputs.context_input_ids,
            context_attention_mask=retrieved_outputs.context_attention_mask,
            doc_scores=retrieved_outputs.doc_scores,
            num_beams=4,
            num_return_sequences=4,
            min_length=2,
            max_length=64,
            length_penalty=1.0,
        )

Expected behavior

The batch size should be obtained differently when input_ids is None, for instance batch_size = doc_scores.shape[0], since the first dimension of doc_scores is also the batch size.
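A minimal sketch of the proposed fallback (the helper name infer_batch_size is hypothetical; the actual fix would live inside RagTokenForGeneration.generate):

```python
def infer_batch_size(input_ids=None, doc_scores=None):
    """Derive the batch size from whichever tensor is available.

    Hypothetical helper illustrating the proposed fix: when input_ids
    is None (because context_input_ids were supplied directly), fall
    back to doc_scores, whose first dimension is also the batch size.
    Works with any tensor-like object exposing a .shape attribute.
    """
    if input_ids is not None:
        return input_ids.shape[0]
    if doc_scores is not None:
        return doc_scores.shape[0]
    raise ValueError("Either input_ids or doc_scores must be provided")
```

With this fallback in place, the repro script above would no longer hit the AttributeError when only context_input_ids, context_attention_mask, and doc_scores are passed.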
