Description
vLLM accelerates generation by 5× on H800, but the output quality degrades significantly.
Observed Issues
- Stage 1: As the sequence length increases, the generated audio gradually turns into noise (e.g., after ~30s).
- Stage 2: More invalid token IDs are observed when using vLLM.
Expected Behavior
- The generated audio should maintain quality matching the default Hugging Face Transformers implementation, regardless of sequence length.
- No increase in invalid token IDs in Stage 2.
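To make the Stage 2 regression measurable rather than anecdotal, a simple check is to count generated IDs that fall outside the audio codebook range for both backends. This is a minimal sketch; `CODEBOOK_SIZE` and the token lists are placeholders, not values from the YuE repo.

```python
# Hypothetical sketch: count token IDs outside the valid codebook range.
# CODEBOOK_SIZE is an assumed placeholder, not the real YuE vocabulary size.
CODEBOOK_SIZE = 1024

def count_invalid(token_ids, codebook_size=CODEBOOK_SIZE):
    """Return how many generated IDs fall outside [0, codebook_size)."""
    return sum(1 for t in token_ids if t < 0 or t >= codebook_size)

hf_ids = [3, 17, 1023, 512]         # toy HF output, all valid
vllm_ids = [3, 17, 2048, -1, 512]   # toy vLLM output with two invalid IDs

print(count_invalid(hf_ids))    # 0
print(count_invalid(vllm_ids))  # 2
```

Running this over matched prompts on both branches would show whether the invalid-ID rate grows with sequence length, which would line up with the Stage 1 noise onset around ~30s.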
Possible Causes
The issue is likely in the LM part, not the audio tokenizer or GAN.
Potential causes:
- Positional encoding misalignment?
- PagedAttention numerical inaccuracy?
- Decoding hyperparameter misalignment?
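The decoding-hyperparameter hypothesis is the cheapest to rule out: dump both backends' sampling settings and diff them. A minimal sketch, with placeholder values rather than the actual YuE configs:

```python
# Hypothetical sketch: diff decoding hyperparameters between the HF and
# vLLM runs. The values below are placeholders, not the real configs.
hf_cfg = {"temperature": 1.0, "top_p": 0.93, "top_k": 50,
          "repetition_penalty": 1.2}
vllm_cfg = {"temperature": 1.0, "top_p": 0.93, "top_k": -1,
            "repetition_penalty": 1.0}

def diff_cfg(a, b):
    """Return {param: (a_value, b_value)} for every mismatched parameter."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

for name, (hf_v, vl_v) in sorted(diff_cfg(hf_cfg, vllm_cfg).items()):
    print(f"{name}: HF={hf_v} vLLM={vl_v}")
```

Note that the two libraries use different defaults and sentinel values (e.g. vLLM's `SamplingParams` uses `top_k=-1` to disable top-k), so identical-looking configs can still decode differently.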
Steps to Reproduce
- A vllm branch has been created. @hf-lin will adapt reproducible vLLM inference code based on Hugging Face.
- A command to compare vLLM COT (vllm branch) vs HF COT (main branch) implementations will be added here. @hf-lin
Additional Context
- YuE System Overview: We generate lyrics-to-song sequences with interleaved text conditions and audio tokens.
- Dual-Token Strategy:
- One token represents the vocal track at the current frame.
- One token represents the instrumental accompaniment at the current frame.
See system diagram.
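For reviewers unfamiliar with the dual-token layout, the interleaving described above can be sketched as follows. This is a toy illustration of the frame-wise vocal/instrumental alternation; the helper names and token values are hypothetical, not from the YuE codebase.

```python
# Hypothetical sketch of the dual-token stream: each frame contributes
# one vocal token followed by one instrumental token.
def interleave(vocal, inst):
    """[v0, i0, v1, i1, ...] — one vocal + one instrumental token per frame."""
    assert len(vocal) == len(inst)
    out = []
    for v, i in zip(vocal, inst):
        out.extend([v, i])
    return out

def deinterleave(stream):
    """Split an interleaved stream back into (vocal, instrumental) tracks."""
    return stream[0::2], stream[1::2]

seq = interleave([10, 11, 12], [900, 901, 902])
print(seq)  # [10, 900, 11, 901, 12, 902]
assert deinterleave(seq) == ([10, 11, 12], [900, 901, 902])
```

Because the two tracks alternate at a fixed stride, any off-by-one in position handling (e.g. in the KV cache or positional encoding) would swap vocal and instrumental slots, which could plausibly surface as the noise observed in Stage 1.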
