
[BUG] [vLLM] Batch truncation bug: context length check uses first prompt instead of longest #1204

@paulovsantanas


Describe the bug

When running batched generation with VLLMModel._greedy_until, the context-length check is based only on the first prompt in the batch (len(inputs[0])) instead of the longest prompt.
If the first prompt is short but another prompt in the same batch is longer, truncation can be skipped incorrectly, causing some samples to exceed max_length.
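
A minimal sketch of the problematic shape of the check, assuming tokenized prompts are plain lists of token ids (the function name and signature are illustrative, not lighteval's actual code):

```python
# Illustrative sketch, not the exact lighteval source: only the FIRST prompt
# in the batch drives the truncation decision.
def needs_truncation_first_only(inputs, max_new_tokens, max_length):
    context_size = len(inputs[0])  # mirrors the len(inputs[0]) check from this report
    return context_size + max_new_tokens > max_length
```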

To Reproduce

  1. Use lighteval with the vLLM backend and configure a finite max_length.
  2. Create a batch with prompts of different lengths, where:
    • the first prompt is short, and
    • at least one later prompt is long enough that prompt_len + max_new_tokens > max_length.
  3. Run a generation call that reaches _greedy_until (e.g. a normal evaluation batch with max_new_tokens set).
  4. Observe that truncation/logging decisions are made from the first prompt length, so longer prompts in the same batch may not be truncated as required.

Minimal example logic (conceptual):

  • max_length = 1024
  • max_new_tokens = 200
  • prompt lengths in same batch: [100, 950]
  • the old behavior checks 100 + 200 = 300 ≤ 1024 and decides no truncation, but the second sample actually needs truncation (950 + 200 = 1150 > 1024); see the snippet below.
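
The conceptual example above as a runnable snippet (the [0] * n lists are only stand-ins for tokenized prompts of length 100 and 950):

```python
max_length, max_new_tokens = 1024, 200
batch = [[0] * 100, [0] * 950]  # prompt lengths 100 and 950 in the same batch

first_based = len(batch[0]) + max_new_tokens > max_length                 # False: check passes
longest_based = max(len(p) for p in batch) + max_new_tokens > max_length  # True: truncation needed

print(first_based, longest_based)  # False True -> the 950-token sample would overflow
```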

Expected behavior

Truncation decisions should use the worst-case prompt length in the batch (the maximum prompt length), so all samples remain within max_length.
Warnings should clearly indicate batch-aware length handling.
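
One way a batch-aware check could look, using the same illustrative signature as the sketch above (not the actual patch applied in lighteval):

```python
# Base the decision on the longest prompt in the batch so every sample fits.
def needs_truncation_batch_aware(inputs, max_new_tokens, max_length):
    longest_prompt = max(len(prompt) for prompt in inputs)
    return longest_prompt + max_new_tokens > max_length
```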

Version info

  • lighteval main/33acf35f02c41d234c7df5cbdf1fd3e9d33ecd76
