feat: tokenize each request individually and increase warmup image size #2802

drbh · 2024-12-05T16:12:19Z

This PR resolves some small issues with qwen2-vl.

Doubles the size of WARMUP_IMAGE_BASE64 from 20x20px to 40x40px (meets Qwen's minimum size requirement without a hacky fix).
Removes the hacky fix that doubled the warmup image.
Prefers tokenizing each request individually instead of the whole batch at once. This allows r.truncate to be passed for each request; previously it was not respected when one of the requests was smaller than the others in the batch.
Sets max_s to the max of max_s and the input size. This is required so that the rotary embedding creates self._cos_cached with the correct size relative to the position ids.
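
To illustrate the truncation point, here is a minimal sketch and not the actual server code. The `Request` class, the whitespace `toy_tokenize` function, and the way the batch-wide limit is chosen (the largest `truncate` in the batch) are all assumptions made for the example; only the idea of honoring `r.truncate` per request comes from this PR. Keeping the last tokens when truncating is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # Hypothetical stand-in for a batch request; only `truncate` is named in the PR.
    inputs: str
    truncate: Optional[int] = None


def toy_tokenize(text: str) -> List[int]:
    # Toy tokenizer (assumption): one id per whitespace-separated word.
    return [len(word) for word in text.split()]


def tokenize_batch_at_once(requests: List[Request]) -> List[List[int]]:
    # Assumed old behavior: a single limit is shared by the whole batch,
    # so a request with a smaller `truncate` is not truncated enough.
    limit = max(r.truncate for r in requests if r.truncate is not None)
    return [toy_tokenize(r.inputs)[-limit:] for r in requests]


def tokenize_per_request(requests: List[Request]) -> List[List[int]]:
    # Behavior described in the PR: each request applies its own `truncate`.
    out = []
    for r in requests:
        ids = toy_tokenize(r.inputs)
        if r.truncate is not None:
            # Keep the last `truncate` tokens (left truncation, assumed).
            ids = ids[max(len(ids) - r.truncate, 0):]
        out.append(ids)
    return out


batch = [
    Request("a bb ccc dddd eeeee", truncate=2),
    Request("a bb ccc dddd eeeee", truncate=4),
]
shared = tokenize_batch_at_once(batch)
individual = tokenize_per_request(batch)
```

In this toy batch the shared limit leaves four tokens in the first request even though it asked for two, while the per-request path returns two and four tokens respectively.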

these changes resolve a startup issue reproducible with:

text-generation-launcher \
--model-id Qwen/Qwen2-VL-2B-Instruct \
--max-input-tokens 40 \
--max-batch-prefill-tokens 50 \
--max-total-tokens 51

(Note: the underlying issue triggers when max-input-tokens is less than max-batch-prefill-tokens.)
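
The max_s adjustment from the description can be sketched as follows. This is a simplified pure-Python illustration, not the model's rotary implementation: `build_cos_cache` and `lookup_cos` are hypothetical helpers, and the values 40 and 50 only echo the flags above for illustration.

```python
import math
from typing import List


def build_cos_cache(max_s: int, dim: int = 4, base: float = 10000.0) -> List[List[float]]:
    # One row per position in [0, max_s), one column per rotary frequency.
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]
    return [[math.cos(pos * f) for f in inv_freq] for pos in range(max_s)]


def lookup_cos(position_ids: List[int], max_s: int, input_length: int) -> List[List[float]]:
    # The adjustment described in the PR: never size the cache below the input.
    max_s = max(max_s, input_length)
    cache = build_cos_cache(max_s)
    return [cache[p] for p in position_ids]


position_ids = list(range(50))

# Without the adjustment, a cache of 40 rows cannot serve position 49.
too_small = build_cos_cache(40)
try:
    _ = [too_small[p] for p in position_ids]
    overflowed = False
except IndexError:
    overflowed = True

# With the adjustment, every position id has a cached row.
rows = lookup_cos(position_ids, max_s=40, input_length=50)
```

The point is only that the cached table must have at least as many rows as the largest position id plus one.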

Narsil · 2024-12-09T18:47:19Z

from 20x20px to 40x40px (meets qwens minimal requirement without hacky fix)

I do not understand: why should we impose anything on the user for the images? If 20px x 20px is not supported, we should:

Rescale the image seamlessly and correctly infer on it.
Reject the image with a proper error message.
Users shouldn't have to know anything about the model's internals; 20x20px should be ok imho.
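
A minimal sketch of those two options, working on dimensions only. The function `fit_min_side` and its `min_side` parameter are hypothetical, and the value 40 is illustrative rather than the model's actual requirement.

```python
from typing import Tuple


def fit_min_side(width: int, height: int, min_side: int,
                 allow_rescale: bool = True) -> Tuple[int, int]:
    # Returns the dimensions to run inference on, or raises a clear error.
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid image size {width}x{height}px")
    if min(width, height) >= min_side:
        return width, height  # already large enough, leave untouched
    if not allow_rescale:
        # Option 2: reject with a message the user can act on.
        raise ValueError(
            f"image is {width}x{height}px; at least {min_side}px per side is required"
        )
    # Option 1: upscale, preserving aspect ratio, until the shorter side fits.
    scale = min_side / min(width, height)
    return max(min_side, round(width * scale)), max(min_side, round(height * scale))


rescaled = fit_min_side(20, 20, min_side=40)
try:
    fit_min_side(20, 20, min_side=40, allow_rescale=False)
    rejected = False
except ValueError:
    rejected = True
```

Either path keeps the size constraint inside the server, so a 20x20px input is resized or rejected explicitly.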

drbh force-pushed the improve-qwen2-vl-warmup branch from 066addd to 60b9c18 Compare December 6, 2024 17:40

drbh mentioned this pull request Dec 9, 2024

Attempt for cleverer auto batch_prefill values (some simplifications). #2808

Merged

5 tasks

drbh added 2 commits December 9, 2024 15:21

feat: tokenize each request individually and increase warmup image size

fd4de85

feat: adjust rotary embed and avoid cuda graphs of size 2 and smaller

3cc8297

drbh force-pushed the improve-qwen2-vl-warmup branch from 60b9c18 to 3cc8297 Compare December 9, 2024 20:31

fix: address image resize and rebase changes

a3049f1

drbh force-pushed the improve-qwen2-vl-warmup branch from 32a9564 to a3049f1 Compare December 9, 2024 21:32

drbh requested a review from Narsil December 16, 2024 15:54
