Enable qwen2vl video #2756

drbh · 2024-11-18T17:59:01Z

This PR is a work in progress that explores adding support for video inputs with Qwen2-VL. Thank you @mfarre for getting this effort started.

TODOS

support video_urls
fetch video contents in router
update protobufs to support video chunks
handle padding video token inputs
tokenize video bytes
integrate video logic with vision model (update position ids)
ensure tokenization process is correct
add tests
refactor/improve
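For the "handle padding video token inputs" item, one way to picture the step is expanding a single video placeholder in the prompt into as many pad tokens as the vision model will later emit embeddings for. The following is a minimal sketch and not this PR's implementation: the `<|video_pad|>` placeholder string, the `expand_video_placeholder` helper, and the token count of 4 are all assumptions chosen for illustration.

```python
from typing import List

# Assumed placeholder token; check the model's tokenizer for the real one.
VIDEO_PAD = "<|video_pad|>"


def expand_video_placeholder(tokens: List[str], num_video_tokens: int) -> List[str]:
    """Replace each single video placeholder with `num_video_tokens` copies.

    `num_video_tokens` is taken as an input here; in a real server it would be
    derived from the number of sampled frames and their resolution.
    """
    if num_video_tokens < 1:
        raise ValueError("num_video_tokens must be positive")
    expanded: List[str] = []
    for tok in tokens:
        if tok == VIDEO_PAD:
            # Reserve one slot per video embedding the vision tower will produce.
            expanded.extend([VIDEO_PAD] * num_video_tokens)
        else:
            expanded.append(tok)
    return expanded


# Hypothetical pre-tokenized prompt with one video placeholder.
prompt = ["<|vision_start|>", VIDEO_PAD, "<|vision_end|>", "Describe", "this", "video."]
padded = expand_video_placeholder(prompt, num_video_tokens=4)
```

Keeping the count as an argument separates the padding logic from the frame-sampling logic, so either can change without touching the other.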

Update

start server

text-generation-launcher \
--model-id Qwen/Qwen2-VL-7B-Instruct \
--max-batch-prefill-tokens 10000 \
--max-input-tokens 10000 \
--max-total-tokens 10001

send request

import requests
import json

def chat_completion(url="http://127.0.0.1:3000", video_url=None, prompt=None):
    messages = [{
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": { 
                    "url": video_url
                }
            },
            {
                "type": "text",
                "text": prompt
            }
        ]
    }]

    payload = {
        "messages": messages,
        "seed": 42,
        "max_tokens": 30
    }

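    # json= already sets the Content-Type header; the timeout is a generous
    # guess, since the server has to download and decode the video first.
    response = requests.post(
        f"{url}/v1/chat/completions",
        json=payload,
        timeout=300,
    )
    # Fail loudly on HTTP errors instead of parsing an error body as a result.
    response.raise_for_status()

    return response.json()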

video_url = "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/360/Big_Buck_Bunny_360_10s_1MB.mp4"
result = chat_completion(
    video_url=video_url,
    prompt="Describe this video."
)
print(json.dumps(result, indent=2))
# {
#     "object": "chat.completion",
#     "id": "",
#     "created": 1731964042,
#     "model": "Qwen/Qwen2-VL-7B-Instruct",
#     "system_fingerprint": "2.4.1-dev0-native",
#     "choices": [
#         {
#             "index": 0,
#             "message": {
#                 "role": "assistant",
#                 "content": "The video showcases lush green trees with vibrant shades of green and various shades of yellow and brown, as well as moss-covered stumps and piles of moss",
#             },
#             "logprobs": null,
#             "finish_reason": "length",
#         }
#     ],
#     "usage": {"prompt_tokens": 9593, "completion_tokens": 30, "total_tokens": 9623},
# }
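The `usage` block above shows why the launcher limits are set so high: a 10 second clip already costs 9593 prompt tokens. A rough sketch of a client-side budget check, assuming `--max-input-tokens` bounds the prompt and `--max-total-tokens` bounds prompt plus generated tokens; the `fits_token_budget` helper is hypothetical and not part of the server or any client library.

```python
def fits_token_budget(
    prompt_tokens: int,
    max_new_tokens: int,
    max_input_tokens: int,
    max_total_tokens: int,
) -> bool:
    """Return True if a request stays within the launcher limits (assumed semantics)."""
    within_input = prompt_tokens <= max_input_tokens
    within_total = prompt_tokens + max_new_tokens <= max_total_tokens
    return within_input and within_total


# Values copied from the launcher flags and the response shown above.
usage = {"prompt_tokens": 9593, "completion_tokens": 30, "total_tokens": 9623}
ok = fits_token_budget(
    prompt_tokens=usage["prompt_tokens"],
    max_new_tokens=30,
    max_input_tokens=10000,
    max_total_tokens=10001,
)
# Remaining prompt tokens before the input limit is hit.
headroom = 10000 - usage["prompt_tokens"]
```

With these numbers the request fits with only a few hundred prompt tokens to spare, so a longer clip would likely need higher limits.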

HuggingFaceDocBuilderDev · 2024-11-18T21:17:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


drbh force-pushed the enable-qwen2vl-video branch from b780f00 to 6b4697e Compare November 18, 2024 18:03

mfarre reviewed Nov 20, 2024


router/src/validation.rs (review thread resolved)

drbh force-pushed the enable-qwen2vl-video branch from b9707b9 to 32438fc Compare November 25, 2024 21:43

drbh force-pushed the enable-qwen2vl-video branch from 2ef3038 to 17b27d4 Compare December 3, 2024 00:54

drbh force-pushed the enable-qwen2vl-video branch 4 times, most recently from 4e921bf to 93a2413 Compare December 18, 2024 01:41

mfarre and others added 21 commits December 23, 2024 13:47

WIP video support

18c9f06

router changes

7c67939

adopting video url

5ced960

connecting video to qwen2

05464d2

fix

c7c2fda

downloading videos

b9c8152

fix

464609f

refactoring

a25c3ec

fix

3c07391

feat: support video input chunks and enable qwen2 vl to process video

b2c5575

fix: add protobuf update and mp4parse dep

83a7f18

fix: remove unused deps and imports

322165d

moving video sampling and resize to validation. downstream we receive frames

e65ead1

flatten frames to data block when needed

36e095b

fix: adjust video process, reduce to 1 fps and adjust tensor shape

bc5e202

fix: adjust deps after rebase

1afaa69

feat: adjust impure shell deps and autodocs workflow

16007b6

fix: include more deps for ffmpeg as docs suggest

39fac7e

fix: add ffmpeg deps to test build

b508b10

fix: debug ffmpeg install in tests workflow

4a3a724

fix: debug ffmpeg deps in tests II

ac7483c

drbh and others added 29 commits December 23, 2024 13:47

fix: adjust pkg config in test

daf83a9

fix: ensure pip is installed after installing deps in test workflow

d5cc670

fix: add libavfilter dep to test

4a76e8b

fix: add libavdevice dep to tests workflow

f0c3841

fix: add ffmpeg overlay and enable build

96968a0

fix: include ffmpeg deps in autodocs workflow

167c6f0

Cleanup impure Nix shell

98392a7

Make the pure build work

05004a6

Fix test devshell

063104c

fix: bump deps in other dockerfiles

2dc078a

fix: add ffmpeg to final layer of container

50b5399

fix: include usr lib in ld path

b5b2184

fix: copy shared libraries from builder

75ab887

installing ssl requirements prior to rust building stage

cbf1d98

fixing ssl issue

af77a0c

working version

19e1c8d

cleanup prints

db97d97

fix: pre commit and clippy lints

71ed75a

fix: resolve rebase issues and add test

e2b75a5

fix: remove unnecessary cast

1d6bf24

fix: update all vlm forward args, pass shared libraries to final layer in docker and doc bump

2ae152a

fix: adjust batch_tokenized_inputs output in mllama

5c7bc91

fix: update lints after rebase

bb00fb3

fix: update trtllm dockefile after rebase

91ed362

fix: adjust whitespace lint

5322abd

fix: feature flag video and remove from non cuda dockerfiles

b4da6ad

fix: make ffmpeg-next dep optional with feature

27f758d

fix: include the video feature in cargo chef command

4f42d0c

fix: adjust trtllm looper for video chunk enum

dcc1194

drbh force-pushed the enable-qwen2vl-video branch from 93a2413 to dcc1194 Compare December 23, 2024 18:47
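One of the commits above reduces video sampling to 1 fps. A minimal sketch of what picking frames at a target rate can look like, under stated assumptions: the `sample_frame_indices` helper is hypothetical, the 30 fps source rate is invented for the example, and the PR itself does this in the Rust router rather than in Python.

```python
from typing import List


def sample_frame_indices(
    total_frames: int, native_fps: float, target_fps: float = 1.0
) -> List[int]:
    """Pick frame indices so that roughly `target_fps` frames per second remain."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Number of source frames between two kept frames; never below 1,
    # so a clip slower than the target rate keeps every frame.
    step = max(native_fps / target_fps, 1.0)
    indices: List[int] = []
    i = 0
    # Multiply instead of accumulating to avoid floating point drift.
    while int(i * step) < total_frames:
        indices.append(int(i * step))
        i += 1
    return indices


# Hypothetical 10 second clip at 30 fps: 300 frames in total.
indices = sample_frame_indices(total_frames=300, native_fps=30.0)
```

Sampling before the frames reach the model keeps the prompt token count proportional to clip duration rather than to the source frame rate.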
