Enable qwen2vl video #2756

Draft: wants to merge 51 commits into base: main

@drbh (Collaborator) commented Nov 18, 2024

This PR is a work in progress that explores adding support for video inputs with Qwen2-VL. Thank you @mfarre for getting this effort started.

TODOs

  • support video_urls
  • fetch video contents in router
  • update protobufs to support video chunks
  • handle padding video token inputs
  • tokenize video bytes
  • integrate video logic with vision model (update position ids)
  • ensure tokenization process is correct
  • add tests
  • refactor/improve
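
For the "handle padding video token inputs" item, the number of placeholder tokens a clip needs can be estimated up front. The following is a rough sketch, not this PR's implementation: the function name is hypothetical, the defaults (patch size 14, temporal patch size 2, spatial merge size 2) are assumed from Qwen2-VL's published preprocessor configuration, and the frames are assumed to be already resized so that height and width are multiples of 28.

```python
def video_placeholder_count(
    num_frames: int,
    height: int,
    width: int,
    patch_size: int = 14,          # assumed Qwen2-VL default
    temporal_patch_size: int = 2,  # assumed Qwen2-VL default
    merge_size: int = 2,           # assumed Qwen2-VL default
) -> int:
    """Estimate how many video placeholder tokens a resized clip expands to."""
    factor = patch_size * merge_size
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    if num_frames % temporal_patch_size:
        raise ValueError(f"num_frames must be a multiple of {temporal_patch_size}")

    # Grid of patches along time, height and width.
    grid_t = num_frames // temporal_patch_size
    grid_h = height // patch_size
    grid_w = width // patch_size

    # Each merge_size x merge_size block of spatial patches becomes one token.
    return grid_t * grid_h * grid_w // (merge_size ** 2)


print(video_placeholder_count(num_frames=4, height=224, width=224))  # 128
```

Under these assumptions a 4-frame 224x224 clip gives a 2 x 16 x 16 patch grid, which merges down to 128 tokens. Token counts grow linearly with the number of sampled frames, which is why the prompt budget in the launch command matters.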

update*

start server

text-generation-launcher \
--model-id Qwen/Qwen2-VL-7B-Instruct \
--max-batch-prefill-tokens 10000 \
--max-input-tokens 10000 \
--max-total-tokens 10001
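
Loading the model can take a while, so it may help to wait for the server before sending the request. A minimal sketch, assuming the router answers on a `/health` route at the same address used in the request example; `http_probe` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary.

```python
import time
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if GET {url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP error statuses.
        return False


def wait_until_ready(
    url: str = "http://127.0.0.1:3000",
    probe=http_probe,
    attempts: int = 60,
    delay: float = 5.0,
    sleep=time.sleep,
) -> bool:
    """Poll the server until the probe succeeds or attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready()` before `chat_completion(...)` avoids connection errors while the weights are still loading. The `probe` and `sleep` parameters are injectable so the loop can be exercised without a running server.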

send request

import requests
import json

def chat_completion(url="http://127.0.0.1:3000", video_url=None, prompt=None):
    messages = [{
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": { 
                    "url": video_url
                }
            },
            {
                "type": "text",
                "text": prompt
            }
        ]
    }]

    payload = {
        "messages": messages,
        "seed": 42,
        "max_tokens": 30
    }

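    # Fetching and encoding the video can be slow, so allow a generous timeout
    # (300 seconds is an arbitrary choice).
    response = requests.post(
        f"{url}/v1/chat/completions",
        json=payload,
        headers={"Content-Type": "application/json"},
        timeout=300,
    )

    # Return the parsed JSON body of the chat completion response.
    return response.json()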

video_url = "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/360/Big_Buck_Bunny_360_10s_1MB.mp4"
result = chat_completion(
    video_url=video_url,
    prompt="Describe this video."
)
print(json.dumps(result, indent=2))
# {
#     "object": "chat.completion",
#     "id": "",
#     "created": 1731964042,
#     "model": "Qwen/Qwen2-VL-7B-Instruct",
#     "system_fingerprint": "2.4.1-dev0-native",
#     "choices": [
#         {
#             "index": 0,
#             "message": {
#                 "role": "assistant",
#                 "content": "The video showcases lush green trees with vibrant shades of green and various shades of yellow and brown, as well as moss-covered stumps and piles of moss",
#             },
#             "logprobs": null,
#             "finish_reason": "length",
#         }
#     ],
#     "usage": {"prompt_tokens": 9593, "completion_tokens": 30, "total_tokens": 9623},
# }
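
The response above is worth reading against the launcher limits: the prompt alone uses 9593 tokens, close to the 10000-token input cap, and `finish_reason` is `"length"` because generation stopped at `max_tokens`. A small sketch of a helper that pulls out these fields; `summarize_completion` is a hypothetical name, and the default of 10001 simply mirrors the `--max-total-tokens` value used when starting the server.

```python
def summarize_completion(response: dict, max_total_tokens: int = 10001) -> dict:
    """Extract the generated text and token accounting from a chat completion."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        # "length" means the output was cut off at max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"]
            == usage["total_tokens"]
        ),
        # Headroom left under the server's --max-total-tokens setting.
        "tokens_left": max_total_tokens - usage["total_tokens"],
    }


# Trimmed copy of the response shown above.
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The video showcases lush green trees"},
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 9593, "completion_tokens": 30, "total_tokens": 9623},
}

summary = summarize_completion(example)
print(summary["truncated"], summary["tokens_left"])  # True 378
```

With only 378 tokens of headroom, a longer clip or a larger `max_tokens` would likely require raising the launcher limits.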

@drbh drbh force-pushed the enable-qwen2vl-video branch from b780f00 to 6b4697e Compare November 18, 2024 18:03
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@drbh drbh force-pushed the enable-qwen2vl-video branch from b9707b9 to 32438fc Compare November 25, 2024 21:43
@drbh drbh force-pushed the enable-qwen2vl-video branch from 2ef3038 to 17b27d4 Compare December 3, 2024 00:54
@drbh drbh force-pushed the enable-qwen2vl-video branch 4 times, most recently from 4e921bf to 93a2413 Compare December 18, 2024 01:41
@drbh drbh force-pushed the enable-qwen2vl-video branch from 93a2413 to dcc1194 Compare December 23, 2024 18:47

4 participants