Releases · matatonic/openedai-vision
0.41.0
0.40.0
Version 0.40.0
- new model support: AIDC-AI/Ovis1.6-Llama3.2-3B, AIDC-AI/Ovis1.6-Gemma2-27B
- new model support: BAAI/Aquila-VL-2B-llava-qwen
- new model support: HuggingFaceTB/SmolVLM-Instruct
- new model support: google/paligemma2 family of models (very limited instruct/chat training so far)
- Qwen2-VL: unpin Qwen2-VL-7B & remove Qwen hacks, GPTQ-Int4/8 working again (still slow - why?)
- pin bitsandbytes==0.44.1
⚠️ DEPRECATED MODELS (use the `0.39.2` docker image for support of these models): internlm-xcomposer2-7b, internlm-xcomposer2-7b-4bit, internlm-xcomposer2-vl-1_8b, internlm-xcomposer2-vl-7b, internlm-xcomposer2-vl-7b-4bit, nvidia/NVLM-D-72B, Llama-3-8B-Dragonfly-Med-v1, Llama-3-8B-Dragonfly-v1
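If you still need one of these models, pinning the docker tag keeps you on the last release that bundles them. A minimal sketch, assuming the `ghcr.io/matatonic/openedai-vision` image name and default port (both assumptions, check the README for the documented run command):

```shell
# Sketch, not the documented command: pin the last release that still
# ships the deprecated models. Image name and port are assumptions.
docker pull ghcr.io/matatonic/openedai-vision:0.39.2
docker run -d -p 5006:5006 ghcr.io/matatonic/openedai-vision:0.39.2
```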
0.39.2
0.39.1
0.39.0
Version 0.39.0
- new model support: rhymes-ai/Aria
- improved support for multi-image in various models.
- docker package: The latest release will now be tagged with `:latest`, rather than the latest commit.
- ⚠️ docker: docker will now run as a user instead of root. Your `hf_home` volume may need its ownership fixed; you can use this command: `sudo chown $(id -u):$(id -g) -R hf_home`
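Putting the two docker notes together, a minimal sketch (the `hf_home` path and image name are assumptions; the container no longer runs as root, so the volume must be writable by your user):

```shell
# Sketch: fix ownership of the HF cache volume once, then pull the
# release-tagged image instead of the latest commit. Paths are assumptions.
sudo chown $(id -u):$(id -g) -R hf_home
docker pull ghcr.io/matatonic/openedai-vision:latest
```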
0.38.0
0.36.0
0.35.0
Recent updates
Version 0.35.0
- Update Molmo (tensorflow-cpu no longer required), and add autocast for faster, smaller types than float32.
- New option: `--use-double-quant` to enable double quantization with `--load-in-4bit`, a little slower for a little less VRAM (see the sketch after this list).
- Molmo 72B will now run in under 48GB of VRAM using `--load-in-4bit --use-double-quant`.
- Add `completion_tokens` counts in the API and logged tokens/s for most results, other compatibility improvements
- Include sample tokens/s data (A100) in `vision.sample.env`
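As a rough illustration of the quantization flags and the new `completion_tokens` count together; a sketch only, where the `vision.py` entrypoint, port 5006, image URL, and exact model id are assumptions:

```shell
# Sketch: launch Molmo 72B in 4-bit with double quantization (assumed
# entrypoint and model id), then read usage.completion_tokens from a reply.
python vision.py -m allenai/Molmo-72B-0924 --load-in-4bit --use-double-quant &

curl -s http://localhost:5006/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "allenai/Molmo-72B-0924",
    "messages": [{"role": "user", "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
    ]}]
  }' | python -c 'import json,sys; print(json.load(sys.stdin)["usage"]["completion_tokens"])'
```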
0.34.0
Recent updates
Version 0.34.0
- new model support: Meta-llama: Llama-3.2-11B-Vision-Instruct, Llama-3.2-90B-Vision-Instruct
- new model support: Ai2/allenai Molmo family of models (requires an additional `pip install tensorflow-cpu` for now, see note)
- new model support: stepfun-ai/GOT-OCR2_0, this is an OCR-only model; all chat is ignored (see the sketch after this list).
- Support moved to alt image: Bunny-Llama-3-8B-V, Bunny-v1_1-Llama-3-8B-V, Mantis-8B-clip-llama3, Mantis-8B-siglip-llama3, omchat-v2.0-13B-single-beta_hf, qihoo360/360VL-8B
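Since GOT-OCR2_0 ignores the chat text, an image is all it needs; a minimal sketch against the same OpenAI-compatible endpoint (the port and handling of the `model` field are assumptions):

```shell
# Sketch: OCR an image via the chat completions endpoint. The prompt text
# is ignored by this model per the note above; port 5006 is an assumption.
curl -s http://localhost:5006/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stepfun-ai/GOT-OCR2_0",
    "messages": [{"role": "user", "content": [
      {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}}
    ]}]
  }'
```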
0.33.0
Recent updates
Version 0.33.0
- new model support: mx262/MiniMonkey, thanks @white2018
- Fix Qwen2-VL when used with Qwen-Agent and multiple system prompts (tools), thanks @cedonley
- idefics2-8b support moved to alt image
- pin Qwen2-VL-7B-Instruct-AWQ revision, see note for info