- transformers version: 4.47.1
- Platform: Google Colab
- Python version: 3.10.12
I tried to load the Janus-1.3B model with Flash Attention 2 (`attn_implementation="flash_attention_2"`), but got the following error:

```
ValueError: MultiModalityCausalLM does not support Flash Attention 2.0 yet.
```
The error is raised by this check in `transformers/modeling_utils.py`:

```python
if not cls._supports_flash_attn_2:
    raise ValueError(
        f"{cls.__name__} does not support Flash Attention 2.0 yet. Please request to add support where"
        f" the model is hosted, on its model hub page: https://huggingface.co/{config._name_or_path}/discussions/new"
        " or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new"
    )
```
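The check keys on a class attribute (`_supports_flash_attn_2`) of the model class being loaded, not on whether `flash-attn` is installed. The sketch below is a simplified, hypothetical re-creation of that gating idea, not the actual transformers source. The class names are made up for illustration, and the wrapper/inner split is an assumed scenario: a multimodal wrapper that never sets the flag fails the check even if a language model it contains would pass it.

```python
# Simplified, hypothetical re-creation of the capability gate quoted above.
# NOT the transformers implementation; it only mirrors the idea that
# Flash Attention 2 support is declared by a flag on the loaded class.


class PreTrainedModelSketch:
    # Default: model classes must opt in explicitly.
    _supports_flash_attn_2 = False

    @classmethod
    def check_flash_attn_2(cls) -> None:
        # Only the flag on the class being loaded is inspected; submodules
        # that might support Flash Attention 2 are never consulted.
        if not cls._supports_flash_attn_2:
            raise ValueError(f"{cls.__name__} does not support Flash Attention 2.0 yet.")


class InnerLanguageModel(PreTrainedModelSketch):
    # Hypothetical stand-in for a language model that opts in.
    _supports_flash_attn_2 = True


class WrapperModel(PreTrainedModelSketch):
    # Hypothetical stand-in for a multimodal wrapper that never sets the flag.
    def __init__(self) -> None:
        self.language_model = InnerLanguageModel()


InnerLanguageModel.check_flash_attn_2()  # passes silently
try:
    WrapperModel.check_flash_attn_2()
except ValueError as err:
    print(err)  # WrapperModel does not support Flash Attention 2.0 yet.
```

In this sketch, the only way to make `WrapperModel` pass is to set the flag on the wrapper class itself, which is why the error message points to the model's hosting page rather than to the installation.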
I installed FlashAttention-2 with:

```bash
pip install flash-attn --no-build-isolation
```
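To rule out an installation problem, a quick standard-library check like the one below can confirm that the `flash_attn` package is visible to the Colab runtime. This is a minimal sketch: it only verifies that the package can be located, not that its CUDA kernels load on the current GPU (transformers also provides `transformers.utils.is_flash_attn_2_available()` for a stricter check). Even when it returns `True`, the `ValueError` above is expected, because the quoted check looks at the model class, not at the installed package.

```python
import importlib.util


def flash_attn_installed() -> bool:
    # find_spec returns None when the top-level package cannot be located.
    # It does not import the compiled extension, so it only checks presence.
    return importlib.util.find_spec("flash_attn") is not None


print("flash_attn found:", flash_attn_installed())
```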
Here is the code I used:

```python
import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images

# specify the path to the model
model_path = "deepseek-ai/Janus-1.3B"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

# requesting Flash Attention 2 here is what triggers the ValueError
vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, attn_implementation="flash_attention_2"
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>\nConvert the formula into latex code.",
        "images": ["images/equation.png"],
    },
    {"role": "Assistant", "content": ""},
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(vl_gpt.device)

# run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# run the model to get the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
```
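Until the model class declares support, one possible workaround is to try attention implementations in order of preference and fall back when transformers rejects one. The helper below is a hypothetical sketch, not part of transformers or Janus. It assumes that an unsupported `attn_implementation` surfaces as a `ValueError` (as in the check quoted above) or that a missing package surfaces as an `ImportError`. Note that falling back means Flash Attention 2 is simply not used.

```python
from typing import Any, Callable, Sequence


def load_with_attn_fallback(
    loader: Callable[..., Any],
    preferences: Sequence[str] = ("flash_attention_2", "sdpa", "eager"),
) -> tuple[Any, str]:
    """Hypothetical helper: try each attn_implementation, return (model, choice).

    `loader` is any callable accepting an `attn_implementation` keyword,
    e.g. functools.partial(AutoModelForCausalLM.from_pretrained, model_path,
    trust_remote_code=True).
    """
    errors: list[str] = []
    for impl in preferences:
        try:
            return loader(attn_implementation=impl), impl
        except (ValueError, ImportError) as err:
            # Assumed failure modes: unsupported class (ValueError) or
            # missing package (ImportError); record and try the next one.
            errors.append(f"{impl}: {err}")
    raise RuntimeError("No attention implementation worked:\n" + "\n".join(errors))


def fake_loader(attn_implementation: str) -> str:
    # Stand-in for from_pretrained that only accepts "eager".
    if attn_implementation != "eager":
        raise ValueError(f"does not support {attn_implementation}")
    return "model"


model, impl = load_with_attn_fallback(fake_loader)
print(impl)  # eager
```

The helper returns the chosen implementation alongside the model so the caller can log which attention path is actually in use instead of silently running without Flash Attention 2.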