The examples don't mention how to specify parameters such as batch size or maximum input length.
My first question is how to change the maximum input length. I tried the llama2 example for a RAG use case; llama2 should be able to handle 4096 input tokens, but it appears to be limited to 1024 for some reason.
Similarly, although I don't think batching is a great idea on CPU, I would still like to try batched inference with this package. Is there documentation on how to configure these things?
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
# Specify the GGUF repo on the Hugging Face Hub
model_name = "TheBloke/Llama-2-7B-Chat-GGUF"
# Download the specific GGUF model file from the above repo
gguf_file = "llama-2-7b-chat.Q4_0.gguf"
# Make sure you have been granted access to this model on the Hugging Face Hub.
tokenizer_name = "meta-llama/Llama-2-7b-chat-hf"
prompt = "Once upon a time, there existed a little girl,"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
model = AutoModelForCausalLM.from_pretrained(model_name, gguf_file=gguf_file)
outputs = model.generate(inputs)
This is the example I used, and I believe the input sequence length is limited to 1024 by default.
With the current code it's hard to tell which arguments "from_pretrained" and "model.generate" actually accept.
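For anyone else hitting this, below is a rough sketch of how the limits might be raised, not a confirmed answer. max_new_tokens is the standard Hugging Face generate() argument; ctx_size is the context-window knob exposed by the underlying Neural Speed engine, and whether this wrapper forwards it under exactly that name is an assumption on my part. Batched inference is sketched with ordinary padded tokenization; whether the CPU GGUF backend actually honors a batch dimension is untested here.

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "TheBloke/Llama-2-7B-Chat-GGUF"
gguf_file = "llama-2-7b-chat.Q4_0.gguf"
tokenizer_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
# Llama tokenizers ship without a pad token; reuse EOS so padded batches can be built.
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, gguf_file=gguf_file)

# --- Longer context (e.g. a RAG prompt with several retrieved passages) ---
long_prompt = "..."  # placeholder for your long RAG prompt
inputs = tokenizer(long_prompt, return_tensors="pt").input_ids
outputs = model.generate(
    inputs,
    max_new_tokens=256,  # standard HF generation argument
    ctx_size=4096,       # ASSUMPTION: Neural Speed context-window option; name may differ
)

# --- Batched inference (may not be supported by the CPU GGUF backend) ---
prompts = [
    "Once upon a time, there existed a little girl,",
    "The capital of France is",
]
batch = tokenizer(prompts, return_tensors="pt", padding=True)
batch_outputs = model.generate(batch.input_ids, max_new_tokens=64)
print(tokenizer.batch_decode(batch_outputs, skip_special_tokens=True))

If ctx_size turns out not to be forwarded by generate(), the context length may instead have to be set when loading or quantizing the model; I couldn't find that documented anywhere, which is really the point of this issue.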