
LLM inference on Lunar Lake (LNL) iGPU is not working #1358

Open
azhuvath opened this issue Dec 10, 2024 · 8 comments
@azhuvath

I tried the sample openvino-genai code on the iGPU, and it is not working. It does not work on the NPU either. The code below exits without any exception. It works fine on the CPU.

I was following the blog https://medium.com/openvino-toolkit/how-to-run-llama-3-2-locally-with-openvino-60a0f3674549 . That article uses a dGPU rather than an iGPU, so I am not sure whether the iGPU is supported.

device = 'GPU'
pipe = openvino_genai.LLMPipeline(args.model_dir, device)
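To make a silent native crash visible, the pipeline construction can be preceded by enabling the stdlib fault handler. This is only a sketch: the openvino_genai calls are shown commented out because they require an installed openvino-genai package and an exported model, and the model directory used there is a placeholder.

```python
import faulthandler

# Enable the stdlib fault handler before touching any native plugin code,
# so a hard crash (segfault/abort) in the GPU or NPU plugin dumps a Python
# traceback to stderr instead of the process just disappearing.
faulthandler.enable()
print("faulthandler enabled:", faulthandler.is_enabled())

# Hypothetical pipeline construction (requires openvino-genai; the model
# directory is a placeholder for the exported llama-3.2-3b-instruct-INT8):
# import openvino_genai
# pipe = openvino_genai.LLMPipeline("llama-3.2-3b-instruct-INT8", "GPU")
# print(pipe.generate("Hello", max_new_tokens=20))
```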

I first tried the INT4 model on the iGPU. It did not work, so I created an INT8 model using the command below.

optimum-cli export openvino --model meta-llama/Llama-3.2-3B-Instruct --task text-generation-with-past --weight-format int8 --group-size 64 --ratio 1.0 --sym --all-layers llama-3.2-3b-instruct-INT8
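The same export can also be scripted from Python. A sketch only: the flags mirror the command above, and the actual subprocess call is left commented because the export requires optimum-intel and access to the gated meta-llama model.

```python
import shlex

# Build the optimum-cli export invocation as an argument list, mirroring
# the shell command above (output directory name is the same placeholder).
cmd = [
    "optimum-cli", "export", "openvino",
    "--model", "meta-llama/Llama-3.2-3B-Instruct",
    "--task", "text-generation-with-past",
    "--weight-format", "int8",
    "--group-size", "64",
    "--ratio", "1.0",
    "--sym", "--all-layers",
    "llama-3.2-3b-instruct-INT8",
]
print(shlex.join(cmd))
# import subprocess
# subprocess.run(cmd, check=True)  # uncomment to actually run the export
```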

Package Details

openvino==2025.0.0.dev20241209
openvino-genai==2025.0.0.0.dev20241209
openvino-telemetry==2024.5.0
openvino-tokenizers==2025.0.0.0.dev20241209
optimum==1.23.3
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@02835ce8833b5e2b67ba1a87bf85b0739335ac4d

@Wan-Intel

Could you please share the error messages that you encountered when using the GPU plugin?

On another note, could you please run the following command and share the output with us?
python3 -c "from openvino import Core; print(Core().available_devices)"

@azhuvath
Author

python3 -c "from openvino import Core; print(Core().available_devices)"

I do not get any error or exception. The code exits at the line
pipe = openvino_genai.LLMPipeline(args.model_dir, device)
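An exit with no exception at that line usually means the process died inside native code rather than returning normally. One stdlib-only way to confirm this is to run the script as a subprocess and inspect the return code: a death by signal shows up as a negative return code on Linux (Windows reports an unusual status code instead). In the sketch below, os.abort() merely stands in for a crash inside the GPU plugin.

```python
import subprocess
import sys
import textwrap

# Child script: os.abort() simulates a hard crash inside a native library,
# which is what a silent exit during LLMPipeline construction looks like.
child = textwrap.dedent("""
    import os
    os.abort()
""")

result = subprocess.run([sys.executable, "-c", child])
print("return code:", result.returncode)
# A nonzero return code means the child crashed rather than exiting cleanly.
print("crashed" if result.returncode != 0 else "clean exit")
```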

The requested output for supported devices is:
(ov_env) PS C:\Users\devcloud\llama3.2> python -c "from openvino import Core; print(Core().available_devices)"
['CPU', 'GPU', 'NPU']

@Wan-Intel

Thanks for providing the information.

Could you please provide the following additional information?

  • Hardware specification
  • Host Operating System

@azhuvath
Author

Output of the command systeminfo | findstr /B /C:"OS Name" /B /C:"OS Version":

OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.26100 N/A Build 26100

CPU

Intel(R) Core(TM) Ultra 9 288V
CPU SUPPORTED_PROPERTIES:

AVAILABLE_DEVICES : ['']
RANGE_FOR_ASYNC_INFER_REQUESTS : (1, 1, 1)
RANGE_FOR_STREAMS : (1, 8)
EXECUTION_DEVICES : ['CPU']
FULL_DEVICE_NAME : Intel(R) Core(TM) Ultra 9 288V
OPTIMIZATION_CAPABILITIES : ['BF16', 'FP32', 'FP16', 'INT8', 'BIN', 'EXPORT_IMPORT']
DEVICE_TYPE : Type.INTEGRATED
DEVICE_ARCHITECTURE : intel64
NUM_STREAMS : 1
INFERENCE_NUM_THREADS : 0
PERF_COUNT : False
INFERENCE_PRECISION_HINT : <Type: 'float32'>
PERFORMANCE_HINT : PerformanceMode.LATENCY
EXECUTION_MODE_HINT : ExecutionMode.PERFORMANCE
PERFORMANCE_HINT_NUM_REQUESTS : 0
ENABLE_CPU_PINNING : True
SCHEDULING_CORE_TYPE : SchedulingCoreType.ANY_CORE
MODEL_DISTRIBUTION_POLICY : set()
ENABLE_HYPER_THREADING : True
DEVICE_ID :
CPU_DENORMALS_OPTIMIZATION : False
LOG_LEVEL : Level.NO
CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
DYNAMIC_QUANTIZATION_GROUP_SIZE : 32
KV_CACHE_PRECISION : <Type: 'uint8_t'>
AFFINITY : Affinity.HYBRID_AWARE

iGPU

Intel(R) Arc(TM) 140V GPU (16GB) (iGPU)
GPU SUPPORTED_PROPERTIES:

AVAILABLE_DEVICES : ['0']
RANGE_FOR_ASYNC_INFER_REQUESTS : (1, 2, 1)
RANGE_FOR_STREAMS : (1, 2)
OPTIMAL_BATCH_SIZE : 1
MAX_BATCH_SIZE : 1
DEVICE_ARCHITECTURE : GPU: vendor=0x8086 arch=v20.4.4
FULL_DEVICE_NAME : Intel(R) Arc(TM) 140V GPU (16GB) (iGPU)
DEVICE_UUID : 8680a064040000000002000000000000
DEVICE_LUID : 94c0000000000000
DEVICE_TYPE : Type.INTEGRATED
DEVICE_GOPS : {<Type: 'float16'>: 0.0, <Type: 'float32'>: 2099.199951171875, <Type: 'int8_t'>: 0.0, <Type: 'uint8_t'>: 0.0}
OPTIMIZATION_CAPABILITIES : ['FP32', 'BIN', 'FP16', 'INT8', 'GPU_HW_MATMUL', 'EXPORT_IMPORT']
GPU_DEVICE_TOTAL_MEM_SIZE : 17721700352
GPU_UARCH_VERSION : 20.4.4
GPU_EXECUTION_UNITS_COUNT : 64
GPU_MEMORY_STATISTICS : {}
PERF_COUNT : False
MODEL_PRIORITY : Priority.MEDIUM
GPU_HOST_TASK_PRIORITY : Priority.MEDIUM
GPU_QUEUE_PRIORITY : Priority.MEDIUM
GPU_QUEUE_THROTTLE : Priority.MEDIUM
GPU_ENABLE_SDPA_OPTIMIZATION : True
GPU_ENABLE_LOOP_UNROLLING : True
GPU_DISABLE_WINOGRAD_CONVOLUTION: False
CACHE_DIR :
CACHE_MODE : CacheMode.OPTIMIZE_SPEED
PERFORMANCE_HINT : PerformanceMode.LATENCY
EXECUTION_MODE_HINT : ExecutionMode.PERFORMANCE
COMPILATION_NUM_THREADS : 8
NUM_STREAMS : 1
PERFORMANCE_HINT_NUM_REQUESTS : 0
INFERENCE_PRECISION_HINT : <Type: 'float16'>
ENABLE_CPU_PINNING : False
DEVICE_ID : 0
DYNAMIC_QUANTIZATION_GROUP_SIZE : 0
ACTIVATIONS_SCALE_FACTOR : 0.0
WEIGHTS_PATH :

NPU

Intel(R) AI Boost
NPU SUPPORTED_PROPERTIES:

AVAILABLE_DEVICES : ['4000']
CACHE_DIR :
COMPILATION_NUM_THREADS : 8
DEVICE_ARCHITECTURE : 4000
DEVICE_GOPS : {<Type: 'bfloat16'>: 0.0, <Type: 'float16'>: 23961.6015625, <Type: 'float32'>: 0.0, <Type: 'int8_t'>: 47923.203125, <Type: 'uint8_t'>: 47923.203125}
DEVICE_ID :
DEVICE_PCI_INFO : {domain: 0 bus: 0 device: 0xb function: 0}
DEVICE_TYPE : Type.INTEGRATED
DEVICE_UUID : 80d1d11eb73811eab3de0242ac130004
ENABLE_CPU_PINNING : False
EXECUTION_DEVICES : NPU
EXECUTION_MODE_HINT : ExecutionMode.PERFORMANCE
FULL_DEVICE_NAME : Intel(R) AI Boost
INFERENCE_PRECISION_HINT : <Type: 'float16'>
LOG_LEVEL : Level.ERR
MODEL_PRIORITY : Priority.MEDIUM
NPU_BYPASS_UMD_CACHING : False
NPU_COMPILATION_MODE_PARAMS :
NPU_DEFER_WEIGHTS_LOAD : False
NPU_DEVICE_ALLOC_MEM_SIZE : 0
NPU_DEVICE_TOTAL_MEM_SIZE : 17179869184
NPU_DRIVER_VERSION : 3104
NPU_MAX_TILES : 6
NPU_TILES : -1
NPU_TURBO : False
NUM_STREAMS : 1
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
OPTIMIZATION_CAPABILITIES : ['FP16', 'INT8', 'EXPORT_IMPORT']
PERFORMANCE_HINT : PerformanceMode.LATENCY
PERFORMANCE_HINT_NUM_REQUESTS : 1
PERF_COUNT : False
RANGE_FOR_ASYNC_INFER_REQUESTS : (1, 10, 1)
RANGE_FOR_STREAMS : (1, 4)
WORKLOAD_TYPE : WorkloadType.DEFAULT

@Wan-Intel

Thanks for the information. I'll escalate this to the relevant team, and we'll update you as soon as possible.

Wan-Intel added the PSE label Dec 12, 2024
@avitial

avitial commented Dec 26, 2024

Ref. 159930

@p-durandin

@azhuvath please send the error messages for the INT4 and INT8 models

@azhuvath
Author

I do not get any error or exception. The code exits at the line
pipe = openvino_genai.LLMPipeline(args.model_dir, device)


4 participants