Added gemma specific fp8 quantization file (#1445)
yeonsily authored and regisss committed Oct 22, 2024
1 parent fc54347 commit 058e91c
Showing 2 changed files with 37 additions and 0 deletions.
25 changes: 25 additions & 0 deletions examples/text-generation/README.md
@@ -451,6 +451,31 @@ QUANT_CONFIG=./quantization_config/maxabs_quant_phi.json python run_generation.py \
--reuse_cache
```

Here is an example of how to measure the tensor quantization statistics on Gemma with 1 card:

```bash
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py \
--model_name_or_path google/gemma-7b \
--use_hpu_graphs \
--use_kv_cache \
--max_new_tokens 100 \
--batch_size 1 \
--reuse_cache \
--bf16
```

Here is an example of how to quantize the model based on the previous measurements for Gemma with 1 card:

```bash
QUANT_CONFIG=./quantization_config/maxabs_quant_gemma.json python run_generation.py \
--model_name_or_path google/gemma-7b \
--use_hpu_graphs \
--use_kv_cache \
--max_new_tokens 100 \
--batch_size 1 \
--reuse_cache \
--bf16
```
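The two commands above differ only in the config file passed through the `QUANT_CONFIG` environment variable: the measurement step collects maxabs statistics, and the quantization step consumes them. A minimal sketch of this flow, using a hypothetical helper (not part of the repo) that builds each invocation without running it:

```python
import os

# Common run_generation.py arguments shared by both steps, as shown above.
COMMON_ARGS = [
    "--model_name_or_path", "google/gemma-7b",
    "--use_hpu_graphs",
    "--use_kv_cache",
    "--max_new_tokens", "100",
    "--batch_size", "1",
    "--reuse_cache",
    "--bf16",
]

def build_step(quant_config: str) -> dict:
    """Return the command and environment for one step of the flow.

    Only QUANT_CONFIG differs between the measure and quantize steps.
    """
    env = dict(os.environ)
    env["QUANT_CONFIG"] = quant_config
    return {"cmd": ["python", "run_generation.py", *COMMON_ARGS], "env": env}

# 1) collect maxabs statistics, 2) quantize using those statistics
measure = build_step("./quantization_config/maxabs_measure.json")
quantize = build_step("./quantization_config/maxabs_quant_gemma.json")
```

The helper name and the dict shape are illustrative only; in practice you would pass `cmd` and `env` to `subprocess.run` on an HPU machine.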


### Running FP8 models on single device

12 changes: 12 additions & 0 deletions examples/text-generation/quantization_config/maxabs_quant_gemma.json
@@ -0,0 +1,12 @@
{
"method": "HOOKS",
"mode": "QUANTIZE",
"observer": "maxabs",
"scale_method": "maxabs_hw",
"blocklist": {"types": [], "names": [
"matmul_qk",
"matmul_av",
"lm_head"
]},
"dump_stats_path": "./hqt_output/measure"
}
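In this config, the ops listed under `blocklist.names` (`matmul_qk`, `matmul_av`, `lm_head`) are excluded from FP8 quantization. A minimal sanity-check sketch in plain Python (assuming only the JSON shown above; in the repo the file is loaded by the quantization toolkit via `QUANT_CONFIG`):

```python
import json

# The Gemma quantization config shown above, inlined for a quick check.
config_text = """
{
    "method": "HOOKS",
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    "blocklist": {"types": [], "names": [
        "matmul_qk",
        "matmul_av",
        "lm_head"
    ]},
    "dump_stats_path": "./hqt_output/measure"
}
"""
cfg = json.loads(config_text)

# Ops named here are kept in higher precision rather than converted to FP8.
blocked = cfg["blocklist"]["names"]
```

Note that `dump_stats_path` points at the same `./hqt_output/measure` location the measurement step writes to, which is how the quantize step finds the statistics.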
